[linux-cifs-client] Re: [RFC][PATCH] cifs: add helper to simplify handling of unicode strings and use it

Jeff Layton jlayton at redhat.com
Sun Apr 12 11:39:55 GMT 2009


On Sun, 12 Apr 2009 14:43:57 +0530
Suresh Jayaraman <sjayaraman at suse.de> wrote:

> Based on the recent discussions on opencoded, inconsistent allocation of
> memory needed for a unicode string conversion, the consensus is to
> calculate the memory needed exactly instead of using size assumptions
> and to consolidate some code that allocates memory and does conversion
> in a helper function and use it widely.
> 
> This patch attempts to do the same. Please note that non-unicode cases
> has not been converted now as they are less error-prone than unicode
> cases. I've only compile tested this patch and yet to test it
> completely, but posting this little premature patch as there are lot
> of discussions happening around this code. This patch requires careful
> review.
> 
> Thanks to Jeff, Steve and Simo for useful suggestions.
> 
> Signed-off-by: Suresh Jayaraman <sjayaraman at suse.de>
> ---
> 
>  fs/cifs/cifs_unicode.h |   22 ++++++++++++++++++++
>  fs/cifs/cifsproto.h    |    2 +
>  fs/cifs/cifssmb.c      |   38 +++++++++++++++++++++++++++++++++++
>  fs/cifs/connect.c      |   51 +++++++++++++++--------------------------------
>  fs/cifs/sess.c         |   36 +++++++++++----------------------
>  5 files changed, 90 insertions(+), 59 deletions(-)
> 
> diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> index 14eb9a2..43c0438 100644
> --- a/fs/cifs/cifs_unicode.h
> +++ b/fs/cifs/cifs_unicode.h
> @@ -34,6 +34,7 @@
>  #include <asm/byteorder.h>
>  #include <linux/types.h>
>  #include <linux/nls.h>
> +#include "cifspdu.h"
>  
>  #define  UNIUPR_NOLOWER		/* Example to not expand lower case tables */
>  
> @@ -63,6 +64,7 @@ int cifs_strfromUCS_le(char *, const __le16 *, int, const struct nls_table *);
>  int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
>  #endif
>  
> +#define UNISTR_MAX_SIZE	(MAX_PATHCONF * (NLS_MAX_CHARSET_SIZE + 1))
>  /*
>   * UniStrcat:  Concatenate the second string to the first
>   *
> @@ -159,6 +161,26 @@ UniStrnlen(const wchar_t *ucs1, int maxlen)
>  }
>  
>  /*
> + * UniStrnlenBytes: Return the length in bytes
> + */
> +static inline int
> +UniStrnlenBytes(const char *str, int maxlen, const struct nls_table *codepage)
> +{
> +	int rc;
> +	wchar_t uni;
> +	size_t nbytes = 0;
> +	char buf[UNISTR_MAX_SIZE];
> +
> +	while (*str++) {
> +		rc = codepage->char2uni(str, maxlen, &uni);

Shouldn't this be uni2char? We're converting from unicode to the local
charset, correct? That's what cifs_strfromUCS_le is using so it makes
sense that we should be doing the same to estimate the memory that it
will use.

> +		if (rc == -1)
> +			return -EINVAL;

cifs_strfromUCS_le() doesn't return an error if the char can't be
converted. It just turns it into a '?' and moves on. We probably want
to do the same here.

> +		nbytes += utf8_wcstombs(buf, &uni, maxlen);

I don't think this is needed. I think uni2char function returns the
length of the character in bytes. You may be able to just tally
up the return values from uni2char.

Also, you can't assume that 'uni' will be a NULL terminated string.
It's just a wchar_t on the stack. Doing the above may make that
function walk off the end of the stack.

> +	}
> +	return nbytes;
> +}
> +
> +/*
>   * UniStrncat:  Concatenate length limited string
>   */
>  static inline wchar_t *
> diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h
> index 4167716..d154ed1 100644
> --- a/fs/cifs/cifsproto.h
> +++ b/fs/cifs/cifsproto.h
> @@ -383,4 +383,6 @@ extern int CIFSSMBSetPosixACL(const int xid, struct cifsTconInfo *tcon,
>  		const struct nls_table *nls_codepage, int remap_special_chars);
>  extern int CIFSGetExtAttr(const int xid, struct cifsTconInfo *tcon,
>  			const int netfid, __u64 *pExtAttrBits, __u64 *pMask);
> +extern int cifs_nls_to_str(char **dst, const char *src, const int maxlen,
> +			   int *plen, const struct nls_table *nls_codepage);
>  #endif			/* _CIFSPROTO_H */
> diff --git a/fs/cifs/cifssmb.c b/fs/cifs/cifssmb.c
> index bc09c99..963f4f9 100644
> --- a/fs/cifs/cifssmb.c
> +++ b/fs/cifs/cifssmb.c
> @@ -81,6 +81,44 @@ static struct {
>  #endif /* CONFIG_CIFS_WEAK_PW_HASH */
>  #endif /* CIFS_POSIX */
>  
> +
> +/*
> + * Calculates, allocates memory for nls and converts it to character string.
> + * Note: caller is responsible for freeing dst if function returned 0.
> + * returns:
> + * 	on success - 0
> + * 	on failure - errno
> + */
> +int
> +cifs_nls_to_str(char **dst, const char *src, const int maxlen, int *plen,
> +		       const struct nls_table *nls_codepage)
> +{
> +	size_t nbytes;
> +
> +	*plen = UniStrnlen((wchar_t *)src, maxlen);
> +	nbytes = UniStrnlenBytes(src, maxlen, nls_codepage);

It would be nice not to have to walk the string twice like this. Maybe
we need a combined function that returns both the length in bytes and
in characters and only walks the string once?

I'm not too worried about performance with the current users, but we
need to consider it in case we want to use these routines in the future
for more performance-sensitive purposes (for instance, maybe with readdir?)

> +
> +	/* We look for obvious messed up bcc or strings in response so we do
> +	 * not go off the end since (at least) WIN2K and * Windows XP have a
> +	 * major bug in not null terminating last Unicode string in response
> +	 */
> +	kfree(*dst);

Are you sure this is going to be a valid or null pointer that gets
passed in here? Even if that's the case now, it seems reasonable that
at some point in the future someone could just declare a pointer on the
stack and pass it into this function uninitialized. That could be bad.

I'd recommend removing this line and expect the caller to free memory
at this pointer beforehand if it's needed. If you don't want to do
that, then I think you need to make it clear in the comments that dst
needs to either be a pointer to freeable memory or NULL.

> +
> +	*dst = kzalloc(nbytes + 2, GFP_KERNEL);
> +	if (!*dst)
> +		goto err_exit;
> +
> +	cifs_strfromUCS_le(*dst, (__le16 *)src, *plen, nls_codepage);
> +	(*dst)[*plen] = 0;
> +	(*dst)[*plen + 1] = 0; /* harmless for ASCII case, needed for Unicode */
> +
> +	return 0;
> +
> +err_exit:
> +	cERROR(1, ("Failed to allocate buffer for string\n"));
> +	return -ENOMEM;
> +}
> +
>  /* Allocates buffer into dst and copies smb string from src to it.
>   * caller is responsible for freeing dst if function returned 0.
>   * returns:
> diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
> index 0de3b56..1880537 100644
> --- a/fs/cifs/connect.c
> +++ b/fs/cifs/connect.c
> @@ -2667,36 +2667,23 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses,
>  					remaining_words =
>  						BCC(smb_buffer_response) / 2;
>  				}
> -				len =
> -				    UniStrnlen((wchar_t *) bcc_ptr,
> -					       remaining_words - 1);
> -/* We look for obvious messed up bcc or strings in response so we do not go off
> -   the end since (at least) WIN2K and Windows XP have a major bug in not null
> -   terminating last Unicode string in response  */
> -				if (ses->serverOS)
> -					kfree(ses->serverOS);
> -				ses->serverOS = kzalloc(2 * (len + 1),
> -							GFP_KERNEL);
> -				if (ses->serverOS == NULL)
> +				rc = cifs_nls_to_str(&(ses->serverOS),
> +						bcc_ptr, remaining_words - 1,
> +						&len, nls_codepage);
> +				if (rc)
>  					goto sesssetup_nomem;
> -				cifs_strfromUCS_le(ses->serverOS,
> -						   (__le16 *)bcc_ptr,
> -						   len, nls_codepage);
>  				bcc_ptr += 2 * (len + 1);
>  				remaining_words -= len + 1;
>  				ses->serverOS[2 * len] = 0;
>  				ses->serverOS[1 + (2 * len)] = 0;
>  				if (remaining_words > 0) {
> -					len = UniStrnlen((wchar_t *)bcc_ptr,
> -							 remaining_words-1);
> -					kfree(ses->serverNOS);
> -					ses->serverNOS = kzalloc(2 * (len + 1),
> -								 GFP_KERNEL);
> -					if (ses->serverNOS == NULL)
> +					rc = cifs_nls_to_str(
> +							&(ses->serverNOS),
> +							bcc_ptr,
> +							remaining_words - 1,
> +							&len, nls_codepage);
> +					if (rc)
>  						goto sesssetup_nomem;
> -					cifs_strfromUCS_le(ses->serverNOS,
> -							   (__le16 *)bcc_ptr,
> -							   len, nls_codepage);
>  					bcc_ptr += 2 * (len + 1);
>  					ses->serverNOS[2 * len] = 0;
>  					ses->serverNOS[1 + (2 * len)] = 0;
> @@ -2707,19 +2694,13 @@ CIFSSessSetup(unsigned int xid, struct cifsSesInfo *ses,
>  					}
>  					remaining_words -= len + 1;
>  					if (remaining_words > 0) {
> -						len = UniStrnlen((wchar_t *) bcc_ptr, remaining_words);
> -				/* last string is not always null terminated
> -				   (for e.g. for Windows XP & 2000) */
> -						if (ses->serverDomain)
> -							kfree(ses->serverDomain);
> -						ses->serverDomain =
> -						    kzalloc(2*(len+1),
> -							    GFP_KERNEL);
> -						if (ses->serverDomain == NULL)
> +						rc = cifs_nls_to_str(
> +							&(ses->serverDomain),
> +							bcc_ptr,
> +							remaining_words, &len,
> +							nls_codepage);
> +						if (rc)
>  							goto sesssetup_nomem;
> -						cifs_strfromUCS_le(ses->serverDomain,
> -							(__le16 *)bcc_ptr,
> -							len, nls_codepage);
>  						bcc_ptr += 2 * (len + 1);
>  						ses->serverDomain[2*len] = 0;
>  						ses->serverDomain[1+(2*len)] = 0;
> diff --git a/fs/cifs/sess.c b/fs/cifs/sess.c
> index 5c68b42..1e82b22 100644
> --- a/fs/cifs/sess.c
> +++ b/fs/cifs/sess.c
> @@ -301,33 +301,26 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft,
>  	words_left = bleft / 2;
>  
>  	/* save off server operating system */
> -	len = UniStrnlen((wchar_t *) data, words_left);
> +	rc = cifs_nls_to_str(&(ses->serverOS), data, words_left, &len,
> +			nls_cp);
> +	if (rc)
> +		return rc;
>  
> -/* We look for obvious messed up bcc or strings in response so we do not go off
> -   the end since (at least) WIN2K and Windows XP have a major bug in not null
> -   terminating last Unicode string in response  */
>  	if (len >= words_left)
>  		return rc;
>  
> -	kfree(ses->serverOS);
> -	/* UTF-8 string will not grow more than four times as big as UCS-16 */
> -	ses->serverOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL);
> -	if (ses->serverOS != NULL)
> -		cifs_strfromUCS_le(ses->serverOS, (__le16 *)data, len, nls_cp);
>  	data += 2 * (len + 1);
>  	words_left -= len + 1;
>  
>  	/* save off server network operating system */
> -	len = UniStrnlen((wchar_t *) data, words_left);
> +	rc = cifs_nls_to_str(&(ses->serverNOS), data, words_left, &len,
> +				nls_cp);
> +	if (rc)
> +		return rc;
>  
>  	if (len >= words_left)
>  		return rc;
> -
> -	kfree(ses->serverNOS);
> -	ses->serverNOS = kzalloc((4 * len) + 2 /* trailing null */, GFP_KERNEL);
>  	if (ses->serverNOS != NULL) {
> -		cifs_strfromUCS_le(ses->serverNOS, (__le16 *)data, len,
> -				   nls_cp);
>  		if (strncmp(ses->serverNOS, "NT LAN Manager 4", 16) == 0) {
>  			cFYI(1, ("NT4 server"));
>  			ses->flags |= CIFS_SES_NT4;
> @@ -337,19 +330,14 @@ static int decode_unicode_ssetup(char **pbcc_area, int bleft,
>  	words_left -= len + 1;
>  
>  	/* save off server domain */
> -	len = UniStrnlen((wchar_t *) data, words_left);
> +	rc = cifs_nls_to_str(&(ses->serverDomain), data, words_left,
> +				    &len, nls_cp);
> +	if (rc)
> +		return rc;
>  
>  	if (len > words_left)
>  		return rc;
>  
> -	kfree(ses->serverDomain);
> -	ses->serverDomain = kzalloc(2 * (len + 1), GFP_KERNEL); /* BB FIXME wrong length */
> -	if (ses->serverDomain != NULL) {
> -		cifs_strfromUCS_le(ses->serverDomain, (__le16 *)data, len,
> -				   nls_cp);
> -		ses->serverDomain[2*len] = 0;
> -		ses->serverDomain[(2*len) + 1] = 0;
> -	}
>  	data += 2 * (len + 1);
>  	words_left -= len + 1;
>  


-- 
Jeff Layton <jlayton at redhat.com>


More information about the linux-cifs-client mailing list