[linux-cifs-client] [PATCH 03/10] cifs: add replacement for cifs_strtoUCS_le called cifs_utf16le_to_host

Wed Apr 29 15:30:23 GMT 2009

On Wed, 29 Apr 2009 10:26:40 -0500
Shirish Pargaonkar <shirishpargaonkar at gmail.com> wrote:

> On Wed, Apr 29, 2009 at 8:29 AM, Jeff Layton <jlayton at redhat.com> wrote:
> > Add a replacement function for cifs_strtoUCS_le. cifs_utf16le_to_host
> > takes args for the source and destination length so that we can ensure
> > that the function is confined within the intended buffers.
> >
> > Signed-off-by: Jeff Layton <jlayton at redhat.com>
> > ---
> >  fs/cifs/cifs_unicode.c |  121 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/cifs/cifs_unicode.h |    2 +
> >  2 files changed, 123 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/cifs/cifs_unicode.c b/fs/cifs/cifs_unicode.c
> > index 7d75272..aafaf0d 100644
> > --- a/fs/cifs/cifs_unicode.c
> > +++ b/fs/cifs/cifs_unicode.c
> > @@ -26,6 +26,127 @@
> >  #include "cifs_debug.h"
> >
> >  /*
> > + * cifs_mapchar - convert a little-endian char to proper char in codepage
> > + * @target - where converted character should be copied
> > + * @src_char - 2 byte little-endian source character
> > + * @cp - codepage to which character should be converted
> > + * @mapchar - should character be mapped according to mapchars mount option?
> > + *
> > + * This function handles the conversion of a single character. It is the
> > + * responsibility of the caller to ensure that the target buffer is large
> > + * enough to hold the result of the conversion (at least NLS_MAX_CHARSET_SIZE).
> > + */
> > +static int
> > +cifs_mapchar(char *target, const __le16 src_char, const struct nls_table *cp,
> > +            bool mapchar)
> > +{
> > +       int len = 1;
> > +
> > +       if (!mapchar)
> > +               goto cp_convert;
> > +
> > +       /*
> > +        * BB: Cannot handle remapping UNI_SLASH until all the calls to
> > +        *     build_path_from_dentry are modified, as they use slash as
> > +        *     separator.
> > +        */
> > +       switch (le16_to_cpu(src_char)) {
> > +       case UNI_COLON:
> > +               *target = ':';
> > +               break;
> > +       case UNI_ASTERIK:
> > +               *target = '*';
> > +               break;
> > +       case UNI_QUESTION:
> > +               *target = '?';
> > +               break;
> > +       case UNI_PIPE:
> > +               *target = '|';
> > +               break;
> > +       case UNI_GRTRTHAN:
> > +               *target = '>';
> > +               break;
> > +       case UNI_LESSTHAN:
> > +               *target = '<';
> > +               break;
> > +       default:
> > +               goto cp_convert;
> > +       }
> > +
> > +out:
> > +       return len;
> > +
> > +cp_convert:
> > +       len = cp->uni2char(le16_to_cpu(src_char), target,
> > +                          NLS_MAX_CHARSET_SIZE);
> > +       if (len <= 0) {
> > +               *target = '?';
> > +               len = 1;
> > +       }
> > +       goto out;
> > +}
> > +
> > +/*
> > + * cifs_utf16le_to_host - convert utf16le string to local charset
> > + * @to - destination buffer
> > + * @from - source buffer
> > + * @tolen - destination buffer size (in bytes)
> > + * @fromlen - source buffer size (in bytes)
> > + * @codepage - codepage to which characters should be converted
> > + * @mapchar - should characters be remapped according to the mapchars option?
> > + *
> > + * Convert a little-endian utf16le string (as sent by the server) to a string
> > + * in the provided codepage. The tolen and fromlen parameters are to ensure
> > + * that the code doesn't walk off of the end of the buffer (which is always
> > + * a danger if the alignment of the source buffer is off). The destination
> > + * string is always properly null terminated and fits in the destination
> > + * buffer. Returns the length of the destination string in bytes (including
> > + * null terminator).
> > + */
> > +int
> > +cifs_utf16le_to_host(char *to, const __le16 *from, int tolen, int fromlen,
> > +                    const struct nls_table *codepage, bool mapchar)
> > +{
> > +       int i, charlen, safelen;
> > +       int outlen = 0;
> > +       int nullsize = null_charlen(codepage);
> > +       int fromwords = fromlen / 2;
> 
> I think the assumption here is that every code value is two bytes. That
> is correct for UCS-2 encoding, but in UTF-16 a character can be encoded
> in either two or four bytes.
> 

Can you show me a citation? I thought UTF-16 meant a fixed-length 2-byte
encoding.
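
For reference on the case raised in the quoted comment: in UTF-16 as the
Unicode standard defines it, each code unit is 16 bits, but code points
above U+FFFF are stored as two code units (a surrogate pair), so four
bytes. A minimal userspace sketch of how such a pair is recognized and
combined follows. The helper names are hypothetical and are not part of
this patch.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the patch. */

/* First half of a surrogate pair: 0xD800..0xDBFF */
static int is_high_surrogate(uint16_t u)
{
	return u >= 0xD800 && u <= 0xDBFF;
}

/* Second half of a surrogate pair: 0xDC00..0xDFFF */
static int is_low_surrogate(uint16_t u)
{
	return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a valid high/low pair into a code point in U+10000..U+10FFFF */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
	return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
			  (uint32_t)(lo - 0xDC00));
}
```

A loop that treats every 16-bit unit as a complete character, as the
one in this patch does, would hand each half of such a pair to the
codepage conversion separately.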

> > +       char tmp[NLS_MAX_CHARSET_SIZE];
> > +
> > +       /*
> > +        * because the chars can be of varying widths, we need to take care
> > +        * not to overflow the destination buffer when we get close to the
> > +        * end of it. Until we get to this offset, we don't need to check
> > +        * for overflow however.
> > +        */
> > +       safelen = tolen - (NLS_MAX_CHARSET_SIZE + nullsize);
> 
> Can safelen become negative? Say, when the input is a stream of just
> two two-byte code values?
> 

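Yes. It doesn't matter though. The math where it's checked still works:
if safelen is negative, outlen >= safelen is always true, so every
character simply goes through the bounds check.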
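
To illustrate, here is a small userspace sketch of the two comparisons
the loop makes. The function names are hypothetical, and
MAX_CHARSET_SIZE is a stand-in that assumes NLS_MAX_CHARSET_SIZE is 6.

```c
/* Assumed stand-in for NLS_MAX_CHARSET_SIZE; illustration only. */
#define MAX_CHARSET_SIZE 6

/*
 * Hypothetical helper: returns 1 if the loop would measure the
 * character in a scratch buffer before copying it, 0 if it would
 * copy without checking.
 */
static int takes_checked_path(int outlen, int tolen, int nullsize)
{
	int safelen = tolen - (MAX_CHARSET_SIZE + nullsize);

	/* With a small tolen, safelen is negative and this is always true. */
	return outlen >= safelen;
}

/*
 * Hypothetical helper: returns 1 if a character of charlen bytes
 * still leaves room for the null terminator.
 */
static int char_fits(int outlen, int charlen, int tolen, int nullsize)
{
	return (outlen + charlen) <= (tolen - nullsize);
}
```

With tolen = 4 and nullsize = 1, safelen is -3, so the checked path is
taken from the first character on, and the second comparison decides
whether each character still fits.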

> > +
> > +       for (i = 0; i < fromwords && from[i]; i++) {
> > +               /*
> > +                * check to see if converting this character might make the
> > +                * conversion bleed into the null terminator
> > +                */
> > +               if (outlen >= safelen) {
> > +                       charlen = cifs_mapchar(tmp, from[i], codepage, mapchar);
> 
> If mapchar is not set, cifs_mapchar is always going to return 1 in
> case of no error (since uni2char always returns 1).
> 

uni2char does not always return 1. In the case of UTF-8, for instance,
it returns the width in bytes of the character that it put in the
destination buffer.
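
As a rough illustration of why the width varies, this userspace sketch
computes how many UTF-8 bytes a single 16-bit character needs. The
function name is hypothetical and this is not the kernel's uni2char
implementation. It only covers non-surrogate 16-bit values.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes UTF-8 needs to encode one
 * 16-bit character (surrogate halves are not handled here).
 */
static int utf8_width(uint16_t c)
{
	if (c < 0x80)
		return 1;	/* ASCII range */
	if (c < 0x800)
		return 2;	/* e.g. U+00E9 */
	return 3;		/* rest of the 16-bit range, e.g. U+20AC */
}
```

So a destination buffer sized at one byte per source character can be
too small, which is why the loop tracks charlen instead of assuming 1.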

> > +                       if (charlen <= 0)
> > +                               charlen = 1;
> > +                       if ((outlen + charlen) > (tolen - nullsize))
> > +                               break;
> > +               }
> > +
> > +               /* put converted char into 'to' buffer */
> > +               charlen = cifs_mapchar(&to[outlen], from[i], codepage, mapchar);
> > +               outlen += charlen;
> > +       }
> > +
> > +       /* properly null-terminate string */
> > +       for (i = 0; i < nullsize; i++)
> > +               to[outlen++] = 0;
> > +
> > +       return outlen;
> > +}
> > +
> > +/*
> >  * NAME:       cifs_strfromUCS()
> >  *
> >  * FUNCTION:   Convert little-endian unicode string to character string
> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> > index 2dfae68..e23ef08 100644
> > --- a/fs/cifs/cifs_unicode.h
> > +++ b/fs/cifs/cifs_unicode.h
> > @@ -72,6 +72,8 @@ extern struct UniCaseRange UniLowerRange[];
> >  #endif                         /* UNIUPR_NOLOWER */
> >
> >  #ifdef __KERNEL__
> > +int cifs_utf16le_to_host(char *to, const __le16 *from, int tolen, int fromlen,
> > +                        const struct nls_table *codepage, bool mapchar);
> >  int cifs_strfromUCS_le(char *, const __le16 *, int, const struct nls_table *);
> >  int cifs_strtoUCS(__le16 *, const char *, int, const struct nls_table *);
> >  #endif
> > --
> > 1.6.0.6
> >

-- 
Jeff Layton <jlayton at redhat.com>