[linux-cifs-client] [PATCH 03/10] cifs: add replacement for cifs_strtoUCS_le called cifs_utf16le_to_host

Wed Apr 29 16:25:53 GMT 2009

On Wed, Apr 29, 2009 at 10:48 AM, Jeff Layton <jlayton at redhat.com> wrote:
> On Wed, 29 Apr 2009 11:30:23 -0400
> Jeff Layton <jlayton at redhat.com> wrote:
>
>> > > +/*
>> > > + * cifs_utf16le_to_host - convert utf16le string to local charset
>> > > + * @to - destination buffer
>> > > + * @from - source buffer
>> > > + * @tolen - destination buffer size (in bytes)
>> > > + * @fromlen - source buffer size (in bytes)
>> > > + * @codepage - codepage to which characters should be converted
>> > > + * @mapchar - should characters be remapped according to the mapchars option?
>> > > + *
>> > > + * Convert a little-endian utf16le string (as sent by the server) to a string
>> > > + * in the provided codepage. The tolen and fromlen parameters are to ensure
>> > > + * that the code doesn't walk off of the end of the buffer (which is always
>> > > + * a danger if the alignment of the source buffer is off). The destination
>> > > + * string is always properly null terminated and fits in the destination
>> > > + * buffer. Returns the length of the destination string in bytes (including
>> > > + * null terminator).
>> > > + */
>> > > +int
>> > > +cifs_utf16le_to_host(char *to, const __le16 *from, int tolen, int fromlen,
>> > > +                    const struct nls_table *codepage, bool mapchar)
>> > > +{
>> > > +       int i, charlen, safelen;
>> > > +       int outlen = 0;
>> > > +       int nullsize = null_charlen(codepage);
>> > > +       int fromwords = fromlen / 2;
>> >
>> > I think assumption here is code values are two bytes.  I think that is
>> > correct in case of UCS-2 encoding
>> > but in case of UTF-16, the code values can be either two or four bytes.
>> >
>>
>> Can you show me a citation? I thought UTF-16 meant a fixed-length 2
>> byte encoding.
We can't translate the four-byte values - they will look like a ?
followed by a bogus character, but they are extremely rare, and only a
few systems can even generate them (e.g. some Windows versions, but not
most applications, and not Java).
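
For reference, here is a minimal userspace sketch (not code from the
patch) of how the four-byte values can be recognized.  It assumes the
standard UTF-16 surrogate ranges (0xD800-0xDBFF followed by
0xDC00-0xDFFF); the helper names read_le16 and count_code_points are
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one little-endian 16-bit code unit, independent of host endianness. */
static uint16_t read_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static int is_high_surrogate(uint16_t u)
{
	return u >= 0xD800 && u <= 0xDBFF;
}

static int is_low_surrogate(uint16_t u)
{
	return u >= 0xDC00 && u <= 0xDFFF;
}

/*
 * Hypothetical helper: count the code points in a UTF-16LE buffer of
 * len bytes.  A high surrogate followed by a low surrogate counts as
 * one code point (four bytes); every other unit counts as one code
 * point (two bytes).  *pairs receives the number of surrogate pairs.
 */
static size_t count_code_points(const unsigned char *buf, size_t len,
				size_t *pairs)
{
	size_t units = len / 2;	/* a trailing odd byte is ignored */
	size_t i = 0, points = 0;

	*pairs = 0;
	while (i < units) {
		uint16_t u = read_le16(buf + 2 * i);

		if (is_high_surrogate(u) && i + 1 < units &&
		    is_low_surrogate(read_le16(buf + 2 * (i + 1)))) {
			(*pairs)++;
			i += 2;	/* consume both halves of the pair */
		} else {
			i++;
		}
		points++;
	}
	return points;
}
```

A converter that treats every 16-bit unit as a separate character would
see each pair as two unknown characters, which is the behavior
described above.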

> Given that, I'll probably rename these functions to have _ucs2le_
> instead of _utf16le_ since that's more accurate.

Let's not worry about the "le" part of the name.  It makes the name
longer and harder to read (it sounds like you are converting "ucs" in
host endianness to little endian ucs).  UCS here is driven by Windows,
which is little endian, so the little endian part of the name is less
interesting.  Technically you could have a big endian UCS2, but I
don't think it makes sense to put little endian vs. big endian in the
name.

> The good news is that that does not materially change how the buffer
> sizing works here, and that's the immediate problem that we're trying
> to solve with these patches.

right - agreed.
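
As a rough illustration of why the sizing is unaffected (a sketch only,
not the logic in the patch): the conversion walks the source one 16-bit
unit at a time either way, so a worst-case destination size depends
only on the number of units.  MAX_BYTES_PER_UNIT and worst_case_outlen
are hypothetical names, and the value 3 assumes a UTF-8 destination,
where a single 16-bit value needs at most three bytes.

```c
#include <stddef.h>

/*
 * Assumption for illustration: the most bytes one 16-bit unit can
 * expand to in the destination charset (3 for UTF-8).
 */
#define MAX_BYTES_PER_UNIT 3

/*
 * Hypothetical helper: upper bound on the destination size in bytes
 * for a source buffer of fromlen bytes, including a null terminator
 * of nullsize bytes.
 */
static size_t worst_case_outlen(size_t fromlen, size_t nullsize)
{
	size_t units = fromlen / 2;	/* a trailing odd byte is ignored */

	return units * MAX_BYTES_PER_UNIT + nullsize;
}
```

Whether the units are called UCS-2 or UTF-16, the bound is the same.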

-- 
Thanks,

Steve