[linux-cifs-client] [RFC][PATCH] cifs: add helper to simplify handling of unicode strings and use it

Sun Apr 12 18:16:02 GMT 2009

On Sun, Apr 12, 2009 at 9:23 AM, Peter Hudec <PeterHudec at web.de> wrote:
> Suresh Jayaraman wrote:
>>
>> Based on the recent discussions about open-coded, inconsistent allocation
>> of the memory needed for a Unicode string conversion, the consensus is to
>> calculate the exact amount of memory needed instead of relying on size
>> assumptions, and to consolidate the code that allocates memory and
>> performs the conversion into a helper function that is used widely.
>>
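For illustration only, here is a minimal userspace sketch of the "calculate the exact size, then allocate and convert" pattern described above. It assumes the input is little-endian UTF-16 as sent on the wire and that the output is UTF-8. The names utf16le_strndup_to_utf8, utf16le_next and utf8_len are hypothetical and are not the helper from the actual patch; the real CIFS code converts through the kernel's NLS codepage tables rather than hard-coding UTF-8.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: decode one code point from little-endian UTF-16.
 * src holds `units` 16-bit code units; i is the current unit index.
 * Returns the number of units consumed (1 or 2), or 0 on an unpaired
 * or truncated surrogate. */
static size_t utf16le_next(const uint8_t *src, size_t units, size_t i,
                           uint32_t *cp)
{
    uint32_t hi = (uint32_t)src[2 * i] | ((uint32_t)src[2 * i + 1] << 8);

    if (hi < 0xD800 || hi > 0xDFFF) {   /* ordinary BMP character */
        *cp = hi;
        return 1;
    }
    if (hi > 0xDBFF || i + 1 >= units)  /* lone low surrogate or cut-off pair */
        return 0;

    uint32_t lo = (uint32_t)src[2 * i + 2] | ((uint32_t)src[2 * i + 3] << 8);
    if (lo < 0xDC00 || lo > 0xDFFF)     /* high surrogate not followed by low */
        return 0;

    *cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    return 2;
}

/* Number of UTF-8 bytes needed for a valid code point. */
static size_t utf8_len(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Hypothetical helper: pass 1 computes the exact UTF-8 size, pass 2
 * allocates that much (+1 for the terminator) and converts.
 * Returns a malloc'd, NUL-terminated string, or NULL on malformed
 * input or allocation failure. The caller frees the result. */
char *utf16le_strndup_to_utf8(const uint8_t *src, size_t units)
{
    size_t need = 0, i, n;
    uint32_t cp;

    for (i = 0; i < units; i += n) {
        n = utf16le_next(src, units, i, &cp);
        if (n == 0)
            return NULL;
        need += utf8_len(cp);
    }

    char *out = malloc(need + 1);
    if (out == NULL)
        return NULL;

    unsigned char *p = (unsigned char *)out;
    for (i = 0; i < units; i += n) {
        n = utf16le_next(src, units, i, &cp);  /* already validated */
        switch (utf8_len(cp)) {
        case 1:
            *p++ = cp;
            break;
        case 2:
            *p++ = 0xC0 | (cp >> 6);
            *p++ = 0x80 | (cp & 0x3F);
            break;
        case 3:
            *p++ = 0xE0 | (cp >> 12);
            *p++ = 0x80 | ((cp >> 6) & 0x3F);
            *p++ = 0x80 | (cp & 0x3F);
            break;
        default:
            *p++ = 0xF0 | (cp >> 18);
            *p++ = 0x80 | ((cp >> 12) & 0x3F);
            *p++ = 0x80 | ((cp >> 6) & 0x3F);
            *p++ = 0x80 | (cp & 0x3F);
            break;
        }
    }
    *p = '\0';
    return out;
}
```

The first pass walks the input only to size the buffer, so the allocation is exact rather than based on a fixed bytes-per-character guess; the price is decoding the string twice.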

> First of all, it should be clear what we mean when we talk about
> "Unicode".
>
> There are encodings with a fixed character length and encodings with a
> variable character length.
>
> When talking about Unicode in CIFS, what is meant is UCS-2 or UTF-16.
> UCS-2 uses a fixed 2 bytes per character.
> UTF-16 uses 2 or 4 bytes per character (4 bytes for a surrogate pair).

CIFS (and presumably NTFS, JFS, SMB2 in the future, and perhaps a few
others) uses Unicode. This used to be described as UCS-2, but you are
probably right that it is actually UTF-16 for certain newer servers such
as Windows 2003, Windows 2008 and Vista. Windows moved to UTF-16
internally, and it wouldn't make sense for this to be specific to the network
protocol. I don't know whether, in practice, the distinction between UCS-2
and UTF-16 would make a difference for the more limited mappings done here.
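To make the UCS-2 versus UTF-16 distinction concrete: the two agree for every character in the Basic Multilingual Plane (U+0000 to U+FFFF, excluding the surrogate range U+D800 to U+DFFF) and differ only above U+FFFF, where UTF-16 needs a 4-byte surrogate pair and UCS-2 has no encoding at all. The small sketch below, with hypothetical function names, just encodes those rules; it says nothing about which form a given Windows server actually sends.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store code point cp in UTF-16: 2 in the BMP,
 * 4 (a surrogate pair) above U+FFFF. Returns 0 for values that are
 * not Unicode scalar values. */
static size_t utf16_bytes(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogate code points are not characters */
    if (cp <= 0xFFFF)
        return 2;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;
}

/* UCS-2 can hold only BMP characters, each in exactly 2 bytes. */
static bool ucs2_representable(uint32_t cp)
{
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode cp as UTF-16 code units. Returns the number of units
 * written (1 or 2), or 0 if cp is not a valid scalar value. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    size_t n = utf16_bytes(cp) / 2;

    if (n == 1) {
        out[0] = (uint16_t)cp;
    } else if (n == 2) {
        cp -= 0x10000;                          /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    }
    return n;
}
```

A buffer sized on the assumption of 2 bytes per character is therefore only safe when every character is in the BMP; U+1F600, for example, takes 4 bytes in UTF-16 and cannot be expressed in UCS-2.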

The WSPP documentation includes a large companion document on Unicode
that may cover the more important details (for someone very familiar
with internationalization):

http://download.microsoft.com/download/9/5/E/95EF66AF-9026-4BB0-A41D-A4F81802D92C/%5BMS-UCODEREF%5D.pdf

-- 
Thanks,

Steve