[Draft #2] Samba 3.0 roadmap...idmap storage & central idmap repository

Michael Sweet mike at easysw.com
Tue Jul 9 12:50:41 GMT 2002

Simo Sorce wrote:
> Hi metze,
> on top of the first doc I see you state that all strings should be utf8.
> I heartily disagree, I would rather like to see all internal strings
> in new code be UCS-2.
> Utf8 has many disadvantages:
> 1. It requires any RPC request that comes from clients to be converted
> back and forth (UCS-2->UTF8->UCS-2)

Some "conversion" will always be required, not only for byte-order
issues (remember that UCS-2 strings can start with a byte-order mark)
but also for any normalization forms that may be required.
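
As a rough illustration of the byte-order point, here is a small Python
sketch (not Samba code; the sample string and variable names are made up
for the example). It shows the same text serialized in both byte orders,
and what happens when the receiver guesses the order wrong:

```python
import codecs

text = "Samba"

# The same five characters, serialized in both byte orders.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

# Python's generic "utf-16" decoder reads a leading byte-order mark (BOM)
# to pick the byte order, then strips the mark from the result.
decoded_le = (codecs.BOM_UTF16_LE + little).decode("utf-16")
decoded_be = (codecs.BOM_UTF16_BE + big).decode("utf-16")

# Decoding with the wrong byte order does not raise here; it silently
# produces five unrelated characters.
misread = big.decode("utf-16-le")

print(little != big, decoded_le == decoded_be == text, misread == text)
```

So even a "no conversion" path has to look at the byte order before it
can trust the code units.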

Also, some SMB clients are using UTF-16 now (superset of UCS-2 to
support code points in other Unicode planes) instead of UCS-2.

Finally, most UNIX filesystems only support the UTF-8 representation
of Unicode, so at some point UCS-2/UTF-16 will have to be converted
to UTF-8 anyways...
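
A minimal Python sketch of that last conversion step, assuming the name
arrives as little-endian UTF-16 (the file name is invented for the
example). One practical reason UTF-8 ends up on disk is that UNIX file
names are NUL-terminated byte strings, and UTF-16 text is full of zero
bytes:

```python
name = "Bücher.txt"

# What a client might send on the wire (assumed UTF-16, little-endian).
wire = name.encode("utf-16-le")

# What gets handed to the filesystem: decode once, re-encode as UTF-8.
on_disk = wire.decode("utf-16-le").encode("utf-8")

# ASCII characters carry a zero high byte in UTF-16, which a C string
# API would treat as the end of the name. UTF-8 never emits a zero byte
# for a non-NUL character.
print(b"\x00" in wire, b"\x00" in on_disk)
```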

> 2. It is difficult to manipulate UTF8 strings as they are variable length
> multibyte chars and sometimes uppercase chars have different length than
> lowercase chars.
> ...

UCS-2 can have different byte orders, and with UTF-16 you also need
to handle surrogate pairs for code points in the other planes, so it
is a variable-length encoding too, which makes life even more fun.
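
To make the UTF-16 side concrete, a short Python sketch (the helper
name utf16_units is made up for the example). A character outside the
basic plane takes two 16-bit code units in UTF-16, so "one unit per
character" does not hold there either:

```python
import struct

def utf16_units(s):
    """Return the 16-bit code units of s in UTF-16 (big-endian, no BOM)."""
    data = s.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

bmp = utf16_units("a")            # one code unit
clef = utf16_units("\U0001D11E")  # MUSICAL SYMBOL G CLEF, outside the BMP

# The second character is encoded as a high/low surrogate pair.
print([hex(u) for u in bmp], [hex(u) for u in clef])
```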

In addition, no matter what Unicode representation is used, you
still have to deal with different representations of the "same"
character (is it a single character "a" with an umlaut, or "a"
plus a combining umlaut character? etc.)
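
A small Python sketch of that problem, using the standard unicodedata
module (the variable names are just for illustration). The two spellings
compare unequal until both are brought to the same normalization form:

```python
import unicodedata

composed = "\u00e4"     # "a" with umlaut as a single code point
decomposed = "a\u0308"  # "a" followed by COMBINING DIAERESIS

# Same character to a reader, different code point sequences.
raw_equal = composed == decomposed

# Normalizing both to NFC (or both to NFD) makes them comparable.
nfc_equal = unicodedata.normalize("NFC", decomposed) == composed
nfd_equal = unicodedata.normalize("NFD", composed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)
```

This holds whether the strings are later stored as UTF-8, UCS-2 or
UTF-16; the encoding does not decide which form a client sent.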

Michael Sweet, Easy Software Products                  mike at easysw.com
Printing Software for UNIX                       http://www.easysw.com
