[Draft #2] Samba 3.0 roadmap...idmap storage & central idmap repository

TAKAHASHI Motonobu monyo at samba.gr.jp
Wed Jul 10 10:51:30 GMT 2002

>>>> >Hi metze,
>>>> >on top of the first doc I see you state that all strings should
>>>> >be utf8.
>>>>I heartily disagree; I would rather see all internal
>>>>strings in new code be UCS-2.
>>>>Utf8 has many disadvantages:
>>>>1. It requires every RPC request that comes from clients to be
>>>>converted back and forth (UCS-2->UTF8->UCS-2)
>>>>2. It is difficult to manipulate UTF8 strings, as they are
>>>>variable-length multibyte chars, and sometimes uppercase chars
>>>>have a different length than lowercase chars.
>>>With UCS-2 the usage of DEBUG() and other string functions might
>>>be a lot more difficult than with UTF8, as it would require
>>>using smb_ucs2_t instead of char*.
>> It is not really a problem, we only have to build up a DEBUG function
>> that converts to ascii before printing (and we should do the same with
>> utf8 too afaik), debug statement performance is not so important imho.
>I'm a little worried about that one - we have a lot of debug statements.

UCS-2 is strongly recommended.

In addition to these 2 reasons:
3. Using UTF-8 hides problems that occur only with "multibyte"
  character strings, such as Japanese ones.
4. With a variable-length charset like UTF-8, we must carefully
  distinguish between "the number of characters in a string" and "the
  number of bytes in a string".
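
A minimal sketch of point 4, assuming valid NUL-terminated UTF-8 input.
utf8_char_count() is a hypothetical helper written for illustration, not
an existing Samba function:

```c
#include <stddef.h>
#include <string.h>

/* Count characters (code points) in a NUL-terminated UTF-8 string.
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that does NOT match that pattern starts a new character.
 * Assumption: the input is valid UTF-8. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the three-character Japanese string with UTF-8 bytes
E6 97 A5 E6 9C AC E8 AA 9E, strlen() reports 9 bytes while
utf8_char_count() reports 3 characters. In UCS-2 the same string is
3 units of 2 bytes each, so the two numbers are always related by a
fixed factor of 2.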

Indeed, in the current Samba code there are lots of problems that occur
only with multibyte character strings, not with ASCII-only strings, and
that come from confusing these two lengths.

I think using a variable-length charset causes lots of bugs when
manipulating multibyte chars.
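
One typical bug of this kind, sketched under the assumption of valid
UTF-8 input: truncating a string at a byte limit can cut a multibyte
character in half. Both function names below are hypothetical and only
illustrate the problem; they are not taken from the Samba source:

```c
#include <stddef.h>
#include <string.h>

/* Naive truncation: cuts at max_bytes regardless of character
 * boundaries, so it may leave a partial multibyte sequence. */
static void truncate_bytes(char *s, size_t max_bytes)
{
    if (strlen(s) > max_bytes)
        s[max_bytes] = '\0';
}

/* Boundary-aware truncation: if the cut position lands on a
 * continuation byte (10xxxxxx), back up to the lead byte of that
 * character and cut there, dropping the whole character.
 * Assumption: the input is valid UTF-8. */
static void truncate_utf8(char *s, size_t max_bytes)
{
    size_t cut = max_bytes;

    if (strlen(s) <= max_bytes)
        return;
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    s[cut] = '\0';
}
```

With the 9-byte string E6 97 A5 E6 9C AC E8 AA 9E and a limit of 4
bytes, truncate_bytes() leaves a dangling lead byte E6 at the end,
while truncate_utf8() keeps only the first complete character
(3 bytes). With ASCII-only input both behave the same, which is why
such bugs stay hidden.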

TAKAHASHI, Motonobu(monyo)         monyo at samba.gr.jp
