i18n question.

Kenichi Okuyama okuyamak at dd.iij4u.or.jp
Sat Mar 6 02:52:27 GMT 2004


Dear Michael,

>>>>> "Michael" == Andrew Bartlett <abartlet at samba.org> writes:
Michael> The problem is, this isn't java - so UCS2/UTF16 is out.  We have to
Michael> operate in an environment of multibyte 'C' strings.  We can't do a UTF16
Michael> -> UTF8 conversion every time we call stat().  That happens a *lot*...

I'd like to point out one thing, then ask some questions.

Point: UTF16 is not UCS2. What we really need is not UTF16->UTF8,
       but UCS2->UTF8 (and vice versa, of course).
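
To make the UTF16/UCS2 distinction concrete, here is a minimal sketch in C.
The function names utf16_encode and ucs2_encode are hypothetical, chosen only
for illustration, and are not taken from the Samba source. Both encodings
agree up to U+FFFF; above that, UTF16 emits a surrogate pair while UCS2 has
no representation at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF16.
 * Returns the number of 16-bit units written (1 or 2),
 * or 0 if the code point cannot be encoded. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;          /* same unit as UCS2 */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* outside Unicode */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Encode one code point as UCS2: exactly one unit, or failure (0).
 * UCS2 has no surrogate mechanism. */
size_t ucs2_encode(uint32_t cp, uint16_t *out)
{
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *out = (uint16_t)cp;
    return 1;
}
```

For any character in the Basic Multilingual Plane the two functions produce
the same single unit, which is why the two names are so often confused.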

Questions:
Q1) Doesn't that just mean we need a conversion cache?
    Conversion between UTF8<->UCS2 takes little time if we
    already know the result. I thought that in the old 2.2.8, or
    somewhere around there, we had a conversion cache table that
    worked quite fast.

    We do call stat() many times, but we call stat() against the
    "same string" many times.
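
The kind of conversion cache meant in Q1 could look roughly like the sketch
below. This is not the 2.2.8 code; every name here (cached_ucs2_to_utf8,
CACHE_SLOTS, MAX_UNITS, conversions_done) is an assumption made up for
illustration. It is a small direct-mapped table: a repeated name is answered
from the table, and the real conversion runs only on a miss.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64    /* hypothetical table size */
#define MAX_UNITS   255   /* longest name cached, in UCS2 units */

static unsigned long conversions_done;  /* counts real conversions */

/* Plain UCS2 -> UTF8 conversion: each unit becomes 1 to 3 bytes.
 * dst must hold 3*n + 1 bytes. Surrogate values are not rejected. */
static void ucs2_to_utf8(const uint16_t *src, size_t n, char *dst)
{
    unsigned char *d = (unsigned char *)dst;
    size_t i;

    conversions_done++;
    for (i = 0; i < n; i++) {
        uint16_t c = src[i];
        if (c < 0x80) {
            *d++ = (unsigned char)c;
        } else if (c < 0x800) {
            *d++ = (unsigned char)(0xC0 | (c >> 6));
            *d++ = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            *d++ = (unsigned char)(0xE0 | (c >> 12));
            *d++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *d++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *d = '\0';
}

struct cache_slot {
    int      used;
    size_t   len;                     /* key length in units */
    uint16_t key[MAX_UNITS];          /* UCS2 name */
    char     val[3 * MAX_UNITS + 1];  /* converted UTF8 name */
};

static struct cache_slot cache[CACHE_SLOTS];

static unsigned hash_units(const uint16_t *s, size_t n)
{
    unsigned h = 5381;
    size_t i;

    for (i = 0; i < n; i++)
        h = h * 33u + s[i];
    return h;
}

/* Return the UTF8 form of a UCS2 name, converting only on a miss.
 * The pointer stays valid until the slot is reused by another name.
 * Returns NULL if the name is too long to cache. */
const char *cached_ucs2_to_utf8(const uint16_t *name, size_t n)
{
    struct cache_slot *slot;

    if (n > MAX_UNITS)
        return NULL;
    slot = &cache[hash_units(name, n) % CACHE_SLOTS];
    if (slot->used && slot->len == n &&
        memcmp(slot->key, name, n * sizeof *name) == 0)
        return slot->val;             /* hit: no conversion */

    ucs2_to_utf8(name, n, slot->val); /* miss: convert and remember */
    memcpy(slot->key, name, n * sizeof *name);
    slot->len = n;
    slot->used = 1;
    return slot->val;
}
```

A caller would pass the returned string straight to stat(). A colliding name
simply evicts the old entry, so the table never grows; whether that hit rate
is good enough in practice would have to be measured.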

Q2) I don't see what you mean by "skip UCS2 because this isn't
    java".
    UCS2, as Windows uses it, is an encoding with one 16-bit
    unsigned short per word and one word per character. We do not
    need to worry about multibyte issues (which means you cannot
    know where THE NEXT character starts until you actually scan
    the string).
    Once a string is converted to UCS2, we can treat it just like
    ASCII, except that we need to take care of the 16-bit unit
    length.
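
The difference described in Q2 can be sketched in C as follows. UTF8 is used
here as the multibyte example, and the names ucs2_char_at and utf8_offset_of
are hypothetical. With UCS2 the i-th character is an array index; with a
multibyte encoding the string has to be scanned from the start to find it.

```c
#include <stddef.h>
#include <stdint.h>

/* UCS2: the i-th character is simply the i-th 16-bit unit. */
uint16_t ucs2_char_at(const uint16_t *s, size_t i)
{
    return s[i];
}

/* UTF8: find the byte offset where the i-th character starts.
 * Continuation bytes look like 10xxxxxx, so every other byte
 * begins a character. Assumes well-formed input; returns the
 * offset of the terminating NUL if the string is too short. */
size_t utf8_offset_of(const char *s, size_t i)
{
    size_t off = 0;

    while (s[off] != '\0') {
        if (((unsigned char)s[off] & 0xC0) != 0x80) {
            if (i == 0)
                return off;           /* start of the wanted char */
            i--;
        }
        off++;
    }
    return off;
}
```

The first function is constant time; the second is linear in the length of
the string, which is the cost of not knowing where the next character is.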

Q3) Hasn't UCS2 been part of the 'C' string since ANSI-C?
    Or do you mean " 'C' string " in the old K&R sense?

regards,
---- 
Kenichi Okuyama



More information about the samba-technical mailing list