i18n question.
Kenichi Okuyama
okuyamak at dd.iij4u.or.jp
Sat Mar 6 02:52:27 GMT 2004
Dear Michael,
>>>>> "Michael" == Andrew Bartlett <abartlet at samba.org> writes:
Michael> The problem is, this isn't java - so UCS2/UTF16 is out. We have to
Michael> operate in an environment of mulitbyte 'C' strings. We can't do a UTF16
Michael> -> UTF8 conversion every time we call stat(). That happens a *lot*...
I'd like to point out one thing, then ask some questions.
Point: UTF16 is not UCS2. What we really need is not UTF16->UTF8
conversion, but UCS2->UTF8 (and vice versa, of course).
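To illustrate the difference between the two encodings, here is a minimal
sketch in C. The function name utf16_encode is hypothetical and is not taken
from the Samba source. For code points in the Basic Multilingual Plane,
UTF16 and UCS2 produce the same single 16-bit unit; above U+FFFF, UTF16
needs a surrogate pair, which UCS2 cannot express at all.

```c
#include <stdint.h>

/*
 * Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if the
 * code point is not encodable (a lone surrogate or beyond U+10FFFF).
 * A return value of 1 means the result is also valid UCS-2.
 */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* reserved for surrogates */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;          /* BMP: identical to UCS-2 */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* outside Unicode */
    cp -= 0x10000;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

With this sketch, U+3042 yields one unit (0x3042), while U+10400 yields the
pair 0xD801 0xDC00.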
Questions:
Q1) Doesn't that just mean we need a conversion cache?
Conversion between UTF8<->UCS2 takes almost no time if we already
know the result for a given string. I think that in old 2.2.8 or
thereabouts, we had a conversion cache table which worked quite fast.
We do call stat() many times, but we call stat() against the "same
string" many times.
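The idea of a conversion cache can be sketched as follows. This is only an
illustration under stated assumptions, not the table that existed in Samba
2.2.x: the names (ucs2_to_utf8_cached, cache_hits, CACHE_SLOTS and so on),
the fixed sizes, and the direct-mapped layout are all hypothetical, and the
input is assumed to be plain UCS2 with no surrogates.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 64                  /* assumed size, for illustration */
#define MAX_UCS2    128                 /* longest key we are willing to cache */
#define MAX_UTF8    (MAX_UCS2 * 3 + 1)  /* UCS-2 unit -> at most 3 UTF-8 bytes */

struct cache_entry {
    int      used;
    size_t   len;                       /* key length in 16-bit units */
    uint16_t key[MAX_UCS2];
    char     value[MAX_UTF8];
};

static struct cache_entry cache[CACHE_SLOTS];
static unsigned long cache_hits, cache_misses;

/* Plain conversion: len UCS-2 units to NUL-terminated UTF-8. */
static void ucs2_to_utf8(const uint16_t *in, size_t len, char *out)
{
    size_t i;
    for (i = 0; i < len; i++) {
        uint16_t c = in[i];
        if (c < 0x80) {
            *out++ = (char)c;
        } else if (c < 0x800) {
            *out++ = (char)(0xC0 | (c >> 6));
            *out++ = (char)(0x80 | (c & 0x3F));
        } else {
            *out++ = (char)(0xE0 | (c >> 12));
            *out++ = (char)(0x80 | ((c >> 6) & 0x3F));
            *out++ = (char)(0x80 | (c & 0x3F));
        }
    }
    *out = '\0';
}

/* FNV-1a style hash over the 16-bit units, reduced to a slot index. */
static size_t hash_ucs2(const uint16_t *s, size_t len)
{
    uint32_t h = 2166136261u;
    size_t i;
    for (i = 0; i < len; i++) {
        h ^= s[i];
        h *= 16777619u;
    }
    return h % CACHE_SLOTS;
}

/*
 * Return the UTF-8 form of a UCS-2 string, converting only on a miss.
 * The returned pointer stays valid until its slot is overwritten.
 * Returns NULL if the string is too long to cache.
 */
const char *ucs2_to_utf8_cached(const uint16_t *in, size_t len)
{
    struct cache_entry *e;

    if (len > MAX_UCS2)
        return NULL;
    e = &cache[hash_ucs2(in, len)];
    if (e->used && e->len == len &&
        memcmp(e->key, in, len * sizeof(*in)) == 0) {
        cache_hits++;
        return e->value;                /* same string seen before */
    }
    cache_misses++;
    memcpy(e->key, in, len * sizeof(*in));
    e->len = len;
    e->used = 1;
    ucs2_to_utf8(in, len, e->value);
    return e->value;
}
```

If stat() is called repeatedly on the same name, only the first call pays
for the conversion; later calls cost one hash and one memcmp().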
Q2) I don't see what you mean by "skip UCS2 because this isn't
java".
UCS2 is, for Windows, one 16-bit unsigned short per word, and one
word per character. We do not need to worry about multi-byte
encodings (which means you do not know where THE NEXT character
starts until you actually scan the string).
Once a string is converted to UCS2, we can treat it just like
ASCII, except that we need to take care of the 16-bit unit length.
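The difference can be shown with a small sketch in C. The helper names
ucs2_nth and utf8_nth are hypothetical, and the UTF8 version assumes
well-formed input. Finding the n-th character in UCS2 is plain pointer
arithmetic, while in UTF8 the string has to be scanned from the start.

```c
#include <stddef.h>
#include <stdint.h>

/* UCS-2: the n-th character is simply the n-th 16-bit unit. */
static const uint16_t *ucs2_nth(const uint16_t *s, size_t n)
{
    return s + n;
}

/*
 * UTF-8: walk the string and count character starts, skipping
 * continuation bytes (those of the form 10xxxxxx).
 * Returns NULL if the string has fewer than n + 1 characters.
 */
static const char *utf8_nth(const char *s, size_t n)
{
    while (*s != '\0') {
        if (((unsigned char)*s & 0xC0) != 0x80) {   /* start of a character */
            if (n == 0)
                return s;
            n--;
        }
        s++;
    }
    return NULL;
}
```

The caller of ucs2_nth is still responsible for knowing the string length,
just as with an ASCII buffer.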
Q3) Wasn't UCS2 part of 'C' strings as of ANSI C?
Or do you mean " 'C' string " in the old K&R sense?
regards,
----
Kenichi Okuyama