i18n question.

Kenichi Okuyama okuyamak at dd.iij4u.or.jp
Wed Mar 3 01:50:58 GMT 2004


Dear Jeremy,

# Have you set 'Reply-To:'?


>>>>> "JA" == Jeremy Allison <jra at samba.org> writes:
>> He covered CP932 (Japanese), UHC (Korean), GB18030 (Simplified
>> Chinese) and Big5 (Traditional Chinese).  He says, that "/" is not
>> used by any of these, but "\" is used as a trail byte in CP932,
>> GB18030 and Big5.
JA> Oh well, these character sets are going to be quite slow then :-(.
JA> I'll add the code without the special case for the broken char
JA> sets and then fix it up afterwards.

The easiest way to solve that problem is to use UCS2 or UTF-8
as the internal character encoding.

As soon as you receive a path name in an SMB request, convert it to
UCS2. Do all the '/'<->'\' conversions and other processing in UCS2,
and convert the result to the local (unix) character set right before IO.
# Or UTF-8 instead of UCS2, I mean.
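A minimal sketch of that flow in C, as an illustration only and not Samba code. `smb_path_to_unix` is a hypothetical name; the sketch assumes the path arrives as little-endian 16-bit units and that the local (unix) character set is UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Samba function): turn a path as it arrives
 * in an SMB request (16-bit little-endian units) into a NUL-terminated
 * local path. ASSUMES the local (unix) charset is UTF-8.
 * `tmp` must hold nbytes / 2 units; `out` must hold 3 * (nbytes / 2) + 1
 * bytes. Returns the length of `out` in bytes, excluding the NUL. */
size_t smb_path_to_unix(const unsigned char *in, size_t nbytes,
                        uint16_t *tmp, char *out)
{
    size_t n = nbytes / 2;
    unsigned char *o = (unsigned char *)out;

    /* 1. On receipt: decode to 16-bit units and work only with those. */
    for (size_t i = 0; i < n; i++)
        tmp[i] = (uint16_t)(in[2 * i] | (in[2 * i + 1] << 8));

    /* 2. Separator conversion: a plain per-unit compare, no knowledge
     *    of any multi-byte client charset needed. */
    for (size_t i = 0; i < n; i++)
        if (tmp[i] == 0x005C)          /* L'\\' */
            tmp[i] = 0x002F;           /* L'/'  */

    /* 3. Right before IO: encode to UTF-8. A valid surrogate pair becomes
     *    one 4-byte sequence; a lone surrogate is replaced by '?'. */
    for (size_t i = 0; i < n; i++) {
        uint32_t c = tmp[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < n &&
            tmp[i + 1] >= 0xDC00 && tmp[i + 1] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) +
                (uint32_t)(tmp[i + 1] - 0xDC00);
            i++;                       /* consumed the low surrogate too */
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = '?';
        }
        if (c < 0x80) {
            *o++ = (unsigned char)c;
        } else if (c < 0x800) {
            *o++ = (unsigned char)(0xC0 | (c >> 6));
            *o++ = (unsigned char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            *o++ = (unsigned char)(0xE0 | (c >> 12));
            *o++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *o++ = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            *o++ = (unsigned char)(0xF0 | (c >> 18));
            *o++ = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            *o++ = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            *o++ = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *o = '\0';
    return (size_t)(o - (unsigned char *)out);
}
```

Because step 2 only ever sees 16-bit units, it works the same no matter which code page the client machine uses; only step 3 depends on the server's local charset.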


In the case of UCS2, you have to handle 16 bits per character, and
(unfortunately) UCS2 is still a "MULTI WORD" character set, but at
least we do not need to worry about multi-word sequences for L'\' or
L'/': no word inside a multi-word sequence can have either value. So
the path name converter does not need to worry about them.
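A sketch of why the converter can ignore multi-word-ness: splitting a UCS2 path on L'\' with a plain per-unit compare. `ucs2_next_component` is a hypothetical helper, not an existing API; it relies only on the fact that the 16-bit units used for surrogate pairs (0xD800 to 0xDFFF) never equal 0x005C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the next path component in a UCS2 path,
 * splitting on L'\\'. Starting at *pos, it skips separators, then sets
 * *start and *len to the component and advances *pos past it.
 * Returns 0 when no component is left.
 * No surrogate decoding is needed: units 0xD800-0xDFFF never equal
 * 0x005C, so a separator can never be found inside a character. */
int ucs2_next_component(const uint16_t *s, size_t n, size_t *pos,
                        size_t *start, size_t *len)
{
    size_t i = *pos;

    while (i < n && s[i] == 0x005C)    /* skip leading/repeated L'\\' */
        i++;
    if (i == n) {
        *pos = i;
        return 0;
    }
    *start = i;
    while (i < n && s[i] != 0x005C)    /* component runs to next L'\\' */
        i++;
    *len = i - *start;
    *pos = i;
    return 1;
}
```

For a path such as `\a\` followed by U+1F600 and `b`, the second component is reported as three units long (the surrogate pair plus `b`), without the loop ever knowing that two of those units form one character.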

In the case of UTF-8, it is a multi-byte encoding. But no character
has '\' or '/' as part of a multi-byte sequence. So we can scan for
them byte by byte, just as if the string were ASCII.
# Except for some 'full-width' characters that we have in CP932 or Big5.
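A sketch of that byte-wise scan (`bytes_backslash_to_slash` is a hypothetical name). It is only correct under the assumption that the input is already UTF-8.

```c
/* Byte-wise '\\' -> '/' conversion on a NUL-terminated string.
 * Safe for UTF-8 only: every byte of a UTF-8 multi-byte sequence is
 * >= 0x80, so a 0x5C byte can only ever mean '\\'. */
void bytes_backslash_to_slash(char *s)
{
    for (; *s != '\0'; s++)
        if (*s == '\\')
            *s = '/';
}
```

The same loop run over CP932 bytes shows the problem from the quoted text: katakana ソ is the two bytes 0x83 0x5C in CP932, so the loop would rewrite its trail byte and leave 0x83 followed by '/'. In UTF-8 the same character is 0xE3 0x82 0xBD and passes through untouched.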


I do not know whether this idea will meet your requirements, but after
all, I believe this is THE EASIEST WAY to solve the multilingual
problem, especially in the case of SMB.
---- 
Kenichi Okuyama


