monyo at home.monyo.com
Tue Mar 9 19:03:34 GMT 2004
// Sorry, I'm bogged down with my job right now, so I have little time to reply...
Michael B Allen wrote:
>> FYI. these URLs will probably help you:
>This was very informative. So even though JIS X 0208 is the charset used for EUC-JP and
>Shift-JIS, using the Unicode mapping tables for either EUC-JP or Shift-JIS can result in
>different Unicode values.
Yes, you understand the problem correctly.
|So U+FFE0 will be received on the wire, U+00A2 is read from disk resulting in a mismatch. Is
|this the sort of problem you're describing?
Yes, that is what I meant by "conversion problems".
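
The mismatch described above can be reproduced with a minimal sketch, assuming Python's built-in codecs as a stand-in for the iconv() tables (Samba itself uses iconv(), so this only illustrates the mapping difference, not Samba's actual code path). The constant and helper names are hypothetical.

```python
# Byte sequences for the JIS X 0208 cent sign in each legacy encoding.
SJIS_CENT = b"\x81\x91"   # Shift_JIS / CP932
EUCJP_CENT = b"\xa1\xf1"  # EUC-JP


def code_point(raw: bytes, codec: str) -> str:
    """Decode a single character and return its code point as 'U+XXXX'."""
    return f"U+{ord(raw.decode(codec)):04X}"


# What a Windows client would put on the wire (Microsoft CP932 mapping).
wire = code_point(SJIS_CENT, "cp932")
# What a JIS X 0208-based table yields for the name stored on disk.
disk = code_point(EUCJP_CENT, "euc_jp")
# The non-Microsoft Shift_JIS table agrees with EUC-JP, not with CP932.
plain = code_point(SJIS_CENT, "shift_jis")

print("wire:", wire, "disk:", disk, "shift_jis:", plain)
print("mismatch:", wire != disk)
```

With these codecs the wire value and the disk value differ for the same character, which is the mismatch in question.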
|However, provided the CP932 mappings were "corrected" (e.g. using libiconv-1.9.1 plugin) then
|the utility would correctly convert ¢ to U+FFE0. Then you have Unicode on the wire
|(UCS-2LE/UTF-16LE), Unicode internally (UTF-8), and Unicode on Disk (UTF-8). So at least in
|this case there would be no mapping problems. It's only when you want to use a legacy encoding
|on disk for filenames that the internal string handling results in problems (e.g. 2nd byte
|ASCII violates one of The Sanity Rules).
Yes, so if we want to use Samba 3.0 with Japanese, we first have to fix
(modify?) the conversion tables of iconv(). Unfortunately, only glibc
2.3.3 or later, or a patched GNU libiconv, meets Samba 3.0's needs.
This means that for now I have to say "Samba 3.0 cannot work correctly in
a Japanese environment for business use".
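
The quoted point about legacy on-disk encodings and a "2nd byte ASCII" can be illustrated with a small sketch. It assumes Python's codecs rather than iconv(), and the helper name is hypothetical. The character used is a commonly cited example whose Shift_JIS trail byte is 0x5C, the ASCII backslash.

```python
text = "\u8868"  # the kanji often used to demonstrate this problem

sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")


def has_ascii_trail_byte(encoded: bytes) -> bool:
    """For one encoded character: is any byte after the first below 0x80?"""
    return any(b < 0x80 for b in encoded[1:])


print(sjis.hex(), has_ascii_trail_byte(sjis))
print(utf8.hex(), has_ascii_trail_byte(utf8))
# Byte-oriented code scanning for a backslash would find one in the
# Shift_JIS form, but never inside a UTF-8 multibyte sequence.
print(b"\\" in sjis, b"\\" in utf8)
```

This is why keeping UTF-8 internally and on disk sidesteps the string-handling problem, while a Shift_JIS filename can confuse code that treats every 0x5C byte as a separator.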
Another problem is that U+FFE0 lies in a "restricted area" that the
Unicode Consortium does not recommend using. As long as we have to keep
compatibility, we cannot abandon the "restricted area", which
means, I think, that we cannot completely migrate to the Unicode world...
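
As a rough check of the status of the fullwidth cent sign U+FFE0 discussed earlier in the thread, the sketch below queries Python's unicodedata module (an assumption of this example; it is unrelated to Samba or iconv()). It shows that Unicode treats the character as a compatibility form of U+00A2.

```python
import unicodedata

fullwidth_cent = "\uffe0"

# Character name and compatibility decomposition from the Unicode database.
name = unicodedata.name(fullwidth_cent)
decomposition = unicodedata.decomposition(fullwidth_cent)
# Compatibility normalization folds the fullwidth form onto U+00A2.
folded = unicodedata.normalize("NFKC", fullwidth_cent)

print(name)
print(decomposition)
print(f"U+{ord(folded):04X}")
```

Note that applying such folding to filenames would discard the very distinction that CP932 clients rely on, so this is a diagnostic, not a suggested fix.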
TAKAHASHI, Motonobu (monyo) monyo at home.monyo.com
More information about the samba-technical