i18n question.

Michael B Allen mba2000 at ioplex.com
Sun Mar 7 10:09:51 GMT 2004


Richard Sharpe said:
>> > > Some workstations, like the Sony NeWS and several others, as well as
>> > > Japanese-specific versions of popular OSes, identify characters with
>> > > the 0x80 bit set as EUC or CP932.
>> > >
>> > > They check the validity of a given string as EUC or CP932.
>> > > Unfortunately UTF-8 does not pass this test in most cases.
>> > > But users still need to use these OSes, and cannot move to other OSes.
>> > >
>> > > Hence, they need the FS charset to be what the OS supports. Not UTF-8.
>> >
>> > I do not understand this. Are you speaking of clients? Why would using UTF-8 internally to
>> > Samba (the server) have any effect on the client?
>>
>> I think Kenichi means system libraries (something that would probably
>> violate POSIX), or at the very least other applications that use the same
>> filenames.
>
> My understanding of what was said just recently is that different products
> use different code points for the same glyphs!

A "product" translates into a charset+encoding and different charsets use different code
points for the same glyphs all the time. Combining characters and transliteration can cause
information loss. We really need a concrete example of "troublesome characters".
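For example, a throwaway program like the one below (untested; the charset names are the GNU
iconv spellings, other iconv implementations may differ) shows the same glyph coming out as
different byte values depending on the charset:

/* Untested sketch: encode the same glyph (KATAKANA LETTER A, U+30A2)
 * into EUC-JP and CP932 and dump the bytes. */
#include <stdio.h>
#include <string.h>
#include <iconv.h>

static void dump(const char *charset, const char *utf8)
{
    char out[32], *in = (char *)utf8, *op = out, *p;
    size_t ileft = strlen(utf8), oleft = sizeof(out);
    iconv_t cd = iconv_open(charset, "UTF-8");

    if (cd == (iconv_t)-1 ||
        iconv(cd, &in, &ileft, &op, &oleft) == (size_t)-1) {
        printf("%-8s conversion failed\n", charset);
    } else {
        printf("%-8s", charset);
        for (p = out; p < op; p++)
            printf(" %02x", (unsigned char)*p);
        printf("\n");
    }
    if (cd != (iconv_t)-1)
        iconv_close(cd);
}

int main(void)
{
    const char *a = "\xe3\x82\xa2";  /* U+30A2 KATAKANA LETTER A in UTF-8 */
    dump("EUC-JP", a);
    dump("CP932", a);
    return 0;
}

From memory that should print something like a5 a2 for EUC-JP and 83 41 for CP932: the same
glyph, completely different code values.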

> Since there is no indication of which charset is actually being used,
> there is no way to map between the charset the client is using and UTF-8
> (or whatever).

He might be suggesting that the string encoding used on the wire (e.g. EUC and the various
flavors of Shift-JIS) be decoded into the common Japanese character set JIS X 0208. He's
referred to using "UCS2" internally, but he might really mean just two bytes per character.
Otherwise UCS2 is just fixed-width Unicode, so I don't see how it is different from using UTF-8.
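To put it another way (untested, bit twiddling from memory), the UCS-2 code unit and the UTF-8
bytes for the same character decode to the same code point, so a "UCS2" internal representation
carries nothing that UTF-8 doesn't:

/* Untested sketch: decode a 3-byte UTF-8 sequence (enough for the BMP
 * above U+07FF) and compare it against the UCS-2 code unit for the
 * same character. Both are just the Unicode code point. */
#include <stdio.h>

static unsigned int utf8_to_cp(const unsigned char *s)
{
    return ((s[0] & 0x0f) << 12) | ((s[1] & 0x3f) << 6) | (s[2] & 0x3f);
}

int main(void)
{
    unsigned short ucs2 = 0x30a2;                 /* KATAKANA LETTER A as a UCS-2 unit */
    unsigned char utf8[] = { 0xe3, 0x82, 0xa2 };  /* same character in UTF-8 */

    printf("ucs2=U+%04X utf8=U+%04X\n", ucs2, utf8_to_cp(utf8));
    return 0;
}

If he really does mean raw JIS X 0208 values packed into two bytes, that's a different story,
but then it isn't UCS2.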

Ultimately I get the feeling Unicode support in Japan just sucks, so naturally they don't want
any part of it.

Mike

