i18n question.

Sun Mar 7 10:18:37 GMT 2004

On Sun, 2004-03-07 at 21:09, Michael B Allen wrote:
> Richard Sharpe said:
> >> > > Some WorkStation, like Sony NeWS and several others, as well as
> >> > > Japanese specific version of popular OS, identify character with
> >> > > 0x80 bit on, as EUC or CP932.
> >> > >
> >> > > They check for validness of given string as EUC, or CP932.
> >> > > Unfortunately UTF-8 do not pass this test for most cases.
> >> > > But user still need to use these OS, and can not move to other OS.
> >> > >
> >> > > Hence, they need FS charset to be what the OS support. Not UTF-8.
> >> >
> >> > I do not understand this. Are you speaking of clients? Why would using UTF-8 internally to
> >> > Samba (the server) have any effect on the client?
> >>
> >> I think Kenichi means system libraries (something that would probably
> >> violate posix) or at the very least other applications that use the same
> >> filenames.
> >
> > My understanding of what was said just recently is that different products
> > use different code points for the same glyphs!
> 
> A "product" translates into a charset+encoding and different charsets use different code
> points for the same glyphs all the time. Combining characters and transliteration can cause
> information loss. We really need a concrete example of "troublesome characters".
> 
> > Since there is no indication of which charset is actually being used,
> > there is no way to map between the charset the client is using and UTF-8
> > (or whatever).
> 
> He might be suggesting that the string encoding used on the wire 

It is not on the wire (any more).  The wire is Unicode only; it is on
disk that we have the weird encodings.  (You probably already knew that,
and this was just a thinko, but I want to be clear.)

> (e.g. EUC and different
> flavors of Shift-JIS) be decoded into the common Japanese charset JIS X 0208. He's referred to
> using "UCS2" internally but he might really mean just two bytes. Otherwise UCS2 == Unicode so
> I don't see how it is different from using UTF-8.
> 
> Ultimately I get the feeling Unicode support in Japan just sucks. So naturally they don't want
> any part of it.

I would agree with that.
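
For anyone who wants to see the validity problem Kenichi describes, here
is a rough Python sketch. It assumes a strict decode is a fair stand-in
for the "is this valid EUC / CP932?" check such systems perform; the
sample file name and the is_valid() helper are made up for illustration
and are not taken from Samba or from any particular OS.

```python
# Hypothetical sample file name, chosen only for illustration.
name = "日本語.txt"

# The same name as it would be stored on disk under each charset.
utf8_bytes = name.encode("utf-8")
euc_bytes = name.encode("euc_jp")
cp932_bytes = name.encode("cp932")


def is_valid(data: bytes, codec: str) -> bool:
    """Return True if data decodes cleanly under codec (strict mode)."""
    try:
        data.decode(codec, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # The three on-disk byte sequences differ for the same glyphs.
    for label, data in (("UTF-8", utf8_bytes),
                        ("EUC-JP", euc_bytes),
                        ("CP932", cp932_bytes)):
        print(label, data.hex(" "))

    # An application that validates names as EUC-JP rejects the UTF-8 form.
    print("UTF-8 bytes valid as EUC-JP?", is_valid(utf8_bytes, "euc_jp"))
    print("EUC-JP bytes valid as EUC-JP?", is_valid(euc_bytes, "euc_jp"))
```

A CP932 check is looser, so some UTF-8 names may slip through it and
show up as garbage rather than being rejected, which is a different
failure but still not what the user wants on disk.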

Andrew Bartlett

-- 
Andrew Bartlett                                 abartlet at pcug.org.au
Manager, Authentication Subsystems, Samba Team  abartlet at samba.org
Student Network Administrator, Hawker College   abartlet at hawkerc.net
http://samba.org     http://build.samba.org     http://hawkerc.net