i18n question.

Sun Mar 7 04:47:51 GMT 2004

On Sun, 2004-03-07 at 14:48, Kenichi Okuyama wrote:
> Dear Andrew,
> 
> >>>>> "Andrew" == Andrew Bartlett <abartlet at samba.org> writes:
> >> So you're claiming you cannot map these to Unicode? If so, then you cannot
> >> use Windows fileservers with Unicode? Do you run entirely in a SHIFT-JIS,
> >> EUC-JP, or CP932 locale?
> Andrew> More seriously - if these character sets are not 'compatible' with
> Andrew> unicode, then the game is up.  There is no solution, as the on-wire
> Andrew> character set is UTF16.
> 
> The word "compatible" has two meanings: either a bi-directional mapping
> is possible, or we can simply map CP932 to Unicode.
> 
> Also, we're using the word CP932 as if it were a single charset, but it
> is not. CP932 has dialects (blame M$ for this fact).

Why blame MS?  We are interested in the charset that *unix* pathnames
are encoded in.  Or are you saying that the real issue is that existing
Samba 2.2-jp sites have used the 'pass-through' mapping for too long?

When unix tools, GUIs etc are used to operate on these files, what
charset do they believe the names are in?

> 1) "One-to-one bi-directional mapping is possible":
> 	No, that is not possible between Unicode and CP932.

This scares me.  Samba relies on the fact that we can map to and from
unicode correctly.  If we can't get a one-to-one mapping, then we run
the risk that a file's name will appear in a directory listing, but you
will never be able to open that file by name!
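
A minimal sketch of that failure mode, using Python's built-in cp932
codec as a stand-in for whatever conversion table a server would use
(an assumption; Samba itself goes through iconv() or its own charset
modules, and their tables may differ).  The helper name non_round_trip
is hypothetical.  It scans double-byte sequences and reports those that
decode to Unicode but do not encode back to the same bytes, which is
the case where a name could be listed but not reopened.

```python
def non_round_trip(codec="cp932"):
    """Find double-byte sequences that decode but do not re-encode identically.

    Returns a list of (original_bytes, decoded_text, re_encoded_bytes).
    re_encoded_bytes is None if the decoded text cannot be encoded at all.
    """
    mismatches = []
    for lead in range(0x81, 0xFD):
        for trail in range(0x40, 0xFD):
            raw = bytes([lead, trail])
            try:
                text = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # not a valid sequence in this codec
            try:
                back = text.encode(codec)
            except UnicodeEncodeError:
                back = None
            if back != raw:
                mismatches.append((raw, text, back))
    return mismatches


if __name__ == "__main__":
    found = non_round_trip()
    print(len(found), "sequences do not survive bytes -> Unicode -> bytes")
    for raw, text, back in found[:5]:
        print(raw.hex(), "->", "U+%04X" % ord(text[0]), "->",
              back.hex() if back else "unencodable")
```

A pathname containing any of the reported sequences would come back
from a to-Unicode-and-back conversion as different bytes, so a lookup
by the converted name would miss the file on disk.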

> 2) "Any dialect of CP932 can be mapped to Unicode":
> 	Yes, that's possible.
>    "We can tell which dialect we are using":
> 	No, that's the problem.

Can we be told?  If the mapping table (CP932 -> Unicode) is different,
then we need a different charset module to handle it, either inside
iconv() or as a module.  
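
To show what "a different mapping table" means in practice, here is a
rough sketch that compares two of Python's built-in codecs, cp932 and
shift_jis.  Treating these two as representative "dialects" is an
assumption made for illustration; the vendor variants discussed in this
thread are not necessarily the same tables.  The helper names
try_decode and mapping_differences are hypothetical.

```python
def try_decode(raw, codec):
    """Decode raw with codec, returning None if the bytes are invalid."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


def mapping_differences(codec_a="cp932", codec_b="shift_jis"):
    """Map each double-byte sequence to (text_a, text_b) where the codecs disagree."""
    diffs = {}
    # Lead bytes of double-byte Shift-JIS style sequences.
    leads = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    for lead in leads:
        for trail in range(0x40, 0xFD):
            raw = bytes([lead, trail])
            a = try_decode(raw, codec_a)
            b = try_decode(raw, codec_b)
            if a != b:
                diffs[raw] = (a, b)
    return diffs


if __name__ == "__main__":
    diffs = mapping_differences()
    # Sequences both codecs accept but map to different Unicode characters.
    conflicting = {k: v for k, v in diffs.items() if None not in v}
    print(len(diffs), "sequences differ;", len(conflicting), "decode both ways")
```

The same bytes on disk give different Unicode names depending on which
table is chosen, so the choice has to be configured rather than guessed
from the bytes.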

>    "Do people take the risk of conversion?":
> 	Not as long as they know they can survive without it.

How is this being dealt with in other software?  My understanding is
that GNOME's Pango multilingual toolkit is all UTF-8 based, as are an
increasing collection of other i18n-aware applications.  Are these
applications similarly inoperable in current Japanese environments, and
what is intended to fix that?

> The story would be easy if the game is up. Since people are obeying M$,
> they would simply move more easily to Unicode (^o^).

This might be where we end up.

Andrew Bartlett

-- 
Andrew Bartlett                                 abartlet at pcug.org.au
Manager, Authentication Subsystems, Samba Team  abartlet at samba.org
Student Network Administrator, Hawker College   abartlet at hawkerc.net
http://samba.org     http://build.samba.org     http://hawkerc.net