i18n question.

Sun Mar 7 17:12:28 GMT 2004

Michael B Allen wrote:
>I would like to know more about these "trouble-some characters". Can you
>provide a link in English that describes in detail a case where converting
>to Unicode fails due to inadequate charset or character encoding support?
>Please provide a byte sequence in CP932 that does not map to Unicode.

Kenichi Okuyama wrote:
|Read CJKV... That's almost the only information source I can give you in
|English (yes, English is the biggest barrier you're giving me).

FYI, these URLs will probably help you:

http://www.debian.or.jp/~kubota/unicode-symbols-map2.html.en
http://www.miraclelinux.com/english/technet/samba30/index.html
http://www.miraclelinux.com/english/technet/samba30/iconv_issues.html

These pages also include the answers to abartlet's questions.

>> 1) "One by One bi-directional mapping is possible":
>> 	No that not possible between Unicode and CP932.
>
>This scares me.

This is true.
Please refer to 

http://www.debian.or.jp/~kubota/unicode-symbols-map2.html.en
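To make the round-trip problem concrete, here is a minimal sketch that uses Python's built-in "cp932" codec as a stand-in for a Windows-style table (an assumption; it is not Samba's or iconv()'s code path). The helper name find_lossy_round_trips is hypothetical. It decodes every candidate double-byte sequence and reports those that do not come back as the same bytes:

```python
def find_lossy_round_trips(codec="cp932"):
    """Return (raw_bytes, char, re_encoded) for every double-byte sequence
    that decodes in `codec` but does not re-encode to the same bytes."""
    lossy = []
    # Lead-byte ranges used by Shift_JIS-family encodings.
    lead_bytes = list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD))
    for lead in lead_bytes:
        for trail in range(0x40, 0xFD):
            if trail == 0x7F:  # never a valid trail byte
                continue
            raw = bytes([lead, trail])
            try:
                char = raw.decode(codec)
            except UnicodeDecodeError:
                continue  # unassigned in this codec's table
            try:
                back = char.encode(codec)
            except UnicodeEncodeError:
                back = None  # decodes, but cannot be encoded again
            if back != raw:
                lossy.append((raw, char, back))
    return lossy


if __name__ == "__main__":
    lossy = find_lossy_round_trips()
    print(len(lossy), "byte sequences do not survive a round trip")
    for raw, char, back in lossy[:5]:
        print(raw.hex(), "-> U+%04X ->" % ord(char[0]), back.hex() if back else None)
```

Any entry in the result is a byte sequence that shares its Unicode code point with another sequence, so the mapping cannot be one-to-one in both directions.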

>> 2) "Any Dialect of CP932 can be mapped to Unicode":
>> 	Yes that's possible.
>>    "We can tell which Dialect we are using":
>> 	No, that's problem.
>
>Can we be told?  If the mapping table (CP932 -> Unicode) is different,
>then we need a different charset module to handle it, either inside
>iconv() or as a module.  

**For Samba** I think this is not a problem in most cases.

Windows uses its own mapping table (CP932 <-> Unicode), so what we
need is for Samba to support the same mapping table that Windows uses.

Unfortunately, Samba 3.0 relies on iconv() for the conversion, and
most iconv() implementations are not fully compatible with the Windows
mapping table.
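As an illustration of what "not fully compatible" means, the sketch below decodes the same bytes with two of Python's built-in codecs: "shift_jis", which follows a JIS X 0208 based table, and "cp932", which follows the Windows table. Python's codecs are only a stand-in here (an assumption); a given iconv() may behave like either one. The helper name decode_both is hypothetical.

```python
import unicodedata


def decode_both(raw):
    """Decode the same bytes with a JIS-style table and the Windows table."""
    return raw.decode("shift_jis"), raw.decode("cp932")


if __name__ == "__main__":
    # 0x8160 is the double-byte "tilde-like" symbol in Shift_JIS-family encodings.
    jis_char, win_char = decode_both(b"\x81\x60")
    for label, ch in (("shift_jis", jis_char), ("cp932", win_char)):
        print("%-9s U+%04X %s" % (label, ord(ch), unicodedata.name(ch)))
```

If the two lines printed differ, a file name written through one table will not compare equal to the same name seen through the other, which is the trouble Samba runs into.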

As explained in

http://www.miraclelinux.com/english/technet/samba30/index.html

Currently only glibc 2.3.3 or later, a patched libiconv, or a patched
glibc is compatible with the Windows mapping table.

// The reason why is very complex and historical.

The ms932 locale on Solaris is also compatible with Windows, but it
cannot be used with Samba because ms932 can be converted only to and
from UTF-8.

>>    "Do people take risk about conversion?":
>> 	Not as long as they know they can servive without it.
>
>How is this being dealt with in other software?

As you said, the number of UTF-8 (Unicode) ready applications is increasing.
As long as you do not have to consider legacy charsets, they work well.

But

|Are these applications similarly inoperable in current Japanese
|environments

Unfortunately yes.
This is exactly why migrating to UTF-8 (Unicode), or using UTF-8
applications and legacy-charset applications side by side, is not
easy from a technical point of view.

|and what is intended to fix that?

For now we usually choose the Windows mapping table, and modify or add
extra conversion modules for applications that do not use it,
because compatibility with Windows is necessary.
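An "extra conversion module" of this kind can be sketched roughly as follows. This is only an illustration under assumptions: the table JIS_TO_WINDOWS and the helper encode_for_windows are hypothetical names, the table lists just the commonly cited symbol differences, and Python's "cp932" codec stands in for the Windows table.

```python
# Hypothetical shim: code points produced by a JIS-style table, remapped to
# the code points the Windows CP932 table uses for the same byte sequences.
JIS_TO_WINDOWS = {
    0x301C: 0xFF5E,  # WAVE DASH            -> FULLWIDTH TILDE
    0x2016: 0x2225,  # DOUBLE VERTICAL LINE -> PARALLEL TO
    0x2212: 0xFF0D,  # MINUS SIGN           -> FULLWIDTH HYPHEN-MINUS
    0x00A2: 0xFFE0,  # CENT SIGN            -> FULLWIDTH CENT SIGN
    0x00A3: 0xFFE1,  # POUND SIGN           -> FULLWIDTH POUND SIGN
    0x00AC: 0xFFE2,  # NOT SIGN             -> FULLWIDTH NOT SIGN
}


def encode_for_windows(text):
    """Remap JIS-style code points, then encode with the Windows table."""
    return text.translate(JIS_TO_WINDOWS).encode("cp932")


if __name__ == "__main__":
    # U+301C is what a JIS-style converter yields for the wave dash.
    print(encode_for_windows("\u301c").hex())
```

The remapping loses information on purpose: after it runs, the application can no longer tell which of the two code points the text originally used.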

But the Windows mapping table is different from the Unicode
Consortium's reference.

There is probably no good way to fix that fundamentally.

-----
TAKAHASHI, Motonobu (monyo)                    monyo at home.monyo.com
                                               http://www.monyo.com/