i18n question.

TAKAHASHI Motonobu monyo at home.monyo.com
Sat Mar 6 09:05:47 GMT 2004


Michael B Allen wrote:
|There are many good tools for converting entire filesystems from one
|encoding to another. Moving forward this will be the correct solution.

|> ( I must say that Unicode does not REALLY fulfill
|>    other Japanese character encodings. It is rarely used, but
|>    most admins do not wish to bet on their luck ).
|Why would it be a matter of "luck"?

First, we (Japanese) have lots of applications that recognize only
EUC-JP and/or CP932, or other legacy charsets.

The critical issue is that there are several conversion tables
between Unicode and legacy Japanese charsets such as CP932 and EUC-JP.
For example, I know of at least 6 conversion tables from CP932 to Unicode.

// The "why" is very complex and needs a long, difficult explanation.

This means that converting between Unicode and legacy Japanese charsets
may corrupt characters. This issue is actually troubling admins and
system builders in Japan right now :-( And it is one of the main reasons
why Unicode is still rarely used in Japan.

|>    Did you know that CP932 has Russian characters in its character set?
|>
|>    We found that in some versions of Windows, Russian characters have
|>    to be treated case-insensitively. We had patches for this in
|>    2.2.*, but they were lost when we moved to 3.0.

// Kenichi, this feature is also included in Samba 3.0.

|What versions of Windows would that happen to be? Can you provide
|specifics on this?

For historical reasons, Japanese character sets have their own multibyte
versions of ASCII, Cyrillic, Roman numeral and numeral characters, and
prior versions of Samba cannot handle them well.
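
To make the case-insensitivity point concrete, here is a minimal Python
sketch (not Samba's implementation; the function names are hypothetical).
It assumes file names arrive as CP932 bytes. Folding case on the raw bytes
only touches single-byte ASCII letters, so the multibyte Cyrillic and
full-width Latin characters do not match; decoding to Unicode first and
then folding case does match them.

```python
# Illustrative sketch: case-insensitive matching of CP932 file names.
# Function names are hypothetical and do not come from Samba.


def same_name_bytewise(a: bytes, b: bytes) -> bool:
    """Naive comparison: bytes.lower() only folds ASCII A-Z, and it can
    even alter the trail bytes of multibyte characters."""
    return a.lower() == b.lower()


def same_name_decoded(a: bytes, b: bytes, charset: str = "cp932") -> bool:
    """Decode to Unicode first, then compare using Unicode case folding."""
    return a.decode(charset).casefold() == b.decode(charset).casefold()


if __name__ == "__main__":
    # Cyrillic letters that exist in CP932 as two-byte characters.
    upper = "ФАЙЛ.TXT".encode("cp932")
    lower = "файл.txt".encode("cp932")
    print(same_name_bytewise(upper, lower))
    print(same_name_decoded(upper, lower))

    # Full-width (multibyte) Latin letters behave the same way.
    wide_upper = "ＡＢＣ".encode("cp932")
    wide_lower = "ａｂｃ".encode("cp932")
    print(same_name_bytewise(wide_upper, wide_lower))
    print(same_name_decoded(wide_upper, wide_lower))
```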

// Again, the "why" is very complex and needs a long, difficult explanation.

|I don't know what the situation is like in Japan but I would think
|conversion to UTF-8 would be the highest priority. If you complained that
|UTF-8 is too slow that is a valid argument. But coding for UTF-8 is not
|more or less difficult regardless of what language it is being used to
|represent.
|
|Mike

We understand the merits of UTF-8, but we have lots of legacy
resources. Actually, we cannot migrate to Unicode now, and we will
probably need to use legacy charsets in some areas for another 10 years.

Anyway, my opinion (and probably also Kenichi's and Shiro's) is that
Samba should fully support several unix charsets (such as CP932 and
EUC-JP) for the present.

"Suggesting UCS-2 as the internal charset" also comes from that point
of view.

-----
TAKAHASHI, Motonobu (monyo)                    monyo at home.monyo.com
                                               http://www.monyo.com/

