International charset in path/file names

Tue Mar 27 08:03:33 GMT 2001

>>>>> "UW" == Urban Widmark <urban at teststation.com> writes:
UW> "Asian characters" may not be supported as a code page, I don't know. It
UW> may be possible to create codepages if they are missing. code pages are
UW> enough for most east/west european non-englishness, asian non-englishness
UW> doesn't always fit into 256 bytes.

Well, for Asian characters we need to set both 'client code page' and
'coding system'.

For example, if you're going to use Japanese, 

    client code page = 932

is always required. In addition, one of:

    coding system = HEX
    coding system = EUC
    coding system = SJIS
    coding system = CAP

is required too, because there are many ways to encode the same
character.
# I'm glad we didn't have to worry about EBCDIC(^^;)
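
To illustrate why a coding system has to be picked, here is a small
Python sketch (not Samba code) that encodes the same string as SJIS
and as EUC and prints the bytes. The helper hex_escape is
hypothetical: it writes every byte >= 0x80 as ':xx', which is only
loosely modelled on the HEX and CAP style, and its exact rules are an
assumption.

```python
# The same text under two of the Japanese encodings named above.
text = "日本語"

sjis_bytes = text.encode("shift_jis")  # bytes an SJIS file name would use
euc_bytes = text.encode("euc_jp")      # bytes an EUC file name would use


def hex_escape(raw: bytes) -> str:
    """Hypothetical helper: write each byte >= 0x80 as ':xx', keep ASCII."""
    return "".join(f":{b:02x}" if b >= 0x80 else chr(b) for b in raw)


print(sjis_bytes.hex(" "))     # byte sequence under Shift_JIS
print(euc_bytes.hex(" "))      # a different byte sequence under EUC-JP
print(hex_escape(sjis_bytes))  # ASCII-only form of the SJIS bytes
```

The two byte sequences differ although the text is identical, so the
server has to be told which one is on disk.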

And even after you have done this, it only means Samba now supports
'Japanese & English'. To use Chinese or Korean, etc., you have to
change the code page, and once you do, you can't use Japanese
anymore. The file names already written on your storage will not
stay compatible.
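
The compatibility problem can be shown with a short Python sketch
(again not Samba code). It assumes Python's cp932 and cp949 codecs as
stand-ins for the Japanese and Korean client code pages: bytes stored
under one are read back under the other.

```python
name = "日本語"
stored = name.encode("cp932")  # bytes written while the code page was 932

# Reading the same bytes back under each code page.
as_japanese = stored.decode("cp932")
as_korean = stored.decode("cp949", errors="replace")  # wrong code page

print(repr(as_japanese))  # the original name
print(repr(as_korean))    # unrelated characters or replacement marks
```

Nothing in the stored bytes says which code page wrote them, so the
name cannot be recovered after the switch.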

Currently Samba is 'Bi-Lingual'. It's not 'Multi-Lingual'.
And this, I think, will not be fixed unless we use UCS2 for
communication with clients.
# And also use a totally unified format of some kind for internal
# character codes, though it does not have to be Unicode.
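
As a rough sketch of what a unified format buys us, the Python below
puts Japanese, Korean and Chinese names into one string. It uses
UTF-16-LE as a stand-in for UCS2 (the two coincide for the characters
used here), and the helper fits is hypothetical.

```python
def fits(text: str, codec: str) -> bool:
    """Hypothetical helper: True if every character of text encodes in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


names = ["日本語", "한국어", "中文"]  # Japanese, Korean, Chinese
mixed = " ".join(names)

wire = mixed.encode("utf-16-le")  # two bytes per character for this text

print(fits("日本語", "cp932"))     # Japanese fits the Japanese code page
print(fits("한국어", "cp932"))     # Korean does not
print(fits(mixed, "utf-16-le"))   # everything fits the unified form
print(len(mixed), len(wire))
```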

By the way, if you're going to use samba-2.0.7, you'll find many
multi-byte bugs there. Go to:

http://www.samba.gr.jp/project/samba-ja/index.html.en

and get the Samba-Japanese Edition. It fixes lots of other bugs in
2.0.7 too.
# Some of the fixes have already been applied to 2.2alpha as well.

Indeed, we need Samba i18n, and the framework has to be totally
re-designed. This is because one of Samba's major jobs is string
handling, and the very fact that we need to change how that is done
means we have to change A LOT.

# I once counted the number of lines that would need to change in
# some way; it was more than 60% of the Samba code (T-T).
---- 
Kenichi Okuyama at Tokyo Research Lab. IBM-Japan, Co.