i18n question.

Sat Mar 6 11:29:44 GMT 2004

Monyo,

 > (3) Suggest UCS-2 as the "internal charset"
 >   The internal charset should be one of the Unicode encodings.
 >   Currently UCS-2 is better than UTF-8, because UCS-2 is a charset
 >   sent from Windows.

As I said before, UCS-2 is dead. My understanding is that Microsoft
have already switched over to sending UTF-16 on the wire. If you have
evidence that this isn't the case then please let me know.

So, with the death of UCS-2 there is no longer any fixed width charset
available that will be useful for Samba (and no, please don't tell me
to use UCS-4).
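
To illustrate why UTF-16 is not fixed width, here is a minimal sketch
of decoding one code point from 16-bit units. Characters outside the
Basic Multilingual Plane take two units (a surrogate pair), so code
that assumes one unit per character breaks. The function name
utf16_decode is hypothetical and is not part of Samba; the input is
assumed to be in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one code point from a buffer of UTF-16 code units.
 * Returns the number of units consumed (1 or 2), or 0 if the input
 * is empty, truncated, or contains an unpaired surrogate.
 */
size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
    uint16_t hi, lo;

    assert(s != NULL && cp != NULL);

    if (len == 0) {
        return 0;
    }
    hi = s[0];
    if (hi < 0xD800 || hi > 0xDFFF) {
        /* Ordinary BMP character: exactly one unit, as in UCS-2. */
        *cp = hi;
        return 1;
    }
    if (hi > 0xDBFF) {
        return 0; /* low surrogate with no preceding high surrogate */
    }
    if (len < 2) {
        return 0; /* high surrogate at the end of the buffer */
    }
    lo = s[1];
    if (lo < 0xDC00 || lo > 0xDFFF) {
        return 0; /* high surrogate not followed by a low surrogate */
    }
    /* Combine the two 10-bit halves and add the 0x10000 offset. */
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) |
                     (uint32_t)(lo - 0xDC00));
    return 2;
}
```

For example, U+1D11E is stored as the two units 0xD834 0xDD1E, while
'A' is the single unit 0x0041.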

 > From a performance point of view, separating the internal charset and
 > setting it to the same one as Windows (currently UCS-2) will also
 > preserve performance.

No, it won't. It would make us SLOW. Think about the system call
multiplier effect I described in my last email.

 > Currently, before/after string comparison, "unix charset" chars are
 > converted to/from UCS-2, which is expensive.

For 7 bit charsets we have already solved this with the 7-bit
accelerator. I have no problem with the idea of being able to load a
different accelerator for specific charsets (such as Japanese
charsets) if that helps. What I won't do is switch all of our internal
strings to a format other than the filesystem charset. That would be a
very bad move.
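
As a rough sketch of what a 7-bit accelerator can look like (this is
not Samba's actual code), the comparison below stays on a cheap
byte-wise path while both strings contain only 7-bit characters, and
hands the remaining tails to a slower charset-aware routine as soon as
a high-bit byte shows up. The names strcasecmp_accel, slow_cmp_fn and
bytewise_slow_cmp are hypothetical. It assumes a unix charset in which
a multibyte character always starts with a high-bit byte, so the
hand-off happens on a character boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slow path: a full charset-aware comparison. */
typedef int (*slow_cmp_fn)(const char *a, const char *b);

/* Stand-in slow path for demonstration only: plain byte comparison. */
int bytewise_slow_cmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Upper-case a single 7-bit character without consulting the locale. */
static unsigned char ascii_upper(unsigned char c)
{
    return (c >= 'a' && c <= 'z') ? (unsigned char)(c - 'a' + 'A') : c;
}

/*
 * Case-insensitive compare with a 7-bit fast path.
 * No charset conversion happens unless a byte with the top bit set
 * is seen in either string.
 */
int strcasecmp_accel(const char *a, const char *b, slow_cmp_fn slow)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    assert(a != NULL && b != NULL && slow != NULL);

    while (*p != 0 && *q != 0) {
        unsigned char x, y;

        if ((*p | *q) & 0x80) {
            /* Non-7-bit data: let the slow path finish the job. */
            return slow((const char *)p, (const char *)q);
        }
        x = ascii_upper(*p);
        y = ascii_upper(*q);
        if (x != y) {
            return (int)x - (int)y;
        }
        p++;
        q++;
    }
    /* At least one string has ended; the shorter one sorts first. */
    return (int)*p - (int)*q;
}
```

A charset-specific accelerator would follow the same shape with a
different test for "cheap" characters and a different slow path.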

 > Simply put, I suggest using the same charset as Windows uses on the
 > wire. Currently I think Windows still uses UCS-2 on the wire, so I
 > suggest UCS-2 as Samba's internal charset. UTF-16 is also welcome.

Manipulating UTF-16 is just as hard as manipulating UTF-8, except that
you can use accelerators on a wider range of characters. By allowing
for local accelerators to be loaded we solve this.

 > UTF-16 is not fixed length, but as you know it's easier to handle in
 > programs than UTF-8, and much easier than legacy Japanese
 > charsets.

Why is UTF-16 easier than UTF-8? On Windows I might believe this, but
on Unix systems handling UTF-16 is MUCH worse than UTF-8.
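
One possible illustration of the Unix problem, offered as an
assumption about what makes UTF-16 awkward there rather than as the
only reason: UTF-16 data contains zero bytes and bytes equal to '/',
so byte-oriented C routines and pathname handling misread it, while
UTF-8 never uses those byte values inside a multibyte character. The
helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Number of bytes a NUL-terminated string routine would see in buf
 * before stopping at the first zero byte (or len if there is none).
 */
size_t bytes_before_nul(const unsigned char *buf, size_t len)
{
    const unsigned char *z;

    assert(buf != NULL);
    z = memchr(buf, 0, len);
    return z != NULL ? (size_t)(z - buf) : len;
}

/* Returns 1 if the byte value b occurs anywhere in buf, else 0. */
int contains_byte(const unsigned char *buf, size_t len, unsigned char b)
{
    assert(buf != NULL);
    return memchr(buf, b, len) != NULL;
}
```

For instance, "ab" in UTF-16LE is the bytes 61 00 62 00, so a
strlen-style scan stops after one byte. U+012F in UTF-16LE is 2F 01,
which contains a '/' byte, whereas its UTF-8 form C4 AF does not.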

Cheers, Tridge