tridge at samba.org
Sat Mar 6 11:29:44 GMT 2004
> (3) Suggest UCS-2 as the "internal charset"
> The internal charset should be one of the Unicode encodings.
> Currently UCS-2 is better than UTF-8, because UCS-2 is the charset
> sent from Windows.
As I said before, UCS-2 is dead. My understanding is that Microsoft
have already switched over to sending UTF-16 on the wire. If you have
evidence that this isn't the case then please let me know.
So, with the death of UCS-2 there is no longer any fixed width charset
available that will be useful for Samba (and no, please don't tell me
to use UCS4).
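A short illustration of why UTF-16 is not fixed width: any code point above U+FFFF, including many CJK ideographs such as U+20B9F, needs two 16-bit units (a surrogate pair), while UCS-2 simply cannot represent it. This is a minimal sketch written for this note, not Samba code; the function name `utf16_encode` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if the
 * value is not a valid Unicode scalar value. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                 /* surrogate range is not a character */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;    /* BMP: one unit, identical to UCS-2 */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                 /* beyond the Unicode range */
    cp -= 0x10000;                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For example, U+0041 encodes to the single unit 0x0041, but U+20B9F encodes to the pair 0xD842 0xDF9F, so code that assumes "one unit per character" breaks on it.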
> From a performance point of view, separating the internal charset and
> setting it to the one Windows uses (currently UCS-2) will also preserve performance.
No, it won't. It would make us SLOW. Think about the system call
multiplier effect I described in my last email.
> Currently, before/after string comparison, "unix charset" chars are
> converted to/from UCS-2, this is expensive.
For 7-bit charsets we have already solved this with the 7-bit
accelerator. I have no problem with the idea of being able to load a
different accelerator for specific charsets (such as Japanese
charsets) if that helps. What I won't do is switch all of our internal
strings to a format other than the filesystem charset. That would be a
very bad move.
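The general idea of a 7-bit accelerator can be sketched as follows: compare strings directly in the filesystem charset while every byte is plain ASCII, and only fall back to the expensive conversion path when a byte with the high bit set appears. This is a hypothetical illustration, not Samba's actual implementation; `strequal_accel`, `slow_equal_fn` and `slow_exact` are invented names, and it assumes an ASCII-compatible charset (such as UTF-8 or EUC-JP) in which any byte below 0x80 is that ASCII character.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical slow path: a real server would convert both strings
 * out of the filesystem charset and compare them there. */
typedef bool (*slow_equal_fn)(const char *a, const char *b);

/* Placeholder slow path for this sketch only: exact byte comparison. */
static bool slow_exact(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Fold an ASCII letter to lower case; other bytes are unchanged. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Case-insensitive equality with a 7-bit fast path. */
static bool strequal_accel(const char *a, const char *b, slow_equal_fn slow)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    for (;; pa++, pb++) {
        if ((*pa | *pb) & 0x80)
            return slow(a, b);   /* first non-ASCII byte: no shortcut */
        if (ascii_lower(*pa) != ascii_lower(*pb))
            return false;        /* ASCII mismatch, including unequal length */
        if (*pa == '\0')
            return true;         /* both strings ended together */
    }
}
```

Because the prefix scanned so far is pure ASCII in both strings, a mismatch there proves the strings differ regardless of what follows, so the conversion cost is only paid for names that actually contain non-ASCII bytes. A per-charset accelerator would replace the `& 0x80` test with whatever shortcut suits that charset.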
> Simply I suggest using same charset as Windows uses on the wire.
> Currently I think Windows still uses UCS-2 on the wire, so suggest
> UCS-2 as Samba internal charset. UTF-16 is also welcome.
Manipulating UTF-16 is just as hard as manipulating UTF-8, except that
you can use accelerators on a wider range of characters. By allowing
for local accelerators to be loaded we solve this.
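To see that both encodings need the same kind of scanning, here is a minimal sketch (hypothetical helper names, assuming well-formed input) that counts characters in UTF-8 and in UTF-16. Neither can be answered by dividing a length by a fixed width; each must inspect every unit to find where characters start.

```c
#include <stddef.h>
#include <stdint.h>

/* Count code points in well-formed UTF-8: every byte except a
 * continuation byte (10xxxxxx) starts a new character. */
static size_t utf8_count(const unsigned char *s, size_t nbytes)
{
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Count code points in well-formed UTF-16: every unit except a low
 * surrogate (0xDC00-0xDFFF) starts a new character. */
static size_t utf16_count(const uint16_t *s, size_t nunits)
{
    size_t n = 0;
    for (size_t i = 0; i < nunits; i++)
        if ((s[i] & 0xFC00) != 0xDC00)
            n++;
    return n;
}
```

For instance, "a" followed by U+20B9F is 5 bytes in UTF-8 (0x61 0xF0 0xA0 0xAE 0x9F) and 3 units in UTF-16 (0x0061 0xD842 0xDF9F), yet both functions report 2 characters. The loops have the same shape; UTF-16 only lets a BMP-specific shortcut cover more characters.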
> UTF-16 is not fixed length, but as you know it's easier to handle in
> programs than UTF-8 and much easier than legacy Japanese
Why is UTF-16 easier than UTF-8? On Windows I might believe this, but
on unix systems handling UTF-16 is MUCH worse than UTF-8.
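One concrete reason UTF-16 is awkward on Unix: C strings and POSIX path arguments (open(), stat() and friends take a NUL-terminated `char *`) stop at the first zero byte, and UTF-16 puts a zero byte inside almost every ASCII character. A small sketch with hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* The file name "abc" as UTF-16LE bytes: each ASCII character is
 * followed by a zero byte, and the string ends with a 16-bit zero. */
static const char name_utf16le[] = { 'a', 0, 'b', 0, 'c', 0, 0, 0 };

/* The same name in UTF-8 is byte-for-byte plain ASCII. */
static const char name_utf8[] = "abc";

/* Length as seen by any API that treats its argument as a C string. */
static size_t c_string_length(const char *s)
{
    return strlen(s);
}
```

`c_string_length(name_utf8)` is 3, but `c_string_length(name_utf16le)` is 1: the UTF-16 name is silently truncated to "a". A server keeping names in UTF-16 internally would therefore have to convert on every call into the kernel, while UTF-8 (for ASCII-compatible filesystem charsets) can often be passed straight through.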