i18n question.

Benjamin Riefenstahl Benjamin.Riefenstahl at epost.de
Fri Mar 5 14:09:18 GMT 2004


Hi Jeremy,


Jeremy Allison <jra at samba.org> writes:
> Yep - that's what I mean. Combining characters - for people for whom
> a 32-bit character space containing Klingon and Sanskrit isn't
> enough.... :-).

You probably know most of what I am going to say here, but I thought
it might be good to re-state it in this context anyway.

To solve the problem of combining characters there is normalization.
That's just another encoding/decoding step.  It's a pain to do, but if
decomposed characters are what you have, you need to do it anyway.
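
To make that step concrete, here is a minimal sketch.  It assumes
Python and its standard unicodedata module, purely for illustration
(this is not anything Samba uses).  It converts a single accented
letter between the decomposed and the precomposed form:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: the decomposed form.
decomposed = "e\u0301"
# U+00E9 LATIN SMALL LETTER E WITH ACUTE: the precomposed form.
precomposed = "\u00e9"

# NFC composes characters, NFD decomposes them.
composed = unicodedata.normalize("NFC", decomposed)
redecomposed = unicodedata.normalize("NFD", precomposed)

print(composed == precomposed)         # True
print(redecomposed == decomposed)      # True
print(len(decomposed), len(composed))  # 2 1
```

Both strings display as the same letter, but they are different
code point sequences until one of them is normalized.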

Windows cannot handle decomposed characters, or rather the usual
fonts can't and Windows doesn't do anything about it.  Windows also
doesn't normalize filenames for comparisons or other actions.
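
To show what a comparison without normalization means in practice,
here is a rough sketch in Python.  It is not Windows code, and the
function and variable names are made up for the example.  It just
compares two names code unit by code unit, with and without
normalizing first:

```python
import unicodedata

# Two spellings of the same visible filename (hypothetical examples).
name_precomposed = "caf\u00e9.txt"   # uses U+00E9
name_decomposed = "cafe\u0301.txt"   # uses "e" + U+0301

def same_name_raw(a: str, b: str) -> bool:
    """Compare the UTF-16 code units as-is, without normalization."""
    return a.encode("utf-16-le") == b.encode("utf-16-le")

def same_name_normalized(a: str, b: str) -> bool:
    """Compare after bringing both names to the precomposed form (NFC)."""
    nfc_a = unicodedata.normalize("NFC", a)
    nfc_b = unicodedata.normalize("NFC", b)
    return nfc_a == nfc_b

print(same_name_raw(name_precomposed, name_decomposed))         # False
print(same_name_normalized(name_precomposed, name_decomposed))  # True
```

A client that compares names the first way sees two different files
where the user sees one name.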

So the problem is actually with the unix charset on systems like Mac
OS X (decomposed UTF-8).  In those cases, you could just convert to
the precomposed form (NFC) on reading the filenames from the file
system and do the reverse on writing to the file system.

That would fit the scenario described by the previous poster just
fine.  I.e. you could just convert encodings at the edges of Samba and
keep everything in (precomposed) UTF-16 or UTF-8 on the inside.
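
A sketch of that idea, again in Python and only under assumptions:
the helper names name_from_fs and name_to_fs are hypothetical, and
plain NFD stands in for whatever decomposed form the file system
really expects (the Mac OS X variant may differ in details).

```python
import unicodedata

def name_from_fs(raw: bytes) -> str:
    """Edge, inbound: decode the on-disk name and compose it (NFC)."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))

def name_to_fs(name: str) -> bytes:
    """Edge, outbound: decompose (NFD) and encode for the file system."""
    return unicodedata.normalize("NFD", name).encode("utf-8")

# A decomposed UTF-8 name as it might come back from the file system.
on_disk = "cafe\u0301.txt".encode("utf-8")

internal = name_from_fs(on_disk)
print(internal == "caf\u00e9.txt")      # True: precomposed on the inside
print(name_to_fs(internal) == on_disk)  # True: round trip back to disk form
```

Everything between the two helpers only ever sees precomposed names,
so the conversion stays confined to the file system boundary.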


benny


