i18n question.

Benjamin Riefenstahl Benjamin.Riefenstahl at epost.de
Wed Mar 10 10:59:37 GMT 2004

Hi Peter,

Peter Waechtler <peter at helios.de> writes:
> Out of curiosity: can you tell one advantage of using decomposed
> UTF8 on MacOSX? I can't, and if it has none.. just complicates the
> things because you have to 'look backwards' to get the multibyte
> sequence.

I only said:
>> This is a fixed, non-changeable constant (which otherwise is a good
>> thing IMO).

So my personal comment was about having a (any) capable, guaranteed
encoding, instead of having to be encoding agnostic or not knowing the
actual encoding at all and having to bother users to configure this
arcane (for users) information.

As for de-composed Unicode in Mac OS X, it's questionable whether it
was a good idea for Apple to be the first to actually introduce it;
it's just incompatible with the rest of the world.  And then they were
not even consistent about it: some other parts of Mac OS X use
pre-composed characters.

> just complicates the things because you have to 'look backwards' to
> get the multibyte sequence.

There are other combining sequences that cannot be represented
pre-composed.  Remember that pre-composed characters are only
allocated in Unicode as far as pre-existing encodings already have
them.  Unicode supports any combination, even those that were left out
of previous encodings.  Also a number of scripts and script variants
are encoded in Unicode for the first time, so there are no
pre-composed characters for these and never will be.
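
To illustrate the point, here is a minimal sketch in Python, assuming
the standard unicodedata module.  The helper name has_precomposed_form
is made up for this example.  It only checks whether NFC normalization
collapses a base letter plus one combining mark into a single code
point:

```python
import unicodedata

def has_precomposed_form(base: str, mark: str) -> bool:
    """Return True if NFC turns base + combining mark into one code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

# 'e' + COMBINING ACUTE ACCENT (U+0301) has a pre-composed form, U+00E9.
e_acute = has_precomposed_form("e", "\u0301")

# 'q' + COMBINING ACUTE ACCENT has no pre-composed character, so the
# sequence stays two code points even after composition.
q_acute = has_precomposed_form("q", "\u0301")

print(e_acute, q_acute)
```

Any code that handles the second case correctly already has to cope
with combining sequences in general.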

If you want to support this then you need to be able to handle the
situation anyway, whatever problems you may have with it internally.
Multi-character sequences that combine to a single glyph are just a
fact in a universal system, whether there are some selected
pre-combined characters or not.
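
As a rough sketch of what handling such sequences can look like, the
following Python groups each base character with the combining marks
that follow it.  The function name split_combining_sequences is
hypothetical, and the test used here (a non-zero canonical combining
class from unicodedata.combining) is a simplification: some marks have
class 0, so this is not a full grapheme cluster algorithm.

```python
import unicodedata

def split_combining_sequences(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified: a code point counts as a combining mark here only if its
    canonical combining class is non-zero.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Attach the mark to the preceding base character.
            clusters[-1] += ch
        else:
            # Start a new sequence with this character.
            clusters.append(ch)
    return clusters

# De-composed "café": the last unit is 'e' followed by U+0301.
units = split_combining_sequences("cafe\u0301")
print(len(units))
```

Scanning forward like this avoids having to look backwards: a sequence
is only complete once the next base character (or the end of the
string) is seen.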

Disallowing pre-composed characters is then done because it
eliminates alternatives that should not be visible to the user anyway.
You have the choice of de-composing (or pre-composing) for every
filename comparison you do, so that you can compare, or you can
translate on the edges of your system, so that at least your internal
representation can ignore the problem.
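
The two options can be sketched in Python as follows.  This is only an
illustration under assumptions: the function names same_filename and
to_internal are invented here, and plain NFD from the standard
unicodedata module stands in for whatever de-composed form a given
file system actually uses.

```python
import unicodedata

def same_filename(a: str, b: str) -> bool:
    """Option 1: normalize both names on every comparison."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def to_internal(name: str) -> str:
    """Option 2: normalize once at the edge of the system."""
    return unicodedata.normalize("NFD", name)

precomposed = "caf\u00e9.txt"   # U+00E9, one code point for the accented e
decomposed = "cafe\u0301.txt"   # 'e' followed by U+0301

# A raw comparison treats the two spellings as different names.
raw_equal = precomposed == decomposed

# Option 1: compare with normalization each time.
normalized_equal = same_filename(precomposed, decomposed)

# Option 2: store only normalized names, then compare them directly.
stored = {to_internal(precomposed)}
found = to_internal(decomposed) in stored

print(raw_equal, normalized_equal, found)
```

With the second option the normalization cost is paid once per name at
the boundary, and everything inside can use ordinary string equality.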


More information about the samba-technical mailing list