Michael B Allen
mba2000 at ioplex.com
Sat Mar 6 01:21:11 GMT 2004
Andrew Bartlett said:
>> If you're real clever about it the
>> primary encoding could be configurable with a few macros and a
>> set of string routines. Then when everything works well you might find a
>> fast path that isn't too disruptive.
> The problem is, this isn't java - so UCS2/UTF16 is out. We have to
> operate in an environment of multibyte 'C' strings. We can't do a UTF16
> -> UTF8 conversion every time we call stat(). That happens a *lot*...
Agreed, using the filesystem encoding wherever possible is going to be
My point is just that parameterizing things a little might get you a lot
farther with respect to a wider range of encodings/charsets. If the string
operations (i.e. copying, computing size and domain-specific stuff like
canonicalization and common path manipulations) can be abstracted in a
reasonable way, you could link against the appropriate routines for the
chosen encoding (e.g. EUC-JP).
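
To make that a bit more concrete, here is a rough sketch of the kind of
parameterization I have in mind. Everything in it (struct str_ops,
PRIMARY_OPS, str_char_count, str_last_sep) is a hypothetical name made up
for illustration, not anything that exists in Samba, and the per-encoding
routines only look at lead bytes rather than validating input.

```c
#include <stddef.h>

/* Hypothetical per-encoding operations table (illustration only). */
struct str_ops {
    const char *name;
    /* Byte length of the character starting at s; s must not point at NUL. */
    size_t (*char_len)(const unsigned char *s);
};

/* UTF-8: the lead byte determines the sequence length. */
static size_t utf8_char_len(const unsigned char *s)
{
    if (s[0] < 0x80) return 1;
    if ((s[0] & 0xE0) == 0xC0) return 2;
    if ((s[0] & 0xF0) == 0xE0) return 3;
    if ((s[0] & 0xF8) == 0xF0) return 4;
    return 1; /* invalid lead byte: treat it as a single unit */
}

/* EUC-JP: 0x8F starts a 3-byte character, 0x8E or 0xA1..0xFE a 2-byte one. */
static size_t eucjp_char_len(const unsigned char *s)
{
    if (s[0] == 0x8F) return 3;
    if (s[0] == 0x8E || (s[0] >= 0xA1 && s[0] <= 0xFE)) return 2;
    return 1;
}

static const struct str_ops utf8_ops  = { "UTF-8",  utf8_char_len };
static const struct str_ops eucjp_ops = { "EUC-JP", eucjp_char_len };

/* The primary encoding is chosen at build time, e.g. -DPRIMARY_OPS='(&eucjp_ops)'. */
#ifndef PRIMARY_OPS
#define PRIMARY_OPS (&utf8_ops)
#endif

/* Advance over one character without ever stepping past the terminator. */
static const unsigned char *str_next(const struct str_ops *ops,
                                     const unsigned char *p)
{
    size_t n = ops->char_len(p);
    while (n > 0 && *p != '\0') { p++; n--; }
    return p;
}

/* Number of characters (not bytes) in a NUL-terminated string. */
static size_t str_char_count(const struct str_ops *ops, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;
    while (*p != '\0') { p = str_next(ops, p); count++; }
    return count;
}

/* Pointer to the last '/' that is a whole character, or NULL if none. */
static const char *str_last_sep(const struct str_ops *ops, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    const char *last = NULL;
    while (*p != '\0') {
        if (*p == '/') last = (const char *)p;
        p = str_next(ops, p);
    }
    return last;
}
```

Callers would only ever go through PRIMARY_OPS (or a pointer picked at
startup), so the path code itself never needs to know which encoding it was
built for. Whether the indirection is cheap enough for the hot paths is
something you would have to measure.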
Otherwise, what you're doing right now sounds OK too, as it really is a
superset of the normalized encoding technique that I advocated previously.
That is, provided it truly delivers the benefit of a normalized encoding:
fewer cases to consider, which encourages smaller, more correct code.
Just out of curiosity, is the Japanese crowd really not satisfied with
UTF-8? Is it too slow?