i18n question.
Jeremy Allison
jra at samba.org
Sat Mar 6 19:12:42 GMT 2004
On Sat, Mar 06, 2004 at 09:15:47PM +1100, tridge at samba.org wrote:
> I'm also concerned that this code:
>
> if ((*s & 0x80) && IS_DIRECTORY_SEP(s[1])) {
>
> isn't general enough. It only copes with a \ as the 2nd character of a
> multi-byte char. What if \ is the 3rd character?
What character set has this property? I wrote this after
reading the explanations of the Asian-language character
sets that use \ within their multibyte encodings. None of them
are 3-byte encodings as far as I know.
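To make the quoted check concrete, here is a minimal, hypothetical sketch (not Samba's code) that counts directory separators in a Shift_JIS path. The character 表 is encoded as 0x95 0x5C in Shift_JIS, so its second byte looks like a backslash. The function names and the IS_DIRECTORY_SEP definition below are assumptions made so the example compiles on its own, and the loop assumes the check is applied at every byte position.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition for illustration; Samba's own macro may differ. */
#define IS_DIRECTORY_SEP(c) ((c) == '\\' || (c) == '/')

/* Hypothetical: count separators treating every byte as a character. */
static int count_seps_naive(const char *s)
{
    int n = 0;
    assert(s != NULL);
    for (; *s; s++)
        if (IS_DIRECTORY_SEP(*s))
            n++;
    return n;
}

/* Hypothetical: count separators using the quoted 2-byte heuristic.
 * A byte with the high bit set followed by a separator byte is taken
 * to be one two-byte character, so both bytes are skipped. */
static int count_seps_2byte(const char *s)
{
    int n = 0;
    assert(s != NULL);
    while (*s) {
        if ((*s & 0x80) && IS_DIRECTORY_SEP(s[1])) {
            s += 2;             /* skip lead byte and \-looking trail byte */
            continue;
        }
        if (IS_DIRECTORY_SEP(*s))
            n++;
        s++;
    }
    return n;
}
```

With the bytes `dir\`, 表, `\file`, the naive count finds three separators and the heuristic finds two. Because the check never decodes characters, it works only when the byte in front of the separator has its high bit set; it says nothing about how wide the character actually is.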
> The code also uses a UCS2 conversion, and assumes that any character
> will fit in a single UCS2 char. That isn't true once we take account
> of UTF-16.
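The UTF-16 point can be shown with a small sketch. The function name utf16_encode is ours, not Samba's: it encodes one Unicode code point and reports how many 16-bit units it needs. Anything above U+FFFF becomes a surrogate pair, so a buffer that reserves one UCS-2 unit per character is too small for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 into out[0..1].
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * not a Unicode scalar value (above U+10FFFF or a lone surrogate). */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* not encodable */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one UCS-2 unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+005C (backslash) needs one unit, while U+10400 needs two (0xD801 0xDC00).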
>
> Ideally we need a function based on iconv() that tells us how many
> bytes wide the character starting at the current position in a string
> is, so we know exactly how many bytes to skip. The locale stuff like
> mbrtowc() normally does this, but as we allow loadable charsets in
> Samba we can't use those functions. The best I can think of at the
> moment is this:
>
> - if the sequence starts with a 7 bit char then return 1
> - give iconv() 2 bytes, and look at the error. If no error then it's
> 2 bytes wide
> - give iconv() 3 bytes and so on ...
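The probing steps above can be sketched with POSIX iconv(3). This is a minimal sketch under stated assumptions, not Samba's implementation: mb_char_width and MAX_PROBE_BYTES are hypothetical names, and the charset is assumed to be ASCII-compatible (step 1). Unlike the list, it also probes a single byte first, so that a single-byte charset such as ISO-8859-1 reports width 1 for a high-bit byte. Each probe resets the conversion state; EINVAL means the sequence is incomplete, so one more byte is tried, and the smallest prefix that converts without error is taken as the character's width.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

#define MAX_PROBE_BYTES 6   /* hypothetical upper bound on character width */

/* Hypothetical: return the width in bytes of the character starting at s
 * (len bytes available), or -1 if it is invalid or truncated.
 * cd converts from the string's charset to any target charset. */
static int mb_char_width(iconv_t cd, const char *s, size_t len)
{
    size_t n;

    assert(s != NULL);
    if (len == 0)
        return -1;
    /* Step 1: a 7-bit byte is a single character. */
    if (((unsigned char)s[0] & 0x80) == 0)
        return 1;

    for (n = 1; n <= len && n <= MAX_PROBE_BYTES; n++) {
        char out[32];                    /* ample room for one character */
        char *inbuf = (char *)s;
        char *outbuf = out;
        size_t inleft = n, outleft = sizeof(out);

        /* Reset shift state so every probe starts from the initial state. */
        iconv(cd, NULL, NULL, NULL, NULL);
        if (iconv(cd, &inbuf, &inleft, &outbuf, &outleft) != (size_t)-1)
            return (int)n;               /* first n bytes form one character */
        if (errno != EINVAL)
            return -1;                   /* EILSEQ (or E2BIG): give up */
        /* EINVAL: incomplete sequence, so try one more byte. */
    }
    return -1;
}
```

With cd = iconv_open("UTF-8", "UTF-8") and the UTF-8 bytes for "a", "é", "€" and U+1F600, it returns 1, 2, 3 and 4 at the successive character starts. For the Shift_JIS case the descriptor would be opened with a from-charset such as "SHIFT_JIS", assuming the platform's iconv provides it. Because it counts source bytes rather than UCS-2 units, a character that needs a UTF-16 surrogate pair is not a special case.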
This is looking very similar to skip_multibyte_char() in Samba 2.2
if you remember that :-) :-). I do, 'cos I wrote it.... hmmmm.... :-).
Jeremy.
More information about the samba-technical mailing list