i18n question.

Jeremy Allison jra at samba.org
Sat Mar 6 19:12:42 GMT 2004


On Sat, Mar 06, 2004 at 09:15:47PM +1100, tridge at samba.org wrote:
> I'm also concerned that this code:
> 
>   if ((*s & 0x80) && IS_DIRECTORY_SEP(s[1])) {
> 
> isn't general enough. It only copes with a \ as the 2nd character of a
> multi-byte char. What if \ is the 3rd character? 

What character set has this property? I wrote this after
reading the explanations of the Asian language character
sets that use \ within the mb encoding. None of them are
3-byte encodings as far as I know.
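
To make the case concrete, here is a minimal sketch of how the quoted
check behaves on Shift-JIS input, where the byte pair 0x95 0x5C is a
single character whose second byte has the same value as \. The
IS_DIRECTORY_SEP definition and the count_real_separators() helper are
stand-ins written for this illustration, not the actual Samba code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Samba's IS_DIRECTORY_SEP macro. */
#define IS_DIRECTORY_SEP(c) ((c) == '\\' || (c) == '/')

/*
 * Count the bytes in s that are real directory separators.
 * A byte with the high bit set followed by a separator-valued byte is
 * treated as one 2-byte character and skipped, which is the heuristic
 * from the quoted check. It only handles a separator-valued byte in
 * the 2nd position of a multi-byte character.
 */
size_t count_real_separators(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s) {
        if ((*s & 0x80) && IS_DIRECTORY_SEP(s[1])) {
            s += 2;          /* lead byte + trail byte that looks like a separator */
            continue;
        }
        if (IS_DIRECTORY_SEP(*s)) {
            n++;
        }
        s++;
    }
    return n;
}
```

Given the bytes 'a', '\', 0x95, 0x5C, 'b', a naive byte scan sees two
backslashes, while the helper above counts only the first one.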

> The code also uses a UCS2 conversion, and assumes that any character
> will fit in a single UCS2 char. That isn't true once we take account
> of UTF-16. 
> 
> Ideally we need a function based on iconv() that tells us how many
> bytes wide the character starting at the current position in a string
> is, so we know exactly how many bytes to skip. The locale stuff like
> mbrtowc() normally does this, but as we allow loadable charsets in
> Samba we can't use those functions. The best I can think of at the
> moment is this:
> 
>   - if the sequence starts with a 7 bit char then return 1
>   - give iconv() 2 bytes, and look at the error. If no error then it's
>     2 bytes wide
>   - give iconv() 3 bytes and so on ...

This is looking very similar to skip_multibyte_char() in Samba 2.2
if you remember that :-) :-). I do, 'cos I wrote it.... hmmmm.... :-).
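
For reference, the probing steps quoted above could be sketched roughly
as follows. This is only an illustration under stated assumptions, not
skip_multibyte_char() from 2.2 and not proposed Samba code: the name
probe_char_width() and the MAX_PROBE limit are made up here, the
descriptor is assumed to come from iconv_open() converting from the
string's charset, and the charset is assumed to be stateless. It starts
probing at 1 byte rather than 2 so that single-byte characters with the
high bit set are also handled, and it relies on iconv() failing with
EINVAL when the input ends in the middle of a character.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

#define MAX_PROBE 8   /* hypothetical upper bound on bytes per character */

/*
 * Return how many bytes wide the character at the start of s is,
 * looking at no more than max_len bytes. Returns 0 if no prefix of
 * up to MAX_PROBE bytes converts cleanly.
 */
size_t probe_char_width(iconv_t cd, const char *s, size_t max_len)
{
    assert(s != NULL);
    if (max_len == 0) {
        return 0;
    }
    if (((unsigned char)s[0] & 0x80) == 0) {
        return 1;                     /* 7-bit character */
    }

    for (size_t len = 1; len <= max_len && len <= MAX_PROBE; len++) {
        char out[16];
        char *inp = (char *)s;        /* iconv() does not modify the input */
        char *outp = out;
        size_t inleft = len;
        size_t outleft = sizeof(out);

        iconv(cd, NULL, NULL, NULL, NULL);   /* reset conversion state */
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1) {
            return len;               /* shortest prefix that converts */
        }
        if (errno != EINVAL) {
            return 0;                 /* invalid sequence, not just incomplete */
        }
        /* EINVAL: incomplete character, try one more byte */
    }
    return 0;
}
```

A caller would open the descriptor once, for example with
iconv_open("UTF-16LE", "UTF-8"), and then advance through the string by
the returned width, treating 0 as an error. Calling iconv() once per
candidate length is slow, so this is only reasonable where the string
is short or the result can be cached.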

Jeremy.



More information about the samba-technical mailing list