i18n question.

tridge at samba.org tridge at samba.org
Sat Mar 6 10:15:47 GMT 2004


Jeremy,

 > > On the other hand, code that assumes that you can search for '/' or
 > > '\' in a string and assume that is the start of a character is more of
 > > a problem. The solution is definitely not to switch away from
 > > "char *", instead we need to define a clean function that does
 > > directory parsing correctly and use this function everywhere it's
 > > needed.
 > 
 > I just committed this function :-).

You mean check_path_syntax()? That isn't the sort of function I
mean.

I mean a function that splits a path in windows form (ie. with \
separator) into two parts, both in unix form. The first part will be a
directory and second will be a filename in that directory. Both parts
will be allocated (possibly with talloc) and the input string will be
unchanged (ie. none of that pstring crap!).
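
As a rough sketch of the interface I have in mind (the name
split_win_path() is hypothetical, it uses plain malloc() rather than
talloc, and it assumes a charset such as ASCII or UTF-8 where the byte
0x5C can only ever be a real '\'):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of s into a freshly allocated, NUL-terminated string. */
static char *dup_range(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, len);
    out[len] = '\0';
    return out;
}

/*
 * Hypothetical helper, not an existing Samba function.
 * Splits a Windows-form path into a directory and a filename, both
 * newly allocated and in unix form. The input string is not modified.
 * ASSUMPTION: byte 0x5C never occurs inside a multi-byte character.
 * Returns 0 on success, -1 on allocation failure.
 */
static int split_win_path(const char *path, char **dir, char **name)
{
    const char *last;
    char *p;

    assert(path != NULL && dir != NULL && name != NULL);

    last = strrchr(path, '\\');
    if (last == NULL) {
        /* no separator: the file lives in the current directory */
        *dir = dup_range(".", 1);
        *name = dup_range(path, strlen(path));
    } else if (last == path) {
        /* the only separator is the leading one: root directory */
        *dir = dup_range("/", 1);
        *name = dup_range(last + 1, strlen(last + 1));
    } else {
        *dir = dup_range(path, (size_t)(last - path));
        *name = dup_range(last + 1, strlen(last + 1));
    }

    if (*dir == NULL || *name == NULL) {
        free(*dir);
        free(*name);
        *dir = *name = NULL;
        return -1;
    }

    /* convert the remaining separators in the directory part */
    for (p = *dir; *p != '\0'; p++) {
        if (*p == '\\')
            *p = '/';
    }
    return 0;
}
```

With this sketch, "dir\sub\file.txt" would come back as "dir/sub" and
"file.txt", and the caller frees both parts.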

check_path_syntax() is fine in many places in Samba, but still means
that lots of places (eg. ACL code) will have to re-parse the full path
into a separate directory and filename.

I'm also concerned that this code:

  if ((*s & 0x80) && IS_DIRECTORY_SEP(s[1])) {

isn't general enough. It only copes with a \ as the 2nd byte of a
multi-byte char. What if \ is the 3rd byte?

The code also uses a UCS2 conversion, and assumes that any character
will fit in a single UCS2 char. That isn't true once we take account
of UTF-16, where characters outside the 16 bit range need two 16 bit
units (a surrogate pair).

Ideally we need a function based on iconv() that tells us how many
bytes wide the character starting at the current position in a string
is, so we know exactly how many bytes to skip. The locale stuff like
mbrtowc() normally does this, but as we allow loadable charsets in
Samba we can't use those functions. The best I can think of at the
moment is this:

  - if the sequence starts with a 7 bit char then return 1
  - give iconv() 2 bytes, and look at the error. If no error then it's
    2 bytes wide
  - give iconv() 3 bytes and so on ...
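
The steps above could be sketched roughly as follows. This uses plain
POSIX iconv() as a stand-in for whatever conversion handle is
available, and the name char_width() and the MAX_CHAR_BYTES limit are
assumptions of the sketch. One deliberate deviation: it starts probing
at 1 byte rather than 2, so that a single-byte character with the top
bit set is not reported as 2 bytes wide.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

#define MAX_CHAR_BYTES 6   /* assumed upper bound on one character */

/*
 * Hypothetical helper: return the width in bytes of the character at
 * the start of s (avail bytes are readable), or 0 if no valid
 * character is found. cd converts FROM the charset of s to any
 * charset that can represent the character.
 */
static size_t char_width(iconv_t cd, const char *s, size_t avail)
{
    size_t n;

    assert(s != NULL);
    if (avail == 0)
        return 0;

    /* fast path: 7 bit chars are 1 byte (ASCII-compatible charsets) */
    if (((unsigned char)s[0] & 0x80) == 0)
        return 1;

    for (n = 1; n <= avail && n <= MAX_CHAR_BYTES; n++) {
        char outbuf[16];
        char *in = (char *)s;   /* iconv() does not modify the input */
        char *out = outbuf;
        size_t inleft = n;
        size_t outleft = sizeof(outbuf);

        iconv(cd, NULL, NULL, NULL, NULL);   /* reset conversion state */
        if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t)-1 &&
            inleft == 0)
            return n;           /* exactly n bytes made one character */
        if (errno != EINVAL)
            break;              /* not merely incomplete: give up */
    }
    return 0;
}
```

A caller would open the handle once, for example with
iconv_open("UCS-4LE", "UTF-8"), and then advance through the string by
the returned width. Calling iconv() up to MAX_CHAR_BYTES times per
character is slow, so this is only a starting point.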

With that function we can make a sane path parsing function that
should work for all charsets that we have a hope of supporting.
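
A sketch of how a path parser might use such a width function to find
the last real separator. The callback type char_width_fn and the
function last_separator() are hypothetical names, and
sjis_like_width() is only a toy stand-in, roughly modelled on
Shift-JIS lead-byte ranges, so that the example is self-contained:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: width in bytes of the char at s, 0 if invalid. */
typedef size_t (*char_width_fn)(const char *s, size_t avail);

/*
 * Walk the string one character at a time and remember the last '\'
 * that is a whole character, not a trailing byte of a multi-byte char.
 * Returns NULL if the path contains no separator.
 */
static const char *last_separator(const char *path, char_width_fn width)
{
    const char *last = NULL;
    size_t left;

    assert(path != NULL && width != NULL);
    left = strlen(path);

    while (left > 0) {
        size_t w = width(path, left);
        if (w == 0 || w > left)
            w = 1;              /* invalid sequence: skip a single byte */
        if (w == 1 && *path == '\\')
            last = path;
        path += w;
        left -= w;
    }
    return last;
}

/* Toy width function for the demo only: 2-byte chars start with a
 * byte in 0x81-0x9F or 0xE0-0xFC, everything else is 1 byte. */
static size_t sjis_like_width(const char *s, size_t avail)
{
    unsigned char c = (unsigned char)s[0];

    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return avail >= 2 ? 2 : 0;
    return 1;
}
```

For the bytes "dir\" 0x95 0x5C "file", a plain strrchr() for '\' lands
on the 0x5C trailing byte, while last_separator() returns the
separator after "dir".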

Cheers, Tridge