UTF-8 support and other quirks in the LDAP backend (in 2.2.4).

Tue Jun 18 16:50:02 GMT 2002

On Wed, Jun 19, 2002 at 12:02:00AM +0200, Simo Sorce wrote:

> > > Yes, I think internal format (and format for tdbs) of utf8 seems
> > > like the best idea (IMHO).
> > There is a problem with utf8 for many fixed-size records in various tdbs.
> > Also, most of the data is in UCS-2 already.

I don't think that's true.  Most data should be in the unix character set.

> Not only that, utf-8 is not easy to manipulate as characters are not
> fixed length and upper case and lower case ones are not guaranteed to
> be the same number of bytes long.

Why would you need to manipulate the string on a character by character
basis?  The only case I can think of is the name mangling system.  Every
other part of Samba only cares about the total length of the string.
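
To illustrate the byte-length point quoted above, here is a minimal sketch
in Python (not Samba code).  It assumes Python's built-in Unicode case
mapping, and it uses the utf-16-le codec as a stand-in for UCS-2, which is
only equivalent for characters in the Basic Multilingual Plane.  The helper
name encoded_lengths is hypothetical.

```python
def encoded_lengths(ch):
    """Return (lower, upper) byte lengths of ch in UTF-8 and in UCS-2.

    utf-16-le stands in for UCS-2 here; the two agree only for
    characters in the Basic Multilingual Plane (an assumption).
    """
    upper = ch.upper()
    return {
        "utf8": (len(ch.encode("utf-8")), len(upper.encode("utf-8"))),
        "ucs2": (len(ch.encode("utf-16-le")), len(upper.encode("utf-16-le"))),
    }


# U+0131 (dotless i) upper-cases to plain ASCII 'I', so its UTF-8
# encoding shrinks when the case changes.
dotless_i = encoded_lengths("\u0131")

# Plain ASCII 'a' keeps the same byte length in both encodings.
plain_a = encoded_lengths("a")

print(dotless_i)
print(plain_a)
```

Code that only needs the total byte length of a string is unaffected by
this; it matters when a string is case-converted in place, which is the
name mangling case mentioned above.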

> So UCS-2 is more suitable for most of the manipulations, utf8 is more
> suitable to deal with the unix system (file names, etc.).
> 
> But, as Windows already speaks UCS-2 with us, it is better to use that
> internally, so that conversions are kept to a minimum, and manipulation
> of data is much easier and faster.
> 
> That relegates utf8, in the long term, to an internal vfs conversion for
> file name storage purposes (yes, I advocate a ucs2 vfs interface for the
> next NTFS-like semantics rewrite).

Yuck.  (-:

Tim.