unicode strings

Jeremy Allison jallison at whistle.com
Thu Mar 12 18:10:57 GMT 1998


Jean-Francois Micouleau wrote:
> 
> All the DCE/RPC code is using unicode strings of different types by now.
> What I meant is that even if Samba doesn't use unicode everywhere, we (or
> you?) have to define a standard internal unicode representation for Samba,
> so we have a standard layer to work from instead of a lot of different
> representations.
> 

Well, the problem with using UNICODE internally is
handling case insensitivity correctly across the
different code pages that clients use.

Currently we load only one case-insensitivity table,
chosen according to the client code page selected by
the admin.
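A minimal sketch of the single-table approach, assuming a 256-entry upper-case map for one single-byte code page (CP850 here). The names `cp850_upper`, `init_cp850_upper` and `strequal_cp` are hypothetical illustrations, not Samba's actual identifiers, and only two accented CP850 pairs are filled in beyond ASCII.

```c
/* Hypothetical: the one upper-case map for the admin-selected code page.
 * Every client is assumed to use this same code page. */
static unsigned char cp850_upper[256];

static void init_cp850_upper(void)
{
    for (int c = 0; c < 256; c++)
        cp850_upper[c] = (unsigned char)c;            /* identity by default */
    for (int c = 'a'; c <= 'z'; c++)
        cp850_upper[c] = (unsigned char)(c - 'a' + 'A');
    cp850_upper[0x81] = 0x9A;   /* u-umlaut -> U-umlaut in CP850 */
    cp850_upper[0x82] = 0x90;   /* e-acute  -> E-acute  in CP850 */
}

/* Case-insensitive comparison of two NUL-terminated byte strings using
 * the single loaded table. Returns 1 if equal, 0 otherwise. */
static int strequal_cp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && *q) {
        if (cp850_upper[*p] != cp850_upper[*q])
            return 0;
        p++;
        q++;
    }
    return *p == *q;    /* equal only if both strings ended together */
}
```

Byte 0x81 only means u-umlaut in CP850; a client on a different code page would send a different character in that position, which is why one global table only works when clients agree with the admin's choice.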

For UNICODE, we need an upper->lower and a
lower->upper map covering 65536 characters; at
2 bytes per entry this comes out to 256k of data.
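The arithmetic, assuming 16-bit (UCS-2) entries: 65536 code points x 2 bytes x 2 directions = 262144 bytes = 256 KiB. The struct below is a hypothetical layout, shown only to make that size concrete; lookups become a single array index.

```c
#include <stdint.h>

/* Hypothetical layout: one 16-bit entry per UCS-2 code point, per direction. */
typedef struct {
    uint16_t to_lower[65536];   /* 128 KiB */
    uint16_t to_upper[65536];   /* 128 KiB */
} ucs2_case_tables;

/* Upper-casing is a direct index, no searching. */
static uint16_t ucs2_toupper(const ucs2_case_tables *t, uint16_t c)
{
    return t->to_upper[c];
}
```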

Now, we could mmap that table into our address
space for lookups, but that will be painful on systems
that don't have shared memory (every smbd would need to
read the table on startup).
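A sketch of both options, assuming a hypothetical pre-built file holding the 65536-entry upper-case table as host-order `uint16_t` values. `map_upper_table` maps it read-only so every smbd can share the same physical pages; `read_upper_table` is the fallback where each process reads its own private copy at startup. Function names and file format are assumptions for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define CASE_TABLE_ENTRIES 65536   /* one entry per UCS-2 code point */
#define CASE_TABLE_BYTES (CASE_TABLE_ENTRIES * sizeof(uint16_t))

/* Map the table file read-only. Returns NULL on any failure. */
static const uint16_t *map_upper_table(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size != (off_t)CASE_TABLE_BYTES) {
        close(fd);
        return NULL;
    }

    /* Read-only shared mapping: processes mapping the same file are
     * backed by the same page-cache pages instead of private copies. */
    void *p = mmap(NULL, CASE_TABLE_BYTES, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : (const uint16_t *)p;
}

/* Fallback without mmap: copy the whole table into this process. */
static int read_upper_table(const char *path, uint16_t *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    size_t got = 0;
    while (got < CASE_TABLE_BYTES) {
        ssize_t n = read(fd, (char *)out + got, CASE_TABLE_BYTES - got);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        got += (size_t)n;
    }
    close(fd);
    return 0;
}
```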

Our current char->unicode conversion is just
plain wrong for non-ASCII (multibyte) code pages.
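To illustrate why, assume the client code page is Shift-JIS (CP932), where HIRAGANA LETTER A (U+3042) is the two bytes 0x82 0xA0. A naive conversion that widens each byte yields two unrelated code units; a correct one must notice the lead byte and map the pair to one code point. The helper names are hypothetical, and the actual pair-to-U+3042 lookup table is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive: widen every byte to a 16-bit code unit. Only correct for
 * ASCII (and ISO-8859-1) input. */
static size_t naive_to_ucs2(const unsigned char *in, size_t len, uint16_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i];     /* 0x82 becomes U+0082, a C1 control */
    return len;
}

/* CP932 lead-byte ranges: such a byte starts a two-byte character. */
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count characters the way a multibyte-aware converter must walk them. */
static size_t sjis_char_count(const unsigned char *in, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; n++)
        i += (sjis_is_lead(in[i]) && i + 1 < len) ? 2 : 1;
    return n;
}
```

For { 0x82, 0xA0 }, `naive_to_ucs2` produces U+0082 U+00A0, while `sjis_char_count` correctly sees a single character.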

We still need a default code page to deal with
clients that send non-UNICODE SMB requests, so
that we have a clue how to map them into the
correct area of the UNICODE character space.
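A minimal sketch for single-byte code pages, assuming a hypothetical 256-entry byte-to-UCS-2 table per code page. Only ASCII and three CP850 entries are filled in; every other high byte maps to U+FFFD (REPLACEMENT CHARACTER) in this sketch. Multibyte code pages would also need the lead-byte handling discussed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-code-page table: byte value -> UCS-2 code point. */
typedef struct {
    int codepage;               /* e.g. 850, as chosen by the admin */
    uint16_t to_ucs2[256];
} codepage_map;

static void init_cp850_partial(codepage_map *m)
{
    m->codepage = 850;
    for (int c = 0; c < 256; c++)
        m->to_ucs2[c] = (c < 0x80) ? (uint16_t)c : 0xFFFD;  /* unknown -> U+FFFD */
    m->to_ucs2[0x80] = 0x00C7;  /* C-cedilla */
    m->to_ucs2[0x81] = 0x00FC;  /* u-umlaut */
    m->to_ucs2[0x82] = 0x00E9;  /* e-acute */
}

/* Convert a non-UNICODE request string using the default code page.
 * Returns the number of code units written (at most outlen). */
static size_t cp_to_ucs2(const codepage_map *m, const char *in,
                         uint16_t *out, size_t outlen)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p && n < outlen; p++)
        out[n++] = m->to_ucs2[*p];
    return n;
}
```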

And then there's the issue of storing
UNICODE filenames on disk... (a UNICODE
to multibyte conversion). All this is such
fun that I'm trying to put off the internal
conversion of Samba to UNICODE for as long as
possible with the dynamic codepage support :-).
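One possible shape of that UNICODE-to-multibyte step, assuming (purely for illustration) that names are stored on disk as UTF-8; the target could just as well be the client code page. `ucs2_to_utf8` is a hypothetical helper, and it treats each UCS-2 unit as a BMP code point without handling surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a NUL-terminated UCS-2 name as UTF-8. Returns the number of
 * bytes written (excluding the NUL), or (size_t)-1 if out is too small. */
static size_t ucs2_to_utf8(const uint16_t *in, char *out, size_t outlen)
{
    size_t n = 0;
    for (; *in; in++) {
        uint16_t c = *in;
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;
        if (n + need + 1 > outlen)          /* +1 keeps room for the NUL */
            return (size_t)-1;
        if (need == 1) {
            out[n++] = (char)c;
        } else if (need == 2) {
            out[n++] = (char)(0xC0 | (c >> 6));
            out[n++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[n++] = (char)(0xE0 | (c >> 12));
            out[n++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, the UCS-2 name { 'a', U+00E9, U+3042 } encodes to the six bytes 61 C3 A9 E3 81 82.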

Jeremy.



-- 
--------------------------------------------------------
Buying an operating system without source is like buying
a self-assembly Space Shuttle with no instructions.
--------------------------------------------------------


More information about the samba-technical mailing list