i18n question.

Andrew Bartlett abartlet at samba.org
Fri Mar 5 23:53:19 GMT 2004


On Sat, 2004-03-06 at 10:28, Michael B Allen wrote:
> Benjamin Riefenstahl said:
> >I.e. you could just convert encodings on the edges of Samba and
> > keep everything in (precomposed) UTF-16 or UTF-8 on the inside.
> 
> Agreed here too. Use a primary, normalized encoding throughout and provide
> the necessary routines to convert for I/O. Otherwise, it doesn't matter
> what the performance impact is because you'll be juggling too many
> scenarios to make it work at all. 

Actually, we make it work pretty well.  The issues we are having here
are because people keep wanting to stretch the rules!

The rules were:

- Behaves like a C string (null terminated, no intermediate nulls)
- No ASCII bytes in the second and subsequent bytes of a multibyte
  character
- All multibyte characters start with the high bit set.

(plus a few others, I can't recall off the top of my head)

UTF8 is the default character set because it fits these requirements.  
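
To illustrate why those rules matter, here is a minimal sketch (not
Samba code; count_byte() is a hypothetical helper made up for this
example).  It scans a string one byte at a time for an ASCII separator,
which is what ordinary C string code does.  Katakana SO is 0x83 0x5C in
Shift-JIS, so the trail byte looks like a backslash; in UTF8 the same
character is 0xE3 0x82 0xBD and no byte can be mistaken for ASCII.

```c
#include <stddef.h>

/* Hypothetical helper: count how many bytes in s equal c.
 * This is the kind of bytewise scan that strchr()-style code relies on. */
size_t count_byte(const char *s, char c)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == c)
            n++;
    }
    return n;
}

/* Katakana SO (U+30BD) in two encodings. */
static const char so_sjis[] = "\x83\x5c";      /* trail byte is 0x5C, '\\' */
static const char so_utf8[] = "\xe3\x82\xbd";  /* every byte has the high bit set */
```

With these inputs, a bytewise search for '\\' finds a false separator
in the Shift-JIS string and nothing in the UTF8 one, so the fast path
is only safe for encodings that keep to the rules.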

It is when we break from these requirements (to support incumbent
Japanese encodings) that things get messy.  Now, I would prefer that
everybody just moved to UTF8, but back in the real world, I do
understand that Samba cannot dictate everything, and so we should
implement the required 'slow paths' for this to still work correctly.

> If you're real clever about it the
> primary encoding could be configurable with a few macros and an abstracted
> set of string routines. Then when everything works well you might find a
> fast path that isn't too disruptive.

The problem is, this isn't Java - so UCS2/UTF16 is out.  We have to
operate in an environment of multibyte 'C' strings.  We can't do a
UTF16 -> UTF8 conversion every time we call stat().  That happens a *lot*...
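
As a rough sketch of what that per-call conversion would involve (this
is not Samba code; utf16_to_utf8() and stat_utf16() are hypothetical
names, and the 4096-byte buffer is an assumption), every stat() on a
UTF16 path would need something like the following before the kernel
ever sees the name.  With UTF8 held internally, the string is passed to
stat() as it is.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper: encode NUL-terminated UTF-16 (host byte order)
 * as NUL-terminated UTF-8.  Returns the number of bytes written, not
 * counting the NUL, or (size_t)-1 if the output buffer is too small or
 * the input contains an unpaired surrogate. */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t outlen)
{
    size_t o = 0;

    if (outlen == 0)
        return (size_t)-1;

    while (*in != 0) {
        uint32_t cp = *in++;
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;  /* stray low surrogate */
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= outlen)  /* keep one byte for the NUL */
            return (size_t)-1;

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}

/* Hypothetical wrapper: the extra work a UTF-16 internal format would
 * add in front of every stat() call. */
int stat_utf16(const uint16_t *path16, struct stat *st)
{
    char path8[4096];  /* assumed maximum path length for this sketch */

    if (utf16_to_utf8(path16, path8, sizeof path8) == (size_t)-1)
        return -1;
    return stat(path8, st);
}
```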

Andrew Bartlett

-- 
Andrew Bartlett                                 abartlet at pcug.org.au
Manager, Authentication Subsystems, Samba Team  abartlet at samba.org
Student Network Administrator, Hawker College   abartlet at hawkerc.net
http://samba.org     http://build.samba.org     http://hawkerc.net

