CH_DISPLAY and gettext
TAKAHASHI Motonobu
monyo at monyo.com
Thu Jun 23 11:16:01 MDT 2011
From: Michael Adam <obnox at samba.org>
Date: Thu, 23 Jun 2011 15:04:27 +0200
> I have some points of criticism about CH_UNIX being used as the charset
> for storing strings internally (file names, user names, etc.), both in
> memory and in databases. I am sure there were very good reasons for
> introducing CH_UNIX as the internal encoding in the past, but I am
> questioning it anyway:
>
> 1) This loses information too early!
> The mapping Unicode --> CH_UNIX is potentially lossy.
> E.g. if I use ASCII or some Latin/ISO charset, then some characters
> will not be representable. Unmarshalling might even fail, so that
> some users become unavailable, depending on the value of CH_UNIX.
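The lossiness in point 1 is easy to demonstrate. The following Python sketch is illustrative only (Samba itself is written in C, and `to_unix_charset` is a hypothetical helper, not a Samba function). It strictly encodes a few names into candidate CH_UNIX charsets, using Python codec names, and reports how many survive:

```python
from typing import Optional


def to_unix_charset(name: str, unix_charset: str) -> Optional[bytes]:
    """Encode an internal Unicode string for a given 'unix charset'.

    Returns None when the charset cannot represent every character,
    i.e. when the Unicode -> CH_UNIX mapping would lose information.
    """
    try:
        return name.encode(unix_charset)  # strict: fail instead of substituting '?'
    except UnicodeEncodeError:
        return None


names = ["alice", "Müller", "高橋"]
for charset in ("ascii", "iso-8859-1", "utf-8"):
    ok = [n for n in names if to_unix_charset(n, charset) is not None]
    print(f"{charset:>10}: {len(ok)}/{len(names)} names representable")
```

With these sample names, ASCII keeps 1 of 3, ISO-8859-1 keeps 2 of 3, and only UTF-8 keeps all of them.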
>
> 2) Storing our internal databases (in s3, e.g. group mapping and passdb)
> in CH_UNIX is a very bad thing: administrators may change this
> encoding, and the databases are not converted automatically. Neither
> is the file system, but there is convmv for that. For the internal
> DBs there is not even a conversion tool. I still have to check which
> other databases are stored in which encoding, especially in samba4.
>
> I have had to do quite cumbersome manual DB repairs because of this
> problem more than once already. This was really bad!
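The failure mode in point 2 can be reproduced in a few lines of Python: a record written while CH_UNIX was ISO-8859-1 is read back after an administrator switched the setting to UTF-8. The plain dict is a hypothetical stand-in for an on-disk database such as a tdb; it is not Samba's storage API.

```python
# A plain dict stands in for an on-disk database (hypothetical, e.g. a tdb).
db = {}

# Stored while "unix charset" was ISO-8859-1: the bytes depend on that setting.
db[b"user:1000"] = "José".encode("iso-8859-1")  # b'Jos\xe9'

# Later the administrator switches the setting to UTF-8 without converting the DB.
raw = db[b"user:1000"]
try:
    print(raw.decode("utf-8"))
except UnicodeDecodeError as exc:
    # 0xE9 starts a 3-byte UTF-8 sequence, but the data ends there: unreadable record.
    print("cannot unmarshall record:", exc.reason)

# The opposite switch does not fail at all; it silently yields mojibake.
print("José".encode("utf-8").decode("iso-8859-1"))  # JosÃ©
```

The second case is arguably worse: nothing errors out, and the corrupted name simply propagates.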
>
> In order to fix #2, there are two options:
>
> a) Change the DBs (individually) to convert from the internal
> representation to UTF8 (or maybe UTF16) before storing.
>
> b) Change samba to store everything internally in UTF8
> and then write out the DBs unchanged.
> For every target that needs a special encoding (like
> the file system needing CH_UNIX), we would then need to convert
> before accessing the target (as I detailed in my
> previous emails).
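Option (b) can be sketched roughly as follows. This is a hypothetical Python illustration, not Samba code: Python's `str` stands in for the internal Unicode representation (UTF-8 or UTF-16 in C), `DB_CHARSET` and `CharsetBoundary` are invented names, and the configurable `unix_charset` is consulted only at the file system edge.

```python
DB_CHARSET = "utf-8"  # fixed in code; never taken from configuration


class CharsetBoundary:
    """Hypothetical sketch of option (b): Unicode inside, conversion at the edges."""

    def __init__(self, unix_charset: str):
        # Only the file system edge depends on this configurable value.
        self.unix_charset = unix_charset

    def to_db(self, name: str) -> bytes:
        return name.encode(DB_CHARSET)

    def from_db(self, raw: bytes) -> str:
        return raw.decode(DB_CHARSET)

    def to_fs(self, name: str) -> bytes:
        # Strict encoding: an unrepresentable name raises UnicodeEncodeError
        # at the boundary instead of being silently mangled.
        return name.encode(self.unix_charset)


# A record written while one "unix charset" was configured ...
stored = CharsetBoundary("iso-8859-1").to_db("José")
# ... is still readable after the administrator switches to UTF-8.
print(CharsetBoundary("utf-8").from_db(stored))      # José
print(CharsetBoundary("iso-8859-1").to_fs("José"))   # b'Jos\xe9'
```

The design point is that changing `unix_charset` only changes what `to_fs` produces; the database bytes stay valid.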
>
> In either case we also need an encoding conversion tool for each
> such database, since AFAIK we cannot reliably autodetect
> the encoding of the stored data.
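Because the old encoding cannot be detected reliably, such a conversion tool has to be told the source charset explicitly. A minimal Python sketch, assuming a hypothetical text-only key/value database held in a dict (real tdb records often pack binary structures, which is exactly why a separate tool per database would be needed):

```python
def convert_db(records, from_charset, to_charset="utf-8"):
    """Hypothetical one-off tool: re-encode a text-only key/value DB.

    `records` maps bytes keys to bytes values (a stand-in for a tdb).
    The source charset must be given explicitly: a byte such as 0xE9 is
    valid in many single-byte charsets, so it cannot be detected reliably.
    A record that fails to convert raises UnicodeError before anything is
    returned, so a partial result is never handed back to be written.
    """
    converted = {}
    for key, value in records.items():
        new_key = key.decode(from_charset).encode(to_charset)
        new_value = value.decode(from_charset).encode(to_charset)
        converted[new_key] = new_value
    return converted


old = {b"USER_jos\xe9": b"Jos\xe9"}  # written while unix charset = ISO-8859-1
new = convert_db(old, from_charset="iso-8859-1")
print(new)  # {b'USER_jos\xc3\xa9': b'Jos\xc3\xa9'}
```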
>
> In order to fix #1 though, option (b) is the only possible way.
>
> So my wish would be to convert all of samba to use UTF8
> internally (I'd be ready to discuss a different Unicode
> charset like UTF16), and to convert to CH_UNIX only at the
> interfaces that communicate with the outside.
>
> I hope this makes my argument a little clearer.
>
> Cheers - Michael
That is what my friends and I insisted on several years ago:
http://lists.samba.org/archive/samba-technical/2004-March/034638.html
http://lists.samba.org/archive/samba-technical/2004-March/034742.html
The internal charset should be fixed, not configurable. UTF-8 is
acceptable, but UTF-16 may be better because it is more suitable for
string manipulation than UTF-8.
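One way to see the trade-off behind the UTF-16 preference is to count code units. This Python sketch (illustrative only; `code_units` is a hypothetical helper) shows that Japanese text needs one 16-bit unit per character in UTF-16 but three bytes per character in UTF-8. Characters outside the BMP still take a surrogate pair in UTF-16, so it is not strictly fixed-width either.

```python
def code_units(text: str) -> dict:
    """Count code points and the code units needed in UTF-8 and UTF-16."""
    return {
        "chars": len(text),                                  # Unicode code points
        "utf8_units": len(text.encode("utf-8")),             # 8-bit code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,   # 16-bit units, no BOM
    }


# ASCII, BMP Japanese, and a name starting with U+20BB7 (outside the BMP).
for sample in ["samba", "日本語", "\U00020bb7野家"]:
    print(sample, code_units(sample))
```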
---
TAKAHASHI Motonobu <monyo at monyo.com> / @damemonyo
http://damedame.monyo.com/ / http://facebook.com/monyot