CH_DISPLAY and gettext

Andrew Bartlett abartlet at samba.org
Mon Jun 20 23:02:13 MDT 2011


I've been looking closely at the implementation of internationalisation
in Samba, and I'm rather confused about how it is expected to work
except in a UTF8 locale.

As background, we have two internationalisation mechanisms in Samba:
 - libintl
 - lang_tdb

'net' and 'pam_winbind' are internationalised with libintl/gettext,
with .mo files being installed as part of make install (except in the
waf builds, which is a bug).

SWAT is internationalised using .msg files which are converted into
lang_XX.tdb files in lockdir at runtime.  lang_msg() is the interface to
obtain these strings.  d_printf() isn't used much in SWAT, and not in
combination with lang_msg() as far as I can see.

Finally, most of Samba uses d_printf(), which causes strings to be
converted from UTF8 (the source format) to CH_DISPLAY.
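
A minimal sketch of that behaviour, assuming plain iconv() rather than
Samba's own charset code: format first, then convert the whole result
from UTF8 to the display charset.  The function name
format_utf8_to_display() and the fixed buffer size are hypothetical, and
the output goes to a caller-supplied buffer instead of stdout so the
result can be inspected.

```c
#include <assert.h>
#include <iconv.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a d_printf()-style function (not Samba code).
 * Assumes the format string AND every %s argument are UTF-8.
 * Returns 0 on success, -1 on any formatting or conversion error.
 */
int format_utf8_to_display(char *out, size_t outlen,
                           const char *display_charset,
                           const char *fmt, ...)
{
    char utf8[1024];
    va_list ap;

    assert(out != NULL && outlen > 0);

    va_start(ap, fmt);
    int n = vsnprintf(utf8, sizeof(utf8), fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= sizeof(utf8)) {
        return -1;                      /* formatting failed or truncated */
    }

    /* iconv_open() takes the destination charset first. */
    iconv_t cd = iconv_open(display_charset, "UTF-8");
    if (cd == (iconv_t)-1) {
        return -1;
    }

    char *inp = utf8;
    size_t inleft = (size_t)n;
    char *outp = out;
    size_t outleft = outlen - 1;        /* keep one byte for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return -1;                      /* invalid input or buffer too small */
    }
    *outp = '\0';
    return 0;
}
```

Because the whole formatted buffer is treated as UTF8, a %s argument
that is really in some other charset (a Latin-1 file name, say) is
either rejected by iconv() or converted into garbage.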

My concern is about the combination of these two elements.  When a
string is internationalised into (say) German, the messages are placed
in a .mo file as UTF8.  

However, when we read file names from a remote server for display,
these strings are in unix charset.

Then, when we d_printf() these strings, we convert them into CH_DISPLAY,
based on the system locale or the LANG environment variable.
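
As a rough illustration of deriving a charset name from the locale, the
sketch below uses the POSIX setlocale() and nl_langinfo(CODESET) calls.
The function name guess_display_charset() is hypothetical, and Samba's
real lookup (including any smb.conf override) may well differ.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper (not Samba code): return the charset name of the
 * current locale, e.g. "UTF-8" or "ISO-8859-1".
 */
const char *guess_display_charset(void)
{
    /* Pick up LC_ALL / LC_CTYPE / LANG from the environment.  If the
     * settings are invalid, the previous ("C") locale stays active. */
    (void)setlocale(LC_CTYPE, "");

    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);
    return codeset;
}
```

The returned name can be handed to iconv_open() as the destination
charset, provided the local iconv implementation recognises it.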

The trouble is: what is the source charset when CH_DISPLAY is not
CH_UNIX?
 
(1) We could say that the source charset is UTF8, in which case UNIX
charset filenames would be wrong. 
(2) We could say the source charset is UNIX, but then the gettext
message will be wrong.
(3) We could assume that the internationalised message format string is
DISPLAY while the arguments are UNIX, but then we would have to 'ban'
using N_() to translate %s arguments.  Hopefully we never put a unix
(not C) string directly into d_printf() in this case.
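
Option (3) can be sketched roughly as follows, again assuming plain
iconv() and a format string with exactly one %s.  Both function names
are hypothetical and this is not the patch I attempted; it only shows
the idea of converting the argument while leaving the format string
alone.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert `in` between charsets; 0 on success. */
int to_charset(const char *from, const char *to,
               const char *in, char *out, size_t outlen)
{
    if (outlen == 0) {
        return -1;
    }
    iconv_t cd = iconv_open(to, from);  /* destination charset first */
    if (cd == (iconv_t)-1) {
        return -1;
    }
    char *inp = (char *)in;             /* iconv() wants non-const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outlen - 1;        /* keep one byte for the NUL */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return -1;
    }
    *outp = '\0';
    return 0;
}

/*
 * Option (3): the format string is assumed to be in the display charset
 * already; only the single %s argument is converted from unix charset.
 */
int format_for_display(char *out, size_t outlen,
                       const char *fmt_display,  /* exactly one %s */
                       const char *arg_unix,
                       const char *unix_charset,
                       const char *display_charset)
{
    char arg_display[1024];

    assert(fmt_display != NULL && arg_unix != NULL);

    if (to_charset(unix_charset, display_charset, arg_unix,
                   arg_display, sizeof(arg_display)) != 0) {
        return -1;
    }
    int n = snprintf(out, outlen, fmt_display, arg_display);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The weakness is visible in the signature: every %s argument is assumed
to be unix charset, so an argument that was itself translated with N_()
would be converted a second time.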

Perhaps someone with a longer background in this area might be able to
help me untangle this mess?

In my patches to use a common d_printf() I originally implemented (1),
but I attach a patch to fix that up, and one to do (2).  I attempted
(and failed) to implement (3).

Or is it simply not practical to have 'display charset'/CH_DISPLAY
differ from 'unix charset'/CH_UNIX, in which case we should simply
remove the parameter?

Thanks,

Andrew Bartlett
-- 
Andrew Bartlett                                http://samba.org/~abartlet/
Authentication Developer, Samba Team           http://samba.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-lib-util-Restore-CH_UNIX-as-source-charset-for-d_pri.patch
Type: text/x-patch
Size: 1291 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20110621/b7e7fb09/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-lib-util-Remove-display_cd-from-d_printf.patch
Type: text/x-patch
Size: 1115 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20110621/b7e7fb09/attachment-0001.bin>