i18n question.

Simo Sorce simo.sorce at xsec.it
Mon Mar 8 13:26:07 GMT 2004


On Mon, 2004-03-08 at 14:15, Benjamin Riefenstahl wrote:
> Hi Andrew,
> 
> 
> tridge at samba.org writes:
> > this argument doesn't convince me. As long as we stick to the
> "rules" as to what we can assume about an internal charset then we
> > will be OK.
> 
> In the current system the rules are not a requirement on "us" but on
> the users' file systems out there.  And not all of them conform to
> those rules, that's a fact.
> 
> >  *) build a "charset translation" NTVFS module that can be used in
> >  those less common cases where you wish to use a different charset
> >  for some shares.
> 
> I like that approach.  You get your optimizations for filesystems that
> conform to your rules, but you can use the charset translation module
> for all others.
> 
> > On windows they have compiler support for wide characters but we
> > don't on unix.  Without that compiler support it is a nightmare
> > dealing with all of the string constants we have to deal with in
> > CIFS.
> 
> I understand that point.  OTOH, there are ways around that, so the
> "nightmare" part seems exaggerated to me.  Off the top of my head: Use
> a simple preprocessor, use global constants and a preprocessor or
> generator for the implementation file(s) of those strings, use a
> version of sprintf() that takes an ASCII string as format parameter,
> but generates UTF-16 (and similar for other functions).
> 
> Also, from a quick review, quite a considerable percentage of constant
> strings in Samba are not for exchange with SMB, but configuration
> options, error messages and other stuff that can just stay in ASCII.

No, that would mean you have 2 "internal charsets" to deal with ...
Currently we have a rule that all constants are _only_ ASCII, and we can
guarantee that every charset we use is 100% ASCII compatible.
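
A minimal sketch of what that rule buys us, assuming the internal charset
is UTF-8 (where bytes below 0x80 never occur inside a multibyte sequence).
The helper name has_ascii_suffix is hypothetical, not a Samba function; it
only shows that an ASCII constant can be compared byte for byte against an
internal string with no conversion at all:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Samba): report whether a string in an
 * ASCII-compatible charset such as UTF-8 ends with a plain ASCII constant.
 * No conversion is needed because ASCII bytes keep their meaning.
 */
static int has_ascii_suffix(const char *str, const char *ascii_suffix)
{
    size_t n, m;

    assert(str != NULL && ascii_suffix != NULL);

    n = strlen(str);
    m = strlen(ascii_suffix);

    /* A plain byte comparison is enough. */
    return n >= m && memcmp(str + n - m, ascii_suffix, m) == 0;
}
```

For example, has_ascii_suffix("r\xc3\xa9sum\xc3\xa9.txt", ".txt") is true
even though the name contains non-ASCII characters.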

But if you want to use UTF-16 internally then you are no longer ASCII
compatible, and that means you need to either check each time you use a
constant and convert on the fly (you then have 2 internal charsets, and
that's a nightmare), or add a preparsing step that translates every
constant into a UTF-16 string. This means either a lot of allocated space
at runtime or custom preprocessing scripts with constants defined outside
the source code file, which defeats the usefulness of many constant
strings.
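
To make the two options concrete, here is a rough sketch under stated
assumptions: UTF-16 strings are held as NUL-terminated arrays of uint16_t
code units in host byte order, and both function names are hypothetical,
not Samba APIs. The first function is the "allocate a converted copy"
route, the second is the "convert on the fly at every use" route:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical: widen a 7-bit ASCII constant into a freshly allocated,
 * NUL-terminated UTF-16 code unit array. The caller must free() it.
 * Returns NULL on allocation failure or if the input is not pure ASCII.
 */
static uint16_t *ascii_to_utf16(const char *ascii)
{
    size_t n, i;
    uint16_t *out;

    assert(ascii != NULL);

    n = strlen(ascii);
    out = malloc((n + 1) * sizeof(*out));
    if (out == NULL) {
        return NULL;
    }
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)ascii[i];
        if (c >= 0x80) {        /* not ASCII: refuse to guess */
            free(out);
            return NULL;
        }
        out[i] = c;             /* ASCII maps 1:1 onto U+0000..U+007F */
    }
    out[n] = 0;
    return out;
}

/*
 * Hypothetical: compare a UTF-16 string against an ASCII constant,
 * widening each byte as it goes. Every use of a constant needs a
 * special-purpose routine like this one. The constant is assumed to be
 * pure ASCII.
 */
static int utf16_equals_ascii(const uint16_t *s, const char *ascii)
{
    size_t i;

    for (i = 0; ; i++) {
        if (s[i] != (unsigned char)ascii[i]) {
            return 0;
        }
        if (s[i] == 0) {
            return 1;           /* both strings ended together */
        }
    }
}
```

Either way, a simple strcmp() against a literal is no longer possible.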

Simo.

-- 
Simo Sorce - simo.sorce at xsec.it
Xsec s.r.l. - http://www.xsec.it
via Garofalo, 39 - 20133 - Milano
mobile: +39 329 328 7702
tel. +39 02 2953 4143 - fax: +39 02 700 442 399

