samba-3.0.0beta1 codeset issue on non-Linux
t.d.lee at durham.ac.uk
Fri Jun 13 09:00:49 GMT 2003
On Thu, 12 Jun 2003, Steve Langasek wrote:
> Specifically, to be useful to Samba, the system iconv must support some
> set of conversions to(from) UCS-2LE, from(to) charsets other than ASCII
> and UTF8. This is because Samba uses a two-step charset conversion
> process, with UCS-2LE as an intermediate encoding (chosen because it's
> pretty much guaranteed to support all characters that are also supported
> by Windows clients, Unicode or not). So the test should test for
> features that will actually be used, and the specific charset values
> chosen are indeed important: converting between UCS-2LE and ASCII isn't
> useful. Converting between UCS-2LE and CP850 definitely is.
> Support for bidirectional conversion is certainly needed for proper
> functioning. Whether it's necessary to test for bidirectional
> conversion in configure.in, I don't know; I doubt it's a major problem
> in practice.
> Though it's a useful test for seeing what's available, I think that's
> too much complexity to use in the distribution. The default unix
> charset, the default display charset, and the internal charset are all
> handled internally by Samba; it's only the default DOS charset that's
> missing. So assuming CP850 is really a reasonable default, checking for
> CP850<->UCS-2LE alone should be reasonable.
OK, I'm using a Solaris 8 machine as a test-bed, as it seems to illustrate
some interesting points.
There is a user-level program "iconv(1)". This is a lot more convenient
for our discussion here than writing C.
As a charset newbie, I looked through the "/usr/lib/iconv" subtree, to see
what might be available. I then used "iconv(1)" to verify these.
I saw things relating CP850<=>UTF8.
I also saw UTF8<=>UCS-2LE.
But nothing directly CP850<=>UCS-2LE. (Note my inclusion of the word
directly... can you see what's coming?)
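A probe of this kind amounts to pushing a sample containing at least one non-ASCII byte through the conversion and checking that it survives the round trip. Here is a minimal sketch of that idea. It is an illustration only: Python's built-in codecs stand in for the system iconv (so it will not reproduce the Solaris result), "utf-16-le" is used as an approximation of UCS-2LE, and the function name is hypothetical.

```python
# Sketch only: Python codecs stand in for the system iconv tables.
# Assumption: "utf-16-le" approximates UCS-2LE (same bytes for BMP characters).

def supports_round_trip(charset, sample, pivot="utf-16-le"):
    """Return True if sample survives charset -> pivot -> charset unchanged."""
    try:
        wire = sample.decode(charset).encode(pivot)    # charset -> pivot
        back = wire.decode(pivot).encode(charset)      # pivot -> charset
    except (LookupError, UnicodeError):
        # Unknown charset name, or a byte the charset cannot represent.
        return False
    return back == sample


# 0x82 is e-acute in CP850, so the sample exercises a non-ASCII mapping.
sample = b"caf\x82"
for name in ("cp850", "ascii", "no-such-charset"):
    print(name, supports_round_trip(name, sample))
```

Only the CP850 probe succeeds here; the ASCII probe fails on the non-ASCII byte, which is why a test limited to ASCII says little about what Samba needs.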
Testing (on a trivial ASCII file so that I would expect output to be the
same as input) was (as expected):
1. iconv -f CP850 -t UCS-2LE:
=> Not supported CP850 to UCS-2LE
2. iconv -f CP850 -t UTF8
3. iconv -f UTF8 -t UCS-2LE
And finally piping them together:
iconv -f CP850 -t UTF-8 | iconv -f UTF-8 -t UCS-2LE
So, can this sample non-Linux OS do native iconv "CP850<=>UCS-2LE"?
The answer is both yes and no: "no" if it must be single-stage; "yes" if
we allow two-stage.
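The piped command can be modelled as a chain of single conversions, each one feeding the next. The sketch below assumes Python's codecs in place of iconv and "utf-16-le" in place of UCS-2LE; the helper names are hypothetical. It only shows that, for text both routes can represent, going via UTF-8 gives the same bytes as a direct conversion.

```python
# Sketch only: each stage mimics one "iconv -f SRC -t DST" invocation.
# Assumption: "utf-16-le" approximates UCS-2LE for BMP characters.

def convert(data, src, dst):
    """One stage: bytes in charset src -> bytes in charset dst."""
    return data.decode(src).encode(dst)


def convert_chain(data, charsets):
    """Pipe data through consecutive stages, like iconv | iconv."""
    for src, dst in zip(charsets, charsets[1:]):
        data = convert(data, src, dst)
    return data


sample = b"caf\x82"  # 0x82 is e-acute in CP850
single = convert_chain(sample, ["cp850", "utf-16-le"])
double = convert_chain(sample, ["cp850", "utf-8", "utf-16-le"])
print(single == double)
```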
Which raises further questions about whether we want (or perhaps need) to
allow multi-stage charset conversion.
But, for the moment, let's leave that to one side, and approach it from a
Suppose our sample OS... and remember that, at present, this seems to
include most OSes... Suppose our sample OS does not do native single-stage
What should "configure" do? Is this a major problem? Are we going to
refuse to configure samba on all Solaris, IRIX, *BSD, ...?? Surely not!
Samba 2.0.x and 2.2.x have worked happily on these. And Samba 3.0 seems
to be working 99.9% happily anyway.
In days gone by, Samba distributed its own codepages.
How about this solution:
1. Samba distribution continues to include a bare minimum set of
codepages (e.g. "CP850<=>UCS-2LE").
2. "configure" tests for iconv(CP850<=>UCS-2LE).
3. "configure" tests for iconv(<something more obscure>).
4. (Possible, not essential!) Following the earlier discussion of
multi-stage conversion, allow multi-stage iconv conversion in the samba
code? (And an associated "configure" test.)
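To make the proposal concrete, here is a rough sketch of the decision it implies: use a native single-stage conversion if one exists, otherwise look for a multi-stage route, otherwise fall back to a bundled codepage. Everything here is hypothetical (the function names, the strategy labels, and the idea of feeding the logic a list of conversion pairs); it is not how "configure" or Samba currently behaves. The pair list reflects only what was observed on the Solaris 8 machine above.

```python
from collections import deque

# Sketch only: all names are hypothetical, not Samba or configure code.

def find_chain(pairs, src, dst):
    """Breadth-first search for the shortest chain of direct conversions."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, []).append(b)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route at all


def choose_strategy(pairs, src, dst):
    """Pick native, multi-stage, or bundled conversion for src -> dst."""
    chain = find_chain(pairs, src, dst)
    if chain is None:
        return "bundled"
    return "native" if len(chain) <= 2 else "multi-stage"


# Direct conversions seen on the Solaris 8 test machine.
solaris8 = [
    ("CP850", "UTF-8"), ("UTF-8", "CP850"),
    ("UTF-8", "UCS-2LE"), ("UCS-2LE", "UTF-8"),
]
print(find_chain(solaris8, "CP850", "UCS-2LE"))
print(choose_strategy(solaris8, "CP850", "UCS-2LE"))
```

If step 4 is not adopted, the "multi-stage" outcome would simply be treated the same as "bundled".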
> [...] but I bet there are lots of platforms out there which would
> need to have GNU iconv installed to take advantage of Samba charset
But are we going to _require_, as an essential precondition, that every
sys.admin. has installed the GNU version of "iconv"? I hope not (bearing
in mind that Samba 3.0 seems mostly OK anyway).
Hope that helps our thinking.
: David Lee I.T. Service :
: Systems Programmer Computer Centre :
: University of Durham :
: http://www.dur.ac.uk/t.d.lee/ South Road :
: Durham :
: Phone: +44 191 334 2752 U.K. :