samba-3.0.0beta1 codeset issue on non-Linux

David Lee t.d.lee at durham.ac.uk
Fri Jun 13 09:00:49 GMT 2003


On Thu, 12 Jun 2003, Steve Langasek wrote:

> Specifically, to be useful to Samba, the system iconv must support some
> set of conversions to(from) UCS-2LE, from(to) charsets other than ASCII
> and UTF8.  This is because Samba uses a two-step charset conversion
> process, with UCS-2LE as an intermediate encoding (chosen because it's
> pretty much guaranteed to support all characters that are also supported
> by Windows clients, Unicode or not).  So the test should test for
> features that will actually be used, and the specific charset values
> chosen are indeed important: converting between UCS-2LE and ASCII isn't
> useful.  Converting between UCS-2LE and CP850 definitely is.
>
> [...]
> Support for bidirectional conversion is certainly needed for proper
> functioning.  Whether it's necessary to test for bidirectional
> conversion in configure.in, I don't know; I doubt it's a major problem
> in practice.
>
> [...]
> Though it's a useful test for seeing what's available, I think that's
> too much complexity to use in the distribution.  The default unix
> charset, the default display charset, and the internal charset are all
> handled internally by Samba; it's only the default DOS charset that's
> missing.  So assuming CP850 is really a reasonable default, checking for
> CP850<->UCS-2LE alone should be reasonable.

OK, I'll use a Solaris 8 machine as a test-bed, as it seems to illustrate
some interesting points.

There is a user-level program "iconv(1)".  This is a lot more convenient
for our discussion here than writing C.

As a charset newbie, I looked through the "/usr/lib/iconv" subtree to see
what might be available.  I then used "iconv(1)" to verify these.

I saw things relating CP850<=>UTF8.

I also saw UTF8<=>UCS-2LE.

But nothing directly CP850<=>UCS-2LE.  (Note my inclusion of the word
directly... can you see what's coming?)

Testing (on a trivial ASCII file so that I would expect output to be the
same as input) was (as expected):

1. iconv -f CP850 -t UCS-2LE:
     => Not supported CP850 to UCS-2LE

2. iconv -f CP850 -t UTF8
     => translation.

3. iconv -f UTF8 -t UCS-2LE
     => translation

And finally piping them together:
   iconv -f CP850 -t UTF-8  | iconv -f UTF-8  -t UCS-2LE
     => translation


So, can this sample non-Linux OS do native iconv "CP850<=>UCS-2LE"?
Answer is both yes and no.  "No": if it must be single stage;  "yes": if
we allow two-stage.
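
To make the two-stage route concrete, here is a minimal sketch.  It uses
Python's built-in codecs rather than the Solaris iconv, so it says nothing
about what any particular system iconv supports; the assumption is that
Python's "cp850" and "utf-16-le" codecs stand in for iconv's CP850 and
UCS-2LE (the two Unicode encodings agree for every character CP850 can
hold).  The sample includes a byte above 0x7F, since an ASCII-only sample
would not really exercise the conversion tables.

```python
# Sketch only: compare a direct CP850 -> UCS-2LE conversion with a
# two-stage route that pivots through UTF-8, as in the piped iconv test.
# Assumption: Python's "cp850" and "utf-16-le" codecs stand in for
# iconv's CP850 and UCS-2LE.


def direct(data: bytes) -> bytes:
    """Single-stage conversion: CP850 bytes to UCS-2LE bytes."""
    return data.decode("cp850").encode("utf-16-le")


def two_stage(data: bytes) -> bytes:
    """Two-stage conversion via a UTF-8 intermediate."""
    utf8 = data.decode("cp850").encode("utf-8")       # stage 1: CP850 -> UTF-8
    return utf8.decode("utf-8").encode("utf-16-le")   # stage 2: UTF-8 -> UCS-2LE


sample = b"caf\x82"  # "cafe" with e-acute; 0x82 is e-acute in CP850

# Both routes should yield the same UCS-2LE bytes.
assert direct(sample) == two_stage(sample)
print(two_stage(sample).hex())
```

If the two routes agree like this for the whole codepage, a two-stage
conversion would be a workable substitute where the single-stage one is
missing.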

Which raises further questions about whether we want (or perhaps need) to
allow multi-stage charset conversion.



But, for the moment, let's leave that to one side, and approach it from a
different angle.

Suppose our sample OS... and remember that, at present, this seems to
include most OSes... Suppose our sample OS does not do native single-stage
iconv "CP850<=>UCS-2LE".

What should "configure" do?  Is this a major problem?  Are we going to
refuse to configure samba on all Solaris, IRIX, *BSD, ...??  Surely not!
Samba 2.0.x and 2.2.x have worked happily on these.  And Samba 3.0 seems
to be working 99.9% happily anyway.

In days gone by, Samba distributed its own codepages.

How about this solution:

1. Samba distribution continues to include a bare minimum set of
   codepages (e.g. "CP850<=>UCS-2LE").

2. "configure" tests for iconv(CP850<=>UCS-2LE).

3. "configure" tests for iconv(<something more obscure>).

4. (Possible, not essential!) Following on from the earlier discussion of
   multi-stage, allow multi-stage iconv conversion in the samba code?
   (And an associated "configure" test.)
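
The decision logic behind the list above might look something like the
following sketch.  Everything in it is hypothetical: the function name, the
provider labels and the order of preference (system iconv first, then a
codepage shipped with Samba, then a two-stage route via UTF-8) are my
assumptions for illustration, not anything in the Samba source.

```python
# Sketch only: choose a conversion route from the pairs that the system
# iconv and a bundled codepage set are assumed to support.  All names and
# the order of preference are hypothetical.
from typing import List, Optional, Set, Tuple

Pair = Tuple[str, str]
Step = Tuple[str, str, str]  # (from charset, to charset, provider)


def plan_conversion(src: str, dst: str,
                    system_pairs: Set[Pair],
                    builtin_pairs: Set[Pair],
                    pivot: str = "UTF-8") -> Optional[List[Step]]:
    """Return the conversion steps to use, or None if no route exists."""
    if (src, dst) in system_pairs:
        return [(src, dst, "system")]      # native single-stage iconv
    if (src, dst) in builtin_pairs:
        return [(src, dst, "builtin")]     # codepage shipped with Samba
    if (src, pivot) in system_pairs and (pivot, dst) in system_pairs:
        # two-stage route through the pivot charset
        return [(src, pivot, "system"), (pivot, dst, "system")]
    return None


# Pairs matching what the Solaris 8 experiment above found.
solaris_like = {("CP850", "UTF-8"), ("UTF-8", "CP850"),
                ("UTF-8", "UCS-2LE"), ("UCS-2LE", "UTF-8")}
samba_builtin = {("CP850", "UCS-2LE"), ("UCS-2LE", "CP850")}

print(plan_conversion("CP850", "UCS-2LE", solaris_like, samba_builtin))
print(plan_conversion("CP850", "UCS-2LE", solaris_like, set()))
```

With the bundled codepage present, the first call picks it; without it, the
second call falls back to the two-stage route.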

>    [...] but I bet there are lots of platforms out there which would
> need to have GNU iconv installed to take advantage of Samba charset
> support.

But are we going to _require_, as an essential precondition, that every
sys.admin. has installed the GNU version of "iconv"?  I hope not (bearing
in mind that Samba 3.0 seems mostly OK anyway).

Hope that helps our thinking.

-- 

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :


