samba-3.0.0beta1 codeset issue on non-Linux

Alexander Bokovoy a.bokovoy at sam-solutions.net
Fri Jun 13 15:01:53 GMT 2003


On Fri, Jun 13, 2003 at 10:00:49AM +0100, David Lee wrote:
> > CP850<->UCS-2LE alone should be reasonable.
> 
> OK, I am using a Solaris 8 machine as a test-bed, as it seems to
> illustrate some interesting points.
> 
> There is a user-level program "iconv(1)".  This is a lot more convenient
> for our discussion here than writing C.
> 
> As a charset newbie, I looked through the "/usr/lib/iconv" subtree, to see
> what might be available.  I then used "iconv(1)" to verify these.
> 
> I saw things relating CP850<=>UTF8.
> 
> I also saw UTF8<=>UCS-2LE.
> 
> But nothing directly CP850<=>UCS-2LE.  (Note my inclusion of the word
> directly... can you see what's coming?)

Sure. This is a known situation. The GNU iconv library and glibc handle many
charsets internally using multi-step conversion paths rather than only direct
one-to-one converters: they keep a graph whose weighted edges represent
conversions, and they search that graph when there is no direct edge
connecting the two vertices that represent the charsets.
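
To make the graph idea concrete, here is a minimal sketch in Python of a
cheapest-path search over a table of direct converters. It is not the actual
code of GNU iconv or glibc; the table DIRECT, its weights and the function
find_path are hypothetical names invented for illustration, and only the
charset names come from this thread.

```python
import heapq

# Hypothetical table of direct converters and their costs. The pairs mirror
# what was found on the Solaris 8 box; the weights are made up.
DIRECT = {
    ("CP850", "UTF-8"): 1,
    ("UTF-8", "CP850"): 1,
    ("UTF-8", "UCS-2LE"): 1,
    ("UCS-2LE", "UTF-8"): 1,
}


def find_path(src, dst, edges=DIRECT):
    """Return the cheapest chain of charsets from src to dst, or None."""
    # Build an adjacency list from the (from, to) -> weight table.
    graph = {}
    for (a, b), weight in edges.items():
        graph.setdefault(a, []).append((b, weight))

    # Dijkstra's algorithm: always expand the cheapest known path first.
    queue = [(0, [src])]
    done = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in done:
                heapq.heappush(queue, (cost + weight, path + [nxt]))
    return None  # no chain of converters connects the two charsets


print(find_path("CP850", "UCS-2LE"))
```

With the table above, the search finds the two-step route through UTF-8 even
though no direct CP850 to UCS-2LE converter is listed.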

> So, can this sample non-Linux OS do native iconv "CP850<=>UCS-2LE"?
> Answer is both yes and no.  "No": if it must be single stage;  "yes": if
> we allow two-stage.
> 
> Which begs further questions about whether we want (or perhaps need) to
> allow multi-stage charset conversion.
The GNU iconv library is really your friend in that case.
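
As a rough illustration of the two-stage route described in the quoted text,
the sketch below goes CP850 -> UTF-8 -> UCS-2LE. It uses Python's built-in
codecs as a stand-in for iconv, which is an assumption made for brevity; the
function name two_stage is hypothetical. Python has no codec called UCS-2LE,
so "utf-16-le" is used instead, which yields the same bytes for every
character CP850 can encode.

```python
def two_stage(data: bytes) -> bytes:
    """Convert CP850 bytes to UCS-2LE bytes using UTF-8 as the middle step."""
    # Stage 1: CP850 -> UTF-8
    utf8 = data.decode("cp850").encode("utf-8")
    # Stage 2: UTF-8 -> UCS-2LE (utf-16-le is identical for these characters)
    return utf8.decode("utf-8").encode("utf-16-le")


def one_stage(data: bytes) -> bytes:
    """Direct CP850 -> UCS-2LE conversion, for comparison."""
    return data.decode("cp850").encode("utf-16-le")


sample = b"caf\x82"  # "cafe" with e-acute; 0x82 is e-acute in CP850
print(two_stage(sample) == one_stage(sample))
```

Both routes give the same result here; the two-stage one only costs an extra
pass over the data.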

> 
> But, for the moment, let's leave that to one side, and approach it from a
> different angle.
> 
> Suppose our sample OS... and remember that, at present, this seems to
> include most OSes... Suppose our sample OS does not do native single-stage
> iconv "CP850<=>UCS-2LE".
> 
> What should "configure" do?  Is this a major problem?  Are we going to
> refuse to configure samba on all Solaris, IRIX, *BSD, ...??  Surely not!
> Samba 2.0.x and 2.2.x have worked happily on these.  And Samba 3.0 seems
> to be working 99.9% happily anyway.
> 
> In days gone by, Samba distributed its own codepages.
> 
> How about this solution:
> 
> 1. Samba distribution continues to include a bare minimum set of
>    codepages (e.g. "CP850<=>UCS-2LE").
> 
> 2. "configure" tests for iconv(CP850<=>UCS-2LE).
> 
> 3. "configure" tests for iconv(<something more obscure>).
> 
> 4. (Possible, not essential!) Including the earlier discussion of
>    multi-stage, allow multi-stage iconv conversion in the samba code?
>    (And an associated "configure" test.)
I'm not sure about (4). This would duplicate much of the code in GNU iconv
and glibc, because multi-stage conversion is a graph problem ('find the
shortest route between two cities').

> >    [...] but I bet there are lots of platforms out there which would
> > need to have GNU iconv installed to take advantage of Samba charset
> > support.
> 
> But are we going to _require_, as an essential precondition, that every
> sys.admin. has installed the GNU version of "iconv"?  I hope not (bearing
> in mind that Samba 3.0 seems mostly OK anyway).
Why don't you want to work with the GNU iconv library?

-- 
/ Alexander Bokovoy
Misery no longer loves company.  Nowadays it insists on it.
		-- Russell Baker


