samba-3.0.0beta1 codeset issue on non-Linux
Alexander Bokovoy
a.bokovoy at sam-solutions.net
Fri Jun 13 15:01:53 GMT 2003
On Fri, Jun 13, 2003 at 10:00:49AM +0100, David Lee wrote:
> > CP850<->UCS-2LE alone should be reasonable.
>
> OK, let's use a Solaris 8 machine as a test-bed, as it seems to illustrate
> some interesting points.
>
> There is a user-level program "iconv(1)". This is a lot more convenient
> for our discussion here than writing C.
>
> As a charset newbie, I looked through the "/usr/lib/iconv" subtree, to see
> what might be available. I then used "iconv(1)" to verify these.
>
> I saw things relating CP850<=>UTF8.
>
> I also saw UTF8<=>UCS-2LE.
>
> But nothing directly CP850<=>UCS-2LE. (Note my inclusion of the word
> directly... can you see what's coming?)
Sure. This is a known situation. The GNU iconv library and glibc handle many
charsets internally using multi-step conversion paths rather than only
direct one-to-one converters. They keep a graph whose weighted edges
represent conversions, and they walk that graph when there is no direct
edge connecting the two vertices that represent the charsets.
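As an illustration only, the graph walk described above can be sketched in
Python as a shortest-path search. The charset pairs are the ones reported
for Solaris 8 in this thread; the edge weights, the CONVERTERS table and
the function name find_conversion_path are hypothetical, and this is not
meant to show how glibc or GNU iconv implement it internally.

```python
import heapq

# Hypothetical converter table: (source, target) -> cost.
# Only the pairs reported for Solaris 8 in this thread are listed;
# the costs are made up for illustration.
CONVERTERS = {
    ("CP850", "UTF-8"): 1,
    ("UTF-8", "CP850"): 1,
    ("UTF-8", "UCS-2LE"): 1,
    ("UCS-2LE", "UTF-8"): 1,
}


def find_conversion_path(source, target, converters=CONVERTERS):
    """Return the cheapest list of charsets from source to target, or None."""
    # Build adjacency lists from the converter table.
    graph = {}
    for (src, dst), cost in converters.items():
        graph.setdefault(src, []).append((dst, cost))

    # Dijkstra's algorithm over the charset graph.
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, charset, path = heapq.heappop(queue)
        if charset == target:
            return path
        if charset in visited:
            continue
        visited.add(charset)
        for nxt, step_cost in graph.get(charset, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + step_cost, nxt, path + [nxt]))
    return None


print(find_conversion_path("CP850", "UCS-2LE"))
```

With no direct CP850<=>UCS-2LE edge in the table, the search goes through
UTF-8. If no chain of converters exists at all, it returns None.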
> So, can this sample non-Linux OS do native iconv "CP850<=>UCS-2LE"?
> Answer is both yes and no. "No": if it must be single stage; "yes": if
> we allow two-stage.
>
> Which begs further questions about whether we want (or perhaps need) to
> allow multi-stage charset conversion.
The GNU iconv library is really your friend in that case.
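To make the two-stage case concrete, here is a minimal Python sketch that
converts CP850 to UCS-2LE by way of UTF-8, the same chain as piping one
iconv(1) call into another. It uses Python's built-in codecs rather than
iconv, and it assumes that Python's "utf-16-le" codec can stand in for
UCS-2LE, which holds for every character CP850 can encode. The helper
names convert and convert_two_stage are hypothetical.

```python
def convert(data, source, target):
    """Single-stage conversion, standing in for one iconv call."""
    return data.decode(source).encode(target)


def convert_two_stage(data, source, pivot, target):
    """Convert source -> pivot -> target using two single-stage steps."""
    return convert(convert(data, source, pivot), pivot, target)


# Byte 0x82 is e-acute (U+00E9) in CP850.
cp850_bytes = b"caf\x82"

# Assumption: "utf-16-le" stands in for UCS-2LE (same bytes for CP850 text).
ucs2le = convert_two_stage(cp850_bytes, "cp850", "utf-8", "utf-16-le")
print(ucs2le.hex())
```

Because UTF-8 can represent every CP850 character, the pivot step loses
nothing and the result matches what a direct converter would produce.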
>
> But, for the moment, let's leave that to one side, and approach it from a
> different angle.
>
> Suppose our sample OS... and remember that, at present, this seems to
> include most OSes... Suppose our sample OS does not do native single-stage
> iconv "CP850<=>UCS-2LE".
>
> What should "configure" do? Is this a major problem? Are we going to
> refuse to configure samba on all Solaris, IRIX, *BSD, ...?? Surely not!
> Samba 2.0.x and 2.2.x have worked happily on these. And Samba 3.0 seems
> to be working 99.9% happily anyway.
>
> In days gone by, Samba distributed its own codepages.
>
> How about this solution:
>
> 1. Samba distribution continues to include a bare minimum set of
> codepages (e.g. "CP850<=>UCS-2LE").
>
> 2. "configure" tests for iconv(CP850<=>UCS-2LE).
>
> 3. "configure" tests for iconv(<something more obscure>).
>
> 4. (Possible, not essential!) Including the earlier discussion of
> multi-stage, allow multi-stage iconv conversion in the samba code?
> (And an associated "configure" test.)
I'm not sure about (4). It would duplicate much of the code in GNU iconv
and glibc, because multi-stage conversion is a graph problem
('find a short route between two cities').
> > [...] but I bet there are lots of platforms out there which would
> > need to have GNU iconv installed to take advantage of Samba charset
> > support.
>
> But are we going to _require_, as an essential precondition, that every
> sys.admin. has installed the GNU version of "iconv"? I hope not (bearing
> in mind that Samba 3.0 seems mostly OK anyway).
Why do you not want to work with the GNU iconv library?
--
/ Alexander Bokovoy
Misery no longer loves company. Nowadays it insists on it.
-- Russell Baker