samba-3.0.0beta1 codeset issue on non-Linux

Thu Jun 12 09:51:02 GMT 2003

On Wed, 11 Jun 2003, Steve Langasek wrote:

> On Wed, Jun 11, 2003 at 04:06:21PM +0100, David Lee wrote:
>
> > 2. The AC_TRY_RUN test in "configure.in" is based upon:
> >       iconv_open("ASCII", "UCS-2LE");
> >    But Solaris (at least), while having support for codeset conversions,
> >    seems not to have any involving "ASCII".  There are plenty involving
> >    "UCS-2LE" (and "UTF-8") and a small number involving "CP850".
> >    So, based on this one limited test, "configure" erroneously concludes
> >    that it cannot do charset translation: clearly wrong.
>
> Hmm, this is a detail that I missed on first reading.  This autoconf
> test really doesn't seem to belong, because on my systems, I see that
> Samba handles the conversion between UCS-2LE and ASCII *internally*,
> even though the system iconv supports it!  One way or another this test
> needs to be changed, because it's not testing for the features that are
> actually used.  Changing it to iconv_open("CP850", "UCS-2LE") would make
> for a much more useful test.

Thanks for your replies, Steve.

Before answering the detail, can I ask a more general question?  What is
the _aim_ of this run-time "iconv_open(<X1>, <X2>)" test in configure.in?

Is it to establish merely that there is some sort of iconv which can do
some translation function?  That there is at least one <X1> and one <X2>
for which this works, with the actual values of <X1> and <X2> at this
stage being unimportant?

Or is it more than that: that there is an iconv which can handle a known,
particular set of translations?  That we require not only "iconv_open()"
but also certain specific values of <X1> and <X2>?

> It is required that the system iconv be able to support UCS-2LE, since
> that's used internally by Samba as an intermediate encoding.  It just
> doesn't make sense to test for ASCII, which Samba already knows how to
> handle.

I suspect that begins to answer my earlier, fumbling, question about the
principle: it means we require a working iconv_open() which is known to
support "UCS-2LE".  (Is that "from" or "to" or both?)

> > o  Is "CP850" still the correct default for "param/loadparm.c"?
>
> > o  Should "configure.in" try harder, using a variety of charsets,
> >    including at least "ASCII", "UCS-2LE", "UTF-8", "CP850"?
>
> Out of these, only the CP850<->UCS-2LE conversion would actually be
> handed off to system iconv by Samba.
>
> Can you confirm that changing the autoconf test to look for CP850
> instead of ASCII fixes the problem on Solaris?

Does this mean that we _require_ a fully functional:
   iconv_open("CP850", "UCS-2LE")
and also:
   iconv_open("UCS-2LE", "CP850")

I had earlier just hacked "configure.in" to test a range of "from" and "to"
charsets:
   FROM="CP850 850 646"
   TO="8859 UTF-8 UCS-2LE"
   <foreach FROM>
     <foreach TO>
       run the "iconv_open()" and print succeed/fail
     <>
   <>

   # This next block is simply "from" and "to" reversed
   <foreach TO>      # now used in "from" position
     <foreach FROM>  # now used in "to" position
       run test and print succeed/fail
     <>
   <>

I chose these arbitrarily, but partly guided by an ancient Solaris 2.5.1
system we had lying around, which didn't seem to have anything 850-ish but
did seem to have a 646.  (Remember, I haven't a clue what these things
mean!)

The results:

Solaris 2.5.1
   (Sadly, so ancient that our environment has moved on, and I can no
   longer build samba.  Suspect no 850-ish.)

Solaris 7:
   fail: CP850 8859
   fail: CP850 UTF-8
   fail: CP850 UCS-2LE
   fail: 850 8859
   fail: 850 UTF-8
   fail: 850 UCS-2LE
   succeed: 646 8859
   succeed: 646 UTF-8
   fail: 646 UCS-2LE

   fail: 8859 CP850
   fail: 8859 850
   succeed: 8859 646
   fail: UTF-8 CP850
   fail: UTF-8 850
   succeed: UTF-8 646
   fail: UCS-2LE CP850
   fail: UCS-2LE 850
   fail: UCS-2LE 646

Comment: apparently no 850-ish functionality.

Solaris 8:
   fail: CP850 8859
   succeed: CP850 UTF-8
   fail: CP850 UCS-2LE
   fail: 850 8859
   fail: 850 UTF-8
   fail: 850 UCS-2LE
   succeed: 646 8859
   succeed: 646 UTF-8
   succeed: 646 UCS-2LE

   fail: 8859 CP850
   fail: 8859 850
   succeed: 8859 646
   succeed: UTF-8 CP850
   fail: UTF-8 850
   succeed: UTF-8 646
   fail: UCS-2LE CP850
   fail: UCS-2LE 850
   succeed: UCS-2LE 646

Comment: have CP850<->UTF-8 but not CP850<->UCS-2LE

Redhat 9:
   fail: CP850 8859
   succeed: CP850 UTF-8
   succeed: CP850 UCS-2LE
   fail: 850 8859
   succeed: 850 UTF-8
   succeed: 850 UCS-2LE
   fail: 646 8859
   fail: 646 UTF-8
   fail: 646 UCS-2LE

   fail: 8859 CP850
   fail: 8859 850
   fail: 8859 646
   succeed: UTF-8 CP850
   succeed: UTF-8 850
   fail: UTF-8 646
   succeed: UCS-2LE CP850
   succeed: UCS-2LE 850
   fail: UCS-2LE 646

And what about IRIX, HPUX, *BSD flavours?

Does that help?

What tests should "configure.in" do?  What should it do with these
results?  How severe are the various results to be?  What does the rest of
the code (compile time and run-time) need to do with these results?

-- 

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :