samba-3.0.0beta1 codeset issue on non-Linux

Tue Jun 24 14:13:15 GMT 2003

On Mon, 16 Jun 2003, Steve Langasek wrote:

> David's tests with Solaris iconv seem to suggest otherwise:
> CP850<->UTF8 is supported, but CP850<->UCS-2LE is not, and this seems to
> be a problem in light of this code from lib/charcnv.c --
>
> #ifdef HAVE_NATIVE_ICONV
>         if (!ret->pull) {
>                 ret->cd_pull = iconv_open("UCS-2LE", fromcode);
>                 if (ret->cd_pull != (iconv_t)-1)
>                         ret->pull = sys_iconv;
>         }
>
>         if (!ret->push) {
>                 ret->cd_push = iconv_open(tocode, "UCS-2LE");
>                 if (ret->cd_push != (iconv_t)-1)
>                         ret->push = sys_iconv;
>         }
> #endif
>
> If I'm not mistaken, this means native iconv is only ever used for
> charsets that have converters to UCS-2LE.  So while GNU iconv should not
> be required, the use of UCS-2LE as the meet-point for charset
> conversions seems to cramp the portability of Samba to those iconv
> implementations that are actually available.
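
The quoted logic installs sys_iconv for a direction only when iconv_open() can build a converter directly between the charset and "UCS-2LE". Below is a minimal standalone sketch that probes the same two checks. It assumes a POSIX <iconv.h>. The helper names iconv_pair_ok() and report_native_support() are hypothetical and are not Samba functions.

```c
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: 1 if iconv can open a fromcode -> tocode
 * converter, 0 if iconv_open() fails (e.g. EINVAL: pair unsupported). */
static int iconv_pair_ok(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Repeats the two checks from lib/charcnv.c: "pull" is
 * charset -> UCS-2LE and "push" is UCS-2LE -> charset.
 * Prints one line and returns 1 only if both directions work. */
static int report_native_support(const char *charset)
{
    int pull = iconv_pair_ok("UCS-2LE", charset);
    int push = iconv_pair_ok(charset, "UCS-2LE");

    printf("%-12s pull=%s push=%s\n", charset,
           pull ? "yes" : "no", push ? "yes" : "no");
    return pull && push;
}
```

Going by David's Solaris results, report_native_support("CP850") would be expected to report "no" for both directions there. A charset that also lacks a UTF-8 converter would fail every native path.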

We seem to have gone quiet on this thread, and I lack the experience
with codesets to lead it.  But we need to pursue it, don't we?

Are the following points correct?

1. The issue is important?

2. Making GNU's iconv mandatory (absolute requirement) is undesirable?

3. Using a system-installed iconv is highly desirable?  (In the special
   case of Linux, this is probably the GNU version; on several non-Linux
   sites GNU/iconv may already be installed, possibly in addition to
   (not replacing) OS/iconv.)

4. We must be able to end up with 850<->UCS-2LE?  (Has the "850" got to
   be exactly "CP850"?  Is a thing called "IBM-850" equivalent and/or
   acceptable?)

5. Are there close equivalents to 850 that would suffice?

6. Are there countries/locales/etc. which would require an 850-like thing
   that is not 850?  (I've seen lots of "IBM-8nn" and lots of "CPnnn"
   entries on one sample OS (Solaris).)

7. Am I asking the right questions???

It seems that many OSes cannot do a *direct* CP850<->UCS-2LE conversion.
But many of those seem able to do it as a two-stage process (for example
CP850<->UTF-8<->UCS-2LE), although "CP850" might be called something like
"IBM-850" locally.
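
Here is a minimal sketch of such a two-stage conversion. It assumes POSIX iconv() and uses UTF-8 as the intermediate charset, because the quoted Solaris result says CP850<->UTF8 works there. convert_stage() and convert_two_stage() are hypothetical helpers. The fixed 1024-byte intermediate buffer is a simplification; real code would size it from the input or convert in chunks.

```c
#include <iconv.h>
#include <stddef.h>

/* Run one iconv stage over the whole input.  Returns the number of bytes
 * written to `out`, or -1 if the pair is unsupported or the conversion
 * fails (invalid input, or `out` too small). */
static long convert_stage(const char *tocode, const char *fromcode,
                          const char *in, size_t inlen,
                          char *out, size_t outsize)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;         /* POSIX iconv() takes char ** */
    char *outp = out;
    size_t inleft = inlen, outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        /* Flush any shift state (a no-op for these stateless charsets). */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    iconv_close(cd);
    return rc == (size_t)-1 ? -1 : (long)(outsize - outleft);
}

/* Two-stage conversion fromcode -> via -> tocode, for platforms that
 * lack a direct fromcode <-> tocode converter. */
static long convert_two_stage(const char *tocode, const char *via,
                              const char *fromcode,
                              const char *in, size_t inlen,
                              char *out, size_t outsize)
{
    char mid[1024];                 /* fixed size keeps the sketch short */
    long n = convert_stage(via, fromcode, in, inlen, mid, sizeof mid);
    if (n < 0)
        return -1;
    return convert_stage(tocode, via, mid, (size_t)n, out, outsize);
}
```

Where the direct pair fails, one would call convert_two_stage("UCS-2LE", "UTF-8", "CP850", ...) and substitute the local name (perhaps "IBM-850") for "CP850".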

So it sounds as if we want "configure" to determine:

1. The name of an 850-like charset that is available;
   a)  anything else?
   b)  preferred name-search order? (e.g. "CP850" "IBM850" "IBM-850" "850")

2. A means by which that 850 can be converted to and from UCS-2LE in the
   following preference order:
   a) GNU/iconv;
   b) OS/iconv direct (in the Linux special case, "a" == "b");
   c) OS/iconv two stage (perhaps we should design for n-stage??)
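
Below is a sketch of the probe such a configure test might compile and run. It assumes POSIX iconv. The names find_dos_charset(), print_config(), DOS_CHARSET_NAME and ICONV_TWO_STAGE_VIA are all hypothetical. The choice between GNU/iconv and OS/iconv (2a versus 2b) would be made at link time, by building the probe once against each library, so one run of this code covers only 2b and 2c for whichever iconv it is linked with. As a design choice, it prefers any name with a direct converter over any name that needs two stages.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch. */
enum conv_path { PATH_NONE, PATH_DIRECT, PATH_TWO_STAGE };

/* 1 if iconv can open a fromcode -> tocode converter. */
static int pair_ok(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Both pull and push directions must be available. */
static int both_ways(const char *a, const char *b)
{
    return pair_ok(a, b) && pair_ok(b, a);
}

/* Pass 1: first candidate with a direct <-> UCS-2LE converter.
 * Pass 2: first candidate reachable in two stages via UTF-8.
 * Stores the accepted name in *name_out (NULL if none). */
static enum conv_path find_dos_charset(const char *const names[], size_t n,
                                       const char **name_out)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (both_ways(names[i], "UCS-2LE")) {
            *name_out = names[i];
            return PATH_DIRECT;
        }
    if (both_ways("UTF-8", "UCS-2LE"))
        for (i = 0; i < n; i++)
            if (both_ways(names[i], "UTF-8")) {
                *name_out = names[i];
                return PATH_TWO_STAGE;
            }
    *name_out = NULL;
    return PATH_NONE;
}

/* Emit config.h-style lines (hypothetical macro names). */
static void print_config(enum conv_path path, const char *name)
{
    if (path == PATH_NONE) {
        printf("/* no usable 850-like charset found */\n");
        return;
    }
    printf("#define DOS_CHARSET_NAME \"%s\"\n", name);
    if (path == PATH_TWO_STAGE)
        printf("#define ICONV_TWO_STAGE_VIA \"UTF-8\"\n");
}
```

A configure test would pass in the candidates from item 1b ("CP850", "IBM850", "IBM-850", "850") in that order and redirect print_config()'s output into config.h. Extending pass 2 to several intermediate charsets would be one route to the n-stage idea.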

This information then needs to be passed (via "#define"s) into the Samba
source code (a) to set the global charset and (b) to choose an appropriate
codepath for the translation in "lib/charcnv.c" etc.
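
One possible way for the source to consume those #defines is sketched below. DOS_CHARSET_NAME and ICONV_TWO_STAGE_VIA are hypothetical macro names, not Samba's. struct conv_plan is an illustrative type. The default below stands in for a configure-generated config.h.

```c
#include <stddef.h>

/* Stand-in for configure output (hypothetical macro name). */
#ifndef DOS_CHARSET_NAME
#define DOS_CHARSET_NAME "CP850"
#endif
/* ICONV_TWO_STAGE_VIA is left undefined here, meaning "direct". */

/* Illustrative description of one conversion direction. */
struct conv_plan {
    const char *from;
    const char *via;    /* NULL for a direct conversion */
    const char *to;
};

/* (a) the charset names come from the caller (ultimately DOS_CHARSET_NAME);
 * (b) the codepath, direct or two-stage, comes from ICONV_TWO_STAGE_VIA. */
static struct conv_plan make_plan(const char *from, const char *to)
{
    struct conv_plan p;

    p.from = from;
    p.to = to;
#ifdef ICONV_TWO_STAGE_VIA
    p.via = ICONV_TWO_STAGE_VIA;
#else
    p.via = NULL;
#endif
    return p;
}

/* Pull: DOS charset -> UCS-2LE.  Push: UCS-2LE -> DOS charset. */
static struct conv_plan pull_plan(void)
{
    return make_plan(DOS_CHARSET_NAME, "UCS-2LE");
}

static struct conv_plan push_plan(void)
{
    return make_plan("UCS-2LE", DOS_CHARSET_NAME);
}
```

Compiling with -DICONV_TWO_STAGE_VIA='"UTF-8"' would switch both plans to the two-stage path without changing the code.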

Is that about right?

-- 

:  David Lee                                I.T. Service          :
:  Systems Programmer                       Computer Centre       :
:                                           University of Durham  :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham                :
:  Phone: +44 191 334 2752                  U.K.                  :