samba-3.0.0beta1 codeset issue on non-Linux
David Lee
t.d.lee at durham.ac.uk
Tue Jun 24 14:13:15 GMT 2003
On Mon, 16 Jun 2003, Steve Langasek wrote:
> David's tests with Solaris iconv seem to suggest otherwise:
> CP850<->UTF8 is supported, but CP850<->UCS-2LE is not, and this seems to
> be a problem in light of this code from lib/charcnv.c --
>
> #ifdef HAVE_NATIVE_ICONV
> 	if (!ret->pull) {
> 		ret->cd_pull = iconv_open("UCS-2LE", fromcode);
> 		if (ret->cd_pull != (iconv_t)-1)
> 			ret->pull = sys_iconv;
> 	}
>
> 	if (!ret->push) {
> 		ret->cd_push = iconv_open(tocode, "UCS-2LE");
> 		if (ret->cd_push != (iconv_t)-1)
> 			ret->push = sys_iconv;
> 	}
> #endif
>
> If I'm not mistaken, this means native iconv is only ever used for
> charsets that have converters to UCS-2LE. So while GNU iconv should not
> be required, the use of UCS-2LE as the meet-point for charset
> conversions seems to cramp the portability of Samba to those iconv
> implementations that are actually available.
We seem to have gone quiet on this thread. And I lack the experience with
codesets to be able to lead it. But we need to pursue it, don't we?
Are the following correct:
1. The issue is important?
2. Making GNU's iconv mandatory (absolute requirement) is undesirable?
3. Using a system-installed iconv is highly desirable? (For special-case
Linux, this is probably the GNU version; for several non-Linux sites
GNU/iconv may be already installed, possibly in addition to (not
replacing) OS/iconv.)
4. We must be able to end up with 850<->UCS-2LE? (Has the "850" got to
be exactly "CP850"? Is a thing called "IBM-850" equivalent and/or
acceptable?)
5. Are there close-equivalents to 850 that would suffice?
6. Are there countries/locales/etc. which would require an 850-like thing
that is not 850? (I've seen lots of "IBM-8nn" and lots of "CPnnn"
entries on one sample OS (Solaris).)
7. Am I asking the right questions???
It seems that many OSes cannot do a *direct* CP850<->UCS-2LE. But many of
those would seem able to do this as a two-stage process, although "CP850"
might be called something like "IBM-850".
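To make the two-stage idea concrete, here is a rough sketch, not taken from the Samba source. It tries a direct conversion first and, if that fails, goes through an intermediate charset (UTF-8 in the usage note below). The helper names convert_once() and convert_two_stage() are my own invention, and the fixed-size temporary buffer is an assumption that only suits short strings. Note also that some systems (Solaris among them) declare the second argument of iconv() as "const char **", so the cast may need adjusting there.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert one complete buffer in a single iconv step.
 * Returns the number of bytes written, or (size_t)-1 on any failure
 * (converter not available, invalid input, output buffer too small). */
static size_t convert_once(const char *to, const char *from,
                           const char *in, size_t inlen,
                           char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;      /* iconv() wants char **; input is not modified */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1 || inleft != 0)
        return (size_t)-1;
    return outcap - outleft;
}

/* Hypothetical helper: try from -> to directly; if that fails, try
 * from -> via -> to.  Assumes the intermediate text fits in 1024 bytes. */
size_t convert_two_stage(const char *to, const char *via, const char *from,
                         const char *in, size_t inlen,
                         char *out, size_t outcap)
{
    size_t n = convert_once(to, from, in, inlen, out, outcap);
    if (n != (size_t)-1)
        return n;

    char tmp[1024];
    size_t mid = convert_once(via, from, in, inlen, tmp, sizeof tmp);
    if (mid == (size_t)-1)
        return (size_t)-1;
    return convert_once(to, via, tmp, mid, out, outcap);
}
```

A call such as convert_two_stage("UCS-2LE", "UTF-8", "CP850", buf, len, out, sizeof out) would then cover both the direct and the via-UTF-8 cases, provided the local iconv knows those names.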
So it sounds as if we want "configure" to determine:
1. The name of an 850-like charset that is available;
a) anything else?
b) preferred name-search order? (e.g. "CP850" "IBM850" "IBM-850" "850")
2. A means by which that 850 can be converted to and from UCS-2LE in the
following preference order:
a) GNU/iconv;
b) OS/iconv direct (for Linux special-case "a" == "b");
c) OS/iconv two stage (perhaps we should design for n-stage??)
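As a possible starting point for item 1, the name search could be a small probe program that "configure" compiles and runs. The sketch below is only an illustration: find_charset_name() is a made-up name, and the candidate list is just the search order suggested above. It returns the first candidate that the local iconv can convert both to and from the target charset.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical probe: return the first entry of the NULL-terminated list
 * `names` that iconv can convert both to and from `target`, or NULL if
 * none of the candidates works in both directions. */
const char *find_charset_name(const char *const *names, const char *target)
{
    for (size_t i = 0; names[i] != NULL; i++) {
        /* "pull" direction: candidate -> target */
        iconv_t pull = iconv_open(target, names[i]);
        if (pull == (iconv_t)-1)
            continue;
        iconv_close(pull);

        /* "push" direction: target -> candidate */
        iconv_t push = iconv_open(names[i], target);
        if (push == (iconv_t)-1)
            continue;
        iconv_close(push);

        return names[i];
    }
    return NULL;
}
```

Calling it once with target "UCS-2LE" and, if that returns NULL, again with target "UTF-8" would distinguish the "direct" case from the "two stage" case in item 2.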
This information then needs to be passed (via "#define"s) into the samba
source code (a) to set the global charset (b) to choose an appropriate
codepath for the translation in "lib/charcnv.c" etc.
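For illustration only, the "#define" hand-off might look something like the sketch below. The macro names DOS_CHARSET_NAME and ICONV_VIA_CHARSET are hypothetical (they are not existing Samba or autoconf symbols), and the defaults are there only so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Hypothetical macros that a configure test might write into config.h.
 * The fallback values below exist only to make this sketch self-contained. */
#ifndef DOS_CHARSET_NAME
#define DOS_CHARSET_NAME "CP850"     /* might be "IBM-850" on some systems */
#endif
#ifndef ICONV_VIA_CHARSET
#define ICONV_VIA_CHARSET ""         /* empty string: direct conversion works */
#endif

/* Name of the global DOS charset chosen at build time. */
const char *dos_charset_name(void)
{
    return DOS_CHARSET_NAME;
}

/* Intermediate charset for a two-stage conversion, or NULL when a direct
 * conversion to and from UCS-2LE is available. */
const char *iconv_via_charset(void)
{
    return ICONV_VIA_CHARSET[0] != '\0' ? ICONV_VIA_CHARSET : NULL;
}
```

The conversion code could then test iconv_via_charset() once at start-up to pick the direct or the two-stage codepath.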
Is that about right?
--
: David Lee I.T. Service :
: Systems Programmer Computer Centre :
: University of Durham :
: http://www.dur.ac.uk/t.d.lee/ South Road :
: Durham :
: Phone: +44 191 334 2752 U.K. :