LDAP and UTF-8
Ihar Viarheichyk
i.viarheichyk at sam-solutions.net
Mon Feb 18 03:02:17 GMT 2002
On Sun, Feb 17, 2002 at 04:45:38PM +0100, Juergen Hasch wrote:
> Would someone comment on
> a) is there another (better) way to convert from UTF-8 to Unix than
> using convert_string(...) and adding the pull_ and push_ wrapper functions ?
No. I don't think you can find a better way than iconv to convert UTF-8
to _any_ supported unix codepage.
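For illustration, here is a minimal sketch of such a conversion done with plain iconv(3), outside of Samba. The helper name utf8_to_charset and its signature are hypothetical and are not part of the patch or of Samba's convert_string(); it only assumes a standard iconv implementation that knows the target charset.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Samba): convert a NUL-terminated UTF-8
 * string into the given target charset using plain iconv(3).
 * Returns the number of bytes written (excluding the terminator),
 * or -1 on any error.
 */
long utf8_to_charset(const char *to_charset, const char *src,
                     char *dest, size_t dest_len)
{
    iconv_t cd;
    char *in = (char *)src;      /* iconv() takes a non-const pointer */
    char *out = dest;
    size_t in_left = strlen(src);
    size_t out_left;

    assert(dest != NULL && dest_len > 0);
    out_left = dest_len - 1;     /* reserve room for the terminator */

    cd = iconv_open(to_charset, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;               /* this charset pair is not supported */

    if (iconv(cd, &in, &in_left, &out, &out_left) == (size_t)-1) {
        iconv_close(cd);
        return -1;               /* invalid input or destination too small */
    }
    iconv_close(cd);

    *out = '\0';
    return (long)(out - dest);
}
```

Calling utf8_to_charset("ISO-8859-1", name, buf, sizeof(buf)) would then fill buf with the Latin-1 form of name, provided every character in it exists in that codepage.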
> b) what is the difference between CH_UNIX and CH_DISPLAY, when should I
> convert to CH_DISPLAY instead of CH_UNIX ?
CH_DISPLAY is used in programs which display information to the user
(e.g. smbclient).
> c) are there platforms without iconv, so this will possibly break things ?
You need iconv to support internationalization properly (though Samba
can convert UCS2<->UTF-8 without iconv). In fact, libiconv exists on
many platforms.
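The reason UCS2<->UTF-8 is possible without iconv is that it needs no codepage tables, only bit arithmetic. A rough sketch of one direction follows; the function ucs2_to_utf8 is a hypothetical illustration and is not Samba's actual built-in converter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not Samba's code): encode UCS-2 code units as UTF-8.
 * Stops early if the destination is full or a surrogate code unit is seen
 * (surrogates are not valid in UCS-2). Returns the number of bytes written.
 */
size_t ucs2_to_utf8(const uint16_t *src, size_t src_units,
                    unsigned char *dest, size_t dest_len)
{
    size_t i, n = 0;

    assert(src != NULL && dest != NULL);
    for (i = 0; i < src_units; i++) {
        uint16_t c = src[i];

        if (c >= 0xD800 && c <= 0xDFFF)
            break;                          /* surrogate: give up */
        if (c < 0x80) {                     /* 1 byte, same as ASCII */
            if (n + 1 > dest_len) break;
            dest[n++] = (unsigned char)c;
        } else if (c < 0x800) {             /* 2 bytes: 110xxxxx 10xxxxxx */
            if (n + 2 > dest_len) break;
            dest[n++] = (unsigned char)(0xC0 | (c >> 6));
            dest[n++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {                            /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            if (n + 3 > dest_len) break;
            dest[n++] = (unsigned char)(0xE0 | (c >> 12));
            dest[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dest[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return n;
}
```

Any conversion to or from a real unix codepage, by contrast, needs mapping tables, which is what iconv provides.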
The patch is good. I have just two comments on it:
> + if (!conv_handles[CH_UNIX][CH_UTF8]) {
> + conv_handles[CH_UNIX][CH_UTF8] = smb_iconv_open("UTF-8", "ASCII");
> + }
> + if (!conv_handles[CH_UTF8][CH_UNIX]) {
> + conv_handles[CH_UTF8][CH_UNIX] = smb_iconv_open("ASCII", "UTF-8");
> + }
I don't see much sense in an ASCII<->UTF-8 conversion, as the first 128
code points (0-127) of UTF-8 are identical to ASCII.
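To make the point concrete: a string whose bytes are all below 0x80 is already valid UTF-8, byte for byte, so an "ASCII" to "UTF-8" handle has nothing to convert. A small check along these lines (is_pure_ascii is a hypothetical name, not a Samba function) shows the condition:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if every byte of the NUL-terminated
 * string is below 0x80. Such a string has the same byte sequence in
 * ASCII and in UTF-8, so no conversion is needed.
 */
int is_pure_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    for (; *p != '\0'; p++) {
        if (*p >= 0x80)
            return 0;   /* high bit set: part of a multi-byte sequence */
    }
    return 1;
}
```

Anything for which this returns 0 cannot be represented in ASCII at all, so the handles only do useful work when opened with the real unix charset.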
> +/****************************************************************************
> +copy a string from a char* src to a unicode destination
> +return the number of bytes occupied by the string in the destination
> +flags can have:
> + STR_TERMINATE means include the null termination
> + STR_UPPER means uppercase in the destination
> +dest_len is the maximum length allowed in the destination. If dest_len
> +is -1 then no maximum is used
> +****************************************************************************/
> +int push_utf8(const void *base_ptr, void *dest, const char *src, int dest_len, int flags)
> +{
> + int src_len = strlen(src);
> + pstring tmpbuf;
> +
> + /* treat a pstring as "unlimited" length */
> + if (dest_len == -1) {
> + dest_len = sizeof(pstring);
> + }
> +
> + if (flags & STR_UPPER) {
> + pstrcpy(tmpbuf, src);
> + strupper(tmpbuf);
> + src = tmpbuf;
> + }
> +
> + if (flags & STR_TERMINATE) {
> + src_len++;
> + }
> +
> + return convert_string(CH_UNIX, CH_UTF8, src, src_len, dest, dest_len);
> +}
>
> /****************************************************************************
> copy a string from a ucs2 source to a unix char* destination
> @@ -435,6 +474,40 @@
> return pull_ucs2(NULL, dest, src, sizeof(fstring), -1, STR_TERMINATE);
> }
>
> +/****************************************************************************
> +copy a string from a utf-8 source to a unix char* destination
> +flags can have:
> + STR_TERMINATE means the string in src is null terminated
> +if STR_TERMINATE is set then src_len is ignored
> +src_len is the length of the source area in bytes
> +return the number of bytes occupied by the string in src
> +the resulting string in "dest" is always null terminated
> +****************************************************************************/
> +int pull_utf8(const void *base_ptr, char *dest, const void *src, int dest_len, int src_len, int flags)
> +{
> + int ret;
> +
> + if (dest_len == -1) {
> + dest_len = sizeof(pstring);
> + }
> +
> + if (flags & STR_TERMINATE) src_len = strlen(src)+1;
> +
> + ret = convert_string(CH_UTF8, CH_UNIX, src, src_len, dest, dest_len);
> + if (dest_len) dest[MIN(ret, dest_len-1)] = 0;
> +
> + return src_len;
> +}
> +
> +int pull_utf8_pstring(char *dest, const void *src)
> +{
> + return pull_utf8(NULL, dest, src, sizeof(pstring), -1, STR_TERMINATE);
> +}
> +
> +int pull_utf8_fstring(char *dest, const void *src)
> +{
> + return pull_utf8(NULL, dest, src, sizeof(fstring), -1, STR_TERMINATE);
> +}
>
> /****************************************************************************
> copy a string from a char* src to a unicode or ascii
>
Do you really need such complicated functions? I don't think you need
STR_UPPER and explicit STR_TERMINATE for usernames, so
(push|pull)_utf8_pstring will be enough.
--
Igor Vergeichik
ICQ 47298730