LDAP and UTF-8

Ihar Viarheichyk i.viarheichyk at sam-solutions.net
Mon Feb 18 03:02:17 GMT 2002


On Sun, Feb 17, 2002 at 04:45:38PM +0100, Juergen Hasch wrote:
> Would someone comment on 
> a) is there another (better) way to convert from UTF-8 to Unix than
> using convert_string(...) and adding the pull_ and push_ wrapper functions ?

No. I don't think you can find a better way than iconv to convert UTF-8
to _any_ supported unix codepage.
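
To illustrate the point, here is a minimal sketch of a UTF-8 to
"any codepage" conversion done directly with the standard iconv(3) calls.
The function name utf8_to_charset is hypothetical and is not part of
Samba or of the patch; Samba's own convert_string() goes through its
smb_iconv wrappers instead.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Samba code: convert a NUL-terminated UTF-8
 * string into the charset named by to_charset.
 * Returns the number of bytes written (excluding the terminator),
 * or (size_t)-1 on any error. dest is NUL-terminated on success. */
size_t utf8_to_charset(const char *to_charset, const char *src,
                       char *dest, size_t dest_len)
{
	iconv_t cd;
	char *in = (char *)src;     /* iconv() wants char **, not const */
	char *out = dest;
	size_t in_left = strlen(src);
	size_t out_left;
	size_t rc;

	if (dest_len == 0)
		return (size_t)-1;
	out_left = dest_len - 1;    /* keep one byte for the terminator */

	cd = iconv_open(to_charset, "UTF-8");
	if (cd == (iconv_t)-1)
		return (size_t)-1;      /* charset pair not supported */

	rc = iconv(cd, &in, &in_left, &out, &out_left);
	if (rc != (size_t)-1) {
		/* flush any pending shift sequence for stateful encodings */
		rc = iconv(cd, NULL, NULL, &out, &out_left);
	}
	iconv_close(cd);

	if (rc == (size_t)-1)
		return (size_t)-1;      /* invalid input or dest too small */

	*out = '\0';
	return (size_t)(out - dest);
}
```

Called as utf8_to_charset("ISO-8859-1", name, buf, sizeof(buf)), this
fails cleanly when the platform's iconv does not know the target charset,
which is the case a caller has to be prepared for.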

> b) what is the difference between CH_UNIX and CH_DISPLAY, when should I
> convert to CH_DISPLAY instead of CH_UNIX ?

CH_DISPLAY is used by programs that display information to the user
(e.g. smbclient).

> c) are there platforms without iconv, so this will possibly break things ?

You need iconv to support internationalization properly (though Samba
can convert UCS2<->UTF-8 without iconv).  In any case, libiconv is
available on many platforms.
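
The UCS2 to UTF-8 direction needs no tables, which is why it can be done
without iconv. The following is a rough sketch of the idea, not Samba's
actual implementation; the function name ucs2_to_utf8 is made up for
illustration, and it assumes plain UCS-2 input (no surrogate handling).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not Samba code: encode one UCS-2 code unit as
 * UTF-8. out must have room for 3 bytes. Returns the number of bytes
 * written (1 to 3). Surrogate code units are not treated specially. */
size_t ucs2_to_utf8(uint16_t c, unsigned char *out)
{
	if (c < 0x80) {
		/* 7-bit values are stored as a single ASCII byte */
		out[0] = (unsigned char)c;
		return 1;
	}
	if (c < 0x800) {
		/* 11 bits: 110xxxxx 10xxxxxx */
		out[0] = (unsigned char)(0xC0 | (c >> 6));
		out[1] = (unsigned char)(0x80 | (c & 0x3F));
		return 2;
	}
	/* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
	out[0] = (unsigned char)(0xE0 | (c >> 12));
	out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
	out[2] = (unsigned char)(0x80 | (c & 0x3F));
	return 3;
}
```

Converting to a real unix codepage (KOI8-R, ISO-8859-x and so on) is
different, since it needs per-charset mapping tables, and that is what
iconv provides.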


The patch is good. I have just two comments on it:

> +	if (!conv_handles[CH_UNIX][CH_UTF8]) {
> +		conv_handles[CH_UNIX][CH_UTF8] = smb_iconv_open("UTF-8", "ASCII");
> +	}
> +	if (!conv_handles[CH_UTF8][CH_UNIX]) {
> +		conv_handles[CH_UTF8][CH_UNIX] = smb_iconv_open("ASCII", "UTF-8");
> +	}

I don't see much sense in an ASCII<->UTF-8 conversion, as the first 128
characters (0-127) of UTF-8 are encoded identically to ASCII.
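
A small sketch of why that conversion is a no-op: any buffer whose bytes
are all below 0x80 is already valid UTF-8 with exactly the same bytes.
The helper name is_pure_ascii is hypothetical and does not exist in
Samba.

```c
#include <stddef.h>

/* Hypothetical helper, not Samba code: return 1 if every byte in the
 * buffer is below 0x80. Such a buffer is pure ASCII and is byte-for-byte
 * identical when interpreted as UTF-8. Returns 0 otherwise. */
int is_pure_ascii(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		/* cast first: plain char may be signed on this platform */
		if ((unsigned char)s[i] >= 0x80)
			return 0;
	}
	return 1;
}
```

So an "ASCII" to "UTF-8" handle only ever copies bytes through; a
meaningful default would have to name a real 8-bit or multibyte unix
charset.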

> +/****************************************************************************
> +copy a string from a char* src to a unicode destination
> +return the number of bytes occupied by the string in the destination
> +flags can have:
> +  STR_TERMINATE means include the null termination
> +  STR_UPPER     means uppercase in the destination
> +dest_len is the maximum length allowed in the destination. If dest_len
> +is -1 then no maxiumum is used
> +****************************************************************************/
> +int push_utf8(const void *base_ptr, void *dest, const char *src, int dest_len, int flags)
> +{
> +	int src_len = strlen(src);
> +	pstring tmpbuf;
> +
> +	/* treat a pstring as "unlimited" length */
> +	if (dest_len == -1) {
> +		dest_len = sizeof(pstring);
> +	}
> +
> +	if (flags & STR_UPPER) {
> +		pstrcpy(tmpbuf, src);
> +		strupper(tmpbuf);
> +		src = tmpbuf;
> +	}
> +
> +	if (flags & STR_TERMINATE) {
> +		src_len++;
> +	}
> +
> +	return convert_string(CH_UNIX, CH_UTF8, src, src_len, dest, dest_len);
> +}
>  
>  /****************************************************************************
>  copy a string from a ucs2 source to a unix char* destination
> @@ -435,6 +474,40 @@
>  	return pull_ucs2(NULL, dest, src, sizeof(fstring), -1, STR_TERMINATE);
>  }
>  
> +/****************************************************************************
> +copy a string from a utf-8 source to a unix char* destination
> +flags can have:
> +  STR_TERMINATE means the string in src is null terminated
> +if STR_TERMINATE is set then src_len is ignored
> +src_len is the length of the source area in bytes
> +return the number of bytes occupied by the string in src
> +the resulting string in "dest" is always null terminated
> +****************************************************************************/
> +int pull_utf8(const void *base_ptr, char *dest, const void *src, int dest_len, int src_len, int flags)
> +{
> +	int ret;
> +
> +	if (dest_len == -1) {
> +		dest_len = sizeof(pstring);
> +	}
> +
> +	if (flags & STR_TERMINATE) src_len = strlen(src)+1;
> +
> +	ret = convert_string(CH_UTF8, CH_UNIX, src, src_len, dest, dest_len);
> +	if (dest_len) dest[MIN(ret, dest_len-1)] = 0;
> +
> +	return src_len;
> +}
> +
> +int pull_utf8_pstring(char *dest, const void *src)
> +{
> +	return pull_utf8(NULL, dest, src, sizeof(pstring), -1, STR_TERMINATE);
> +}
> +
> +int pull_utf8_fstring(char *dest, const void *src)
> +{
> +	return pull_utf8(NULL, dest, src, sizeof(fstring), -1, STR_TERMINATE);
> +}
> 
>  /****************************************************************************
>  copy a string from a char* src to a unicode or ascii
> 

Do you really need such complicated functions? I don't think you need
STR_UPPER or an explicit STR_TERMINATE for usernames, so
(push|pull)_utf8_pstring should be enough.

-- 
Igor Vergeichik
ICQ 47298730




