[Samba] Can tdb store incorrectly encoded characters?

Tue Jan 12 13:21:18 MST 2010

Hello, All!

I found that Samba checks the encoding of description fields in various tdb files too late.

Case 1. 
FreeBSD with UTF-8 support. In some cases FreeBSD's adduser script can save an invalid UTF-8 sequence in the GECOS field of /etc/passwd. (Here is an example: http://www.acc.tula.ru/~acc107_3/samba/miscoding/gecos.txt)
When the user is added to Samba, Samba reads the GECOS field "as is", without checking it.
Later this leads to "Conversion error: Illegal multibyte sequence".
In this part of the log, http://www.acc.tula.ru/~acc107_3/samba/miscoding/case%201.txt, the user 'skvorco' has a problematic description.
Workaround: pdbedit -u user_name -f "Correct full name"
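
To find affected accounts before Samba trips over them, the GECOS fields can be checked for valid UTF-8 outside of Samba. Below is a minimal sketch in Python. It is not part of Samba, and the function name invalid_gecos_entries is made up for illustration. It assumes the usual colon-separated passwd layout with GECOS as the fifth field.

```python
def invalid_gecos_entries(passwd_bytes: bytes):
    """Return (login, raw_gecos) pairs whose GECOS field is not valid UTF-8.

    Hypothetical helper: expects the raw bytes of a passwd-style file
    (colon-separated, GECOS in the fifth field).
    """
    bad = []
    for line in passwd_bytes.splitlines():
        # Skip blank lines and comments.
        if not line or line.startswith(b"#"):
            continue
        fields = line.split(b":")
        if len(fields) < 7:
            continue  # not a regular passwd entry
        login, gecos = fields[0], fields[4]
        try:
            gecos.decode("utf-8")
        except UnicodeDecodeError:
            # Keep the raw bytes so they can be inspected later.
            bad.append((login.decode("ascii", "replace"), gecos))
    return bad
```

The file has to be opened in binary mode (open("/etc/passwd", "rb")) so that the bytes reach the check unmodified. Every login it reports is a candidate for the pdbedit workaround above.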

Case 2.
Many years ago I used Samba with a one-byte encoding, KOI8-R, to represent Russian letters. At that time I created my first groups, which had Russian descriptions.
Later I moved to a new version of Samba (and of the OS too) which supports multi-byte encodings. Today, analysis of Samba's logs shows me that the descriptions of those first groups are still in the one-byte encoding!
In this part of the log, http://www.acc.tula.ru/~acc107_3/samba/miscoding/case%202.txt, the group 'tst-users' has a one-byte encoded description, while the group 'ugs' has a multi-byte (UTF-8) encoded description.
Because of the one-byte encoded description I get this error:
lib/charcnv.c:convert_string_internal(263)
  convert_string_internal: Conversion error: Illegal multibyte sequence(лПОЖЕТЕОГЙС)
librpc/ndr/ndr.c:ndr_push_error(493)
  ndr_push_error(5): Bad char conversion
rpc_server/srv_pipe.c:api_rpcTNP(2381)
  api_rpcTNP: samr: SAMR_QUERYDISPLAYINFO2 failed.

and an empty domain group list.

Fragment of smb.conf:
        dos charset = 866
        unix charset = utf-8
        preserve case = yes
        short preserve case = yes
        default case = lower
        case sensitive = auto

So, I have two questions:
1. How can I find the descriptions that are still stored in a one-byte encoding?
2. How can I convert them to the multi-byte encoding (UTF-8)?
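
For what it is worth, on the raw bytes both steps look simple. Here is a rough sketch in Python, not a Samba tool. The names is_valid_utf8 and to_utf8 are made up for illustration, and KOI8-R is assumed as the old encoding because that is what I used. How to get the raw description bytes out of the tdb files and write them back is not covered here.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(raw: bytes, legacy: str = "koi8_r") -> bytes:
    """Return the description as UTF-8 bytes.

    Valid UTF-8 is returned unchanged. Anything else is assumed
    (an assumption, not a detection) to be in the legacy encoding.
    """
    if is_valid_utf8(raw):
        return raw
    return raw.decode(legacy).encode("utf-8")


# A description stored by the old one-byte setup.
old = "Конференция".encode("koi8_r")
print(is_valid_utf8(old))            # False
print(to_utf8(old).decode("utf-8"))  # Конференция
```

The check relies on the fact that Russian letters in KOI8-R are high bytes that do not normally form valid UTF-8 sequences, so a failed UTF-8 decode is a reasonable sign of an old description. Pure ASCII descriptions pass the check and are left untouched.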

Bye. Serg.