How does smbclient know what character sets to use?

Michael B Allen mba2000 at ioplex.com
Thu Aug 12 01:50:39 GMT 2004


David Wuertele said:
> Is there a standard character set used by the SMB protocol?

No. Newer CIFS implementations use Unicode, encoded as UCS-2LE or
UTF-16LE. Otherwise the charset is an 8 bit Windows charset determined by
the host locale (e.g. I think on Windows98 it might be cp1252, which is
like ISO-8859-1 but not quite identical).
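
To make the difference concrete, here is a minimal sketch using only
Python's standard codecs. The sample filename is made up for illustration.
It encodes one name as UTF-16LE and as 8 bit bytes, then decodes a single
byte on which cp1252 and ISO-8859-1 disagree.

```python
# Hypothetical sample filename; not taken from any real share.
name = "café.txt"

# Unicode on the wire: two bytes per character for this name.
utf16 = name.encode("utf-16-le")

# 8 bit encodings: one byte per character, and for this name
# cp1252 and ISO-8859-1 happen to produce identical bytes.
cp1252 = name.encode("cp1252")
latin1 = name.encode("iso-8859-1")

# They differ in the 0x80-0x9F range: cp1252 puts printable
# characters there, ISO-8859-1 has C1 control codes.
byte = b"\x80"
as_cp1252 = byte.decode("cp1252")      # the euro sign
as_latin1 = byte.decode("iso-8859-1")  # U+0080, a control character

print(len(utf16), len(cp1252), cp1252 == latin1)
print(repr(as_cp1252), repr(as_latin1))
```

So the two 8 bit charsets are interchangeable for many names and only
diverge once a byte in that range shows up.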

> If not, how does smbclient determine what character set to translate
> into the local host's character set?

Unfortunately the charset isn't negotiated. The client and server simply
have to be configured to agree on which charset to use, so ideally you
want to configure both to use Unicode. The catch is that many folks use
specific legacy encodings for filenames (e.g. Japanese), which makes it
more difficult to move entirely to Unicode.
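
A rough sketch of what a disagreement looks like, assuming a server that
stores names in Shift_JIS (Python's "cp932" codec) and a client that
assumes cp1252. Both the filename and the choice of charsets are
hypothetical.

```python
# Hypothetical Japanese filename as a server configured for
# Shift_JIS (Python codec "cp932") might send it.
original = "日本語.txt"
wire_bytes = original.encode("cp932")

# A client configured for the same charset recovers the name.
agreed = wire_bytes.decode("cp932")

# A client configured for cp1252 silently gets different text;
# errors="replace" covers bytes that cp1252 leaves undefined.
mismatched = wire_bytes.decode("cp1252", errors="replace")

print(agreed == original)
print(mismatched == original)
```

Nothing fails loudly in the mismatched case, which is why the two sides
have to be configured consistently by hand.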

> Also, how does smbclient discover what the local host's character set
> is?  Is that what LANG is for?

I don't actually know how smbclient determines the charset, but using
setlocale(LC_ALL, "") to pick up LANG would still not be perfect, since
the charsets found on Unix/Linux may not be compatible with the charsets
in use on a Windows server.
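
For reference, this is roughly what the locale approach looks like, written
in Python rather than C. It is only a sketch of the general technique, not
what smbclient does. The helper names local_charset and python_knows are
made up, and checking for a Python codec is just a stand-in for "is a
converter available for this charset".

```python
import codecs
import locale


def local_charset():
    """Return the charset name implied by the environment's locale."""
    try:
        # Equivalent of C's setlocale(LC_ALL, ""): adopt LANG / LC_*.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # LANG names a locale this system does not have installed.
        pass
    # nl_langinfo is Unix only; fall back where it is missing.
    if hasattr(locale, "nl_langinfo"):
        return locale.nl_langinfo(locale.CODESET)
    return locale.getpreferredencoding(False)


def python_knows(charset):
    """True if Python ships a codec under this name (a stand-in
    for "a converter exists for this charset")."""
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False


name = local_charset()
print(name, python_knows(name))
```

Even when this yields a usable name, it only describes the local side; it
says nothing about what the server is using.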

If I had to guess, I'd check to see if it obeys the charset options in
smb.conf.
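
As a starting point for checking, here is a small sketch that reads charset
settings out of an smb.conf style fragment. The post above does not name
the options; "unix charset" and "dos charset" are parameter names from the
Samba 3 smb.conf manual, and the values are only examples. Whether
smbclient honors them is exactly the guess above. Python's configparser is
an approximation of the smb.conf syntax, not Samba's own parser.

```python
import configparser

# Hypothetical smb.conf fragment. "unix charset" and "dos charset"
# are Samba 3 parameter names; the values are only examples.
SAMPLE = """\
[global]
unix charset = UTF-8
dos charset = CP850
"""

# interpolation=None so a literal % in a value is left alone.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)

unix_charset = parser.get("global", "unix charset", fallback=None)
dos_charset = parser.get("global", "dos charset", fallback=None)
print(unix_charset, dos_charset)
```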

Mike

