[Samba] Languages and encoding: file system and file contents

tlaronde at kergis.com
Mon Feb 5 12:18:00 UTC 2024


I'm rather unclear about how CIFS/Samba deal with languages and
encodings, and in particular about the encoding of pathnames vs. the
encoding of files that MS Windows considers to be text files and hence,
perhaps, as having a language and an encoding.

So I will try to formulate questions about the basic points, and I
will be grateful to anyone who can shed some light on them:

"dos charset" and "unix charset" are global parameters. As far as I
understand the description:
	a) they determine the encoding of the _contents_ of files;
	b) being global, they cannot take different values for different
shares and/or different users.

Context: I imagine a Unix filesystem served to Windows clients.

So the questions:

1) These parameters don't seem to have anything to do with pathnames,
yet when "unix charset" is set to ASCII, non-ASCII pathnames are not
displayed on the clients. What is the relationship between the
charsets and pathnames?
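To make 1) concrete, here is a small Python sketch of what I suspect
happens. It assumes (this is exactly what I am asking) that Samba
decodes each on-disk name with "unix charset" before sending it to the
client; under that assumption a UTF-8 name with non-ASCII bytes simply
has no ASCII reading. The function name is mine, not Samba's, and
whether Samba then hides, mangles or otherwise handles such a name is
precisely what I don't know:

```python
from typing import Optional


def name_for_client(raw: bytes, unix_charset: str) -> Optional[str]:
    """Hypothetical model: decode an on-disk name using "unix charset".

    Returns None when the bytes are not valid in that charset.
    This is an assumption for illustration, not Samba's actual code.
    """
    try:
        return raw.decode(unix_charset)
    except UnicodeDecodeError:
        return None


# Names as they might sit on a Unix filesystem written in UTF-8.
on_disk = [b"readme.txt", "résumé.txt".encode("utf-8")]

for charset in ("ascii", "utf-8"):
    decoded = (name_for_client(raw, charset) for raw in on_disk)
    visible = [name for name in decoded if name is not None]
    print(charset, visible)
```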

2) What is the relationship between the user's locale settings (LC_*
and LANG) and the charsets defined in smb.conf? Is the locale
encoding of a Unix user mapped to a Samba user used in any way to
inform a Windows client about the language and encoding of the
contents?

3) If an MS Windows client connects to a share as another user (say, a
Unix one) and the encoding on the Windows side differs from the one
defined for the connecting user, is there a problem? (E.g.: Windows is
configured to use latin9 or equivalent, while the user used to connect
is declared as using UTF-8. Which encoding will a Windows program use,
latin9 or UTF-8? I'm not talking about what will be stored; I'm
talking about what the Windows program, on Windows, uses: the Windows
user's encoding, or the encoding of the user making the share
connection?)

4) Same question about the pathnames?
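To make 3) and 4) concrete: the same name (or text) is a different
byte sequence in latin9 and in UTF-8, so whichever side decodes with
the wrong charset sees either invalid bytes or mojibake. The Python
sketch below assumes nothing about what Samba actually does; it only
shows the encodings themselves ("reinterpret" is a hypothetical
helper of mine):

```python
def reinterpret(name: str, written_as: str, read_as: str) -> str:
    """Encode `name` with one charset, then decode the bytes with another.

    Undecodable bytes become U+FFFD so that the mismatch stays visible.
    """
    return name.encode(written_as).decode(read_as, errors="replace")


name = "Café €.txt"

# The same characters give different bytes in each encoding.
print(name.encode("iso8859_15"))  # latin9: one byte per character
print(name.encode("utf-8"))       # UTF-8: "é" and "€" take several bytes

# Matching charsets round-trip; mismatched ones do not.
print(reinterpret(name, "iso8859_15", "iso8859_15") == name)
print(reinterpret(name, "iso8859_15", "utf-8") == name)
print(reinterpret(name, "utf-8", "iso8859_15") == name)
```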

5) If an MS Windows program creates temporary filenames that
(hopefully) use only ASCII characters, and the Unix encoding is not
ASCII-compatible, does this lead to problems? Or are pathnames
treated, as on a Unix filesystem, simply as NUL-terminated strings of
bytes without any encoding, so that a string that is not valid UTF-8
is no problem for a pathname?
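Regarding "ASCII-compatible" in 5): in ASCII, latin9 and UTF-8 an
ASCII-only name is the very same byte string, whereas an encoding such
as UTF-16 puts NUL bytes inside it, which a NUL-terminated Unix
pathname cannot hold. Conversely, a latin9 name that is not valid
UTF-8 contains no NUL byte at all. A short Python sketch of that
point only ("fits_c_string" is a hypothetical helper, and this says
nothing about how Samba itself stores names):

```python
def fits_c_string(raw: bytes) -> bool:
    # A Unix pathname is passed as a NUL-terminated C string,
    # so it cannot contain a NUL byte.
    return b"\x00" not in raw


name = "TMP~1.tmp"
for charset in ("ascii", "iso8859_15", "utf-8", "utf-16-le"):
    raw = name.encode(charset)
    print(f"{charset:10} {raw!r} nul-free={fits_c_string(raw)}")

# Not valid UTF-8, yet perfectly acceptable as a byte string.
latin9_name = "café".encode("iso8859_15")
print(latin9_name, fits_c_string(latin9_name))
```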

6) Since the parameters are global, does serving different shares to
different users impose UTF-8, de facto, on the Unix side in order to
store whatever the clients send, even if, when a file is retrieved,
Samba performs the reverse conversion from UTF-8 to whatever encoding
the client uses?
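The reasoning behind 6), as a Python sketch: a single-byte charset
such as latin9 can represent only a small repertoire, while UTF-8 can
represent any name a client might send. The "store" helper and the
sample names are mine, for illustration only; whether Samba really
behaves this way is what I am asking:

```python
from typing import Optional


def store(name: str, unix_charset: str) -> Optional[bytes]:
    """Hypothetical: the bytes stored for `name`, or None if impossible."""
    try:
        return name.encode(unix_charset)
    except UnicodeEncodeError:
        return None


names = ["Café €.txt", "Ωmega.txt", "日本語.txt"]

for charset in ("iso8859_15", "utf-8"):
    # Keep only the names that can be stored and read back unchanged.
    round_trip = [
        n for n in names
        if (raw := store(n, charset)) is not None and raw.decode(charset) == n
    ]
    print(charset, round_trip)
```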

TIA for any information about these points.
-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C
