[Samba] Languages and encoding: file system and file contents

Michael Tokarev mjt at tls.msk.ru
Mon Feb 5 13:16:27 UTC 2024


05.02.2024 15:18, Thierry LARONDE via samba :
> I'm rather unclear about the way CIFS/Samba deal with languages,
> encoding, and the, perhaps, encoding of pathnames vs the encoding of
> files considered by MS Windows to be text file and hence with, perhaps
> a language and an encoding.
> 
> So I will try to formulate questions about the elementary points, and
> I will be grateful to the ones who can share lights about these:
> 
> "dos charset" and "unix charset" are global parameters. As far as I
> understand the description, this:
> 	a) fixes the _contents_ of files;

Absolutely not. Samba does nothing with the contents of files: it
treats every file as a binary object and does not change its contents
in any way.

> 	b) the parameters are global so it is not possible to have
> different values for different shares and/or different users.
> 
> Context: I imagine a Unix filesystem served to Windows clients.
> 
> So the questions:
> 
> 1) These parameters don't seem to have anything to do with pathnames,
> but setting "unix charset" to ASCII, not ascii pathnames are not
> displayed on the clients. What is the relationship between charsets
> and the pathnames?

These parameters have meaning for file *names* (pathnames) *only*;
they have nothing to do with the contents of the files.
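
As a rough illustration (a sketch only, not Samba's code): the same
name becomes different bytes depending on the charset, and a pure-ASCII
charset cannot represent a non-ASCII name at all. The file name and the
helper name_bytes() are made up for the example.

```python
# Illustrative only: how one file name maps to bytes under different charsets.
# The name and the helper are hypothetical, not taken from Samba.

def name_bytes(name, charset):
    """Return the encoded name, or None if the charset cannot represent it."""
    try:
        return name.encode(charset)
    except UnicodeEncodeError:
        return None

name = "\u00e9t\u00e9.txt"   # a name with two accented letters (e-acute)

utf8 = name_bytes(name, "utf-8")            # two bytes per accented letter
latin9 = name_bytes(name, "iso-8859-15")    # one byte per accented letter
ascii_only = name_bytes(name, "ascii")      # not representable, so None

print(utf8)
print(latin9)
print(ascii_only)
```

Nothing in this sketch reads or writes file data; only the name is
encoded.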

> 2) What is the relationship between the language parameters (LC_* and
> LANG) settings for the user, and the "charsets" defined in smb.conf? Is
> the localisation encoding, for a Unix user mapped to a Samba user, used
> in anyway to inform a Windows client about the language and encoding
> of the contents?

There's no relation whatsoever.

> 3) If a MS Windows client connects to a share via another user (say a
> Unix one), if the encodings on the MS Windows is different from what
> is defined for the user connecting, is there a problem? (Ex.: Windows
> is configured to use latin9 or equivalent; user used to connect is
> declared as using UTF-8; what encoding will be used by a Windows
> program? latin9 or UTF-8---I'm not talking about what will be stored,
> I'm talking about what the Windows program, on Windows, is using:
> Windows user encoding or encoding of the user making the share
> connection?

When two entities connect, they exchange information about the charsets
they store filenames in.  After that it is the client's job to convert
between the server charset and whatever the local charset happens to be.
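
The conversion step could be sketched like this. It is only a sketch:
the function transcode() and the two charsets (UTF-8 on the server
side, latin9 on the client side) are assumptions picked to match the
example in the question, not a description of how any real client
implements it.

```python
# Hypothetical sketch of a file-name charset conversion.

def transcode(raw_name, from_charset, to_charset):
    """Re-encode a file name; only the name is touched, never file data."""
    return raw_name.decode(from_charset).encode(to_charset)

# UTF-8 bytes of an example name as it might be stored on the server.
server_name = b"\xc3\xa9t\xc3\xa9.txt"

# What a latin9 client would use locally, and the way back.
client_name = transcode(server_name, "utf-8", "iso-8859-15")
round_trip = transcode(client_name, "iso-8859-15", "utf-8")

print(client_name)
print(round_trip == server_name)
```

A name containing characters missing from the target charset would
raise UnicodeEncodeError here; how a real client handles that case is
outside this sketch.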

> 4) Same question about the pathnames?

All of the above is about pathnames *only*; it has nothing to do with
file contents.

/mjt


