[Samba] Character encoding mystery

Emmanuel Florac eflorac at intellique.com
Thu Apr 26 17:29:41 UTC 2018


Hi everyone,

I have a very annoying character encoding problem. Have a look at this:

#  ls -l M*mo-1.*
-rw-rw-rw- 1 root root 8417218  6 sept.  2013 Mémo-1.aif
-rwxr--r-- 1 hope hope 8417218  6 sept.  2013 Mémo-1.aif
-rw-rw-rw- 1 root root  363175  6 sept.  2013 Mémo-1.m4a
-rwxr--r-- 1 hope hope  363175  6 sept.  2013 Mémo-1.m4a

Yes, it looks like two files have exactly the same name, but actually
they're different: one has "é" encoded as a plain "e" followed by
0xCC81 (the decomposed form), and the other one (the "good one") as
0xC3A9 (the precomposed form). Of course similar problems occur for all
accented letters.
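
To make the difference concrete, here is a minimal Python sketch of the
two byte sequences hiding behind the same visible name. It only uses the
standard unicodedata module; the variable names are purely illustrative.
NFD is the decomposed form ("e" followed by the combining acute accent
U+0301) and NFC is the precomposed form (the single code point U+00E9).

```python
import unicodedata

# Start from the precomposed spelling: "é" is the single code point U+00E9.
name = "M\u00e9mo-1.aif"

nfc = unicodedata.normalize("NFC", name)  # precomposed form
nfd = unicodedata.normalize("NFD", name)  # decomposed form: "e" + U+0301

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

# Both strings render identically, but their bytes differ.
print(nfc_bytes.hex())  # contains c3a9 for "é"
print(nfd_bytes.hex())  # contains 65cc81 for "e" + combining accent
print(nfc == nfd)       # the two names do not compare equal
```

On a file system that stores names as raw bytes, these are two distinct
file names, which is why both can sit side by side in one directory.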

So here's the setup: I have a very weird proprietary system (DDP
server), probably running internally some ancient version of Samba.
People copied these files to this old server from Mac workstations. So
far so good.

I have a new server, running CentOS 7.3 and Samba 4.6. I mounted the
CIFS exports from the DDP server:

# mount | grep temp

//192.168.5.150/w-rushes-temp on /mnt/w-rushes-temp type cifs
(ro,relatime,vers=1.0,cache=strict,username=admin,domain=,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.5.150,soft,unix,posixpaths,serverino,mapposix,acl,rsize=1048576,wsize=65536,echo_interval=60,actimeo=1)

Listing the files on this mount, everything looks good at first glance:

# ls -l M*mo-1.*

-rw-rw-rw- 1 root root 8417218  6 sept.  2013 Mémo-1.aif
-rw-rw-rw- 1 root root  363175  6 sept.  2013 Mémo-1.m4a

Now I copy the files from the old system to the new one, using cp -a,
or rsync.

Then, when connecting from the Mac to the new server over SMB, you
can't see any of the files with accented characters in their names. They
are there, but invisible in the Mac Finder (they look fine when listed
from the terminal, as you've seen above).

If I copy a file from the Mac Finder, or create a new file with
"touch héhohàhù", it appears perfectly fine, with accents and all.

What could be causing this weird encoding effect?
You'll notice that on the new server I didn't use the "iocharset=utf8"
mount option. However, the files with accented characters look fine
(treacherously).

Bonus question: I have 327 TB of data with mangled file names. Any trick
to avoid copying everything *again* would be welcome...
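
For illustration only, here is a minimal Python sketch of one way the
names could be fixed in place with renames instead of a second copy. It
rests on assumptions that are not confirmed here: that the mangled names
are valid UTF-8 in decomposed (NFD) form, and that the precomposed (NFC)
form is the one the Mac clients see correctly through Samba. The function
names plan_nfc_renames and apply_nfc_renames are hypothetical, and the
sketch has not been tried at anything like this scale.

```python
import os
import unicodedata


def plan_nfc_renames(root):
    """Yield (old_path, new_path) for each entry whose name is not NFC."""
    # Walk bottom-up so that children are handled before their parent
    # directory is itself renamed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            fixed = unicodedata.normalize("NFC", name)
            if fixed != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, fixed)


def apply_nfc_renames(root, dry_run=True):
    """Rename non-NFC entries under root; return (renamed, skipped)."""
    renamed, skipped = [], []
    for old, new in plan_nfc_renames(root):
        # Never overwrite: an NFC twin may already exist next to the
        # NFD name, as in the listing at the top of this message.
        if os.path.lexists(new):
            skipped.append(old)
            continue
        if not dry_run:
            os.rename(old, new)
        renamed.append((old, new))
    return renamed, skipped
```

Calling apply_nfc_renames("/some/path") with the default dry_run=True
only reports what would change. Entries in the skipped list are those
whose NFC twin already exists and need a manual decision.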



-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac at intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------