Filename character translation

Rainer Zocholl UseNet-Posting-02141- at zocki.toppoint.de
Sat Dec 7 15:39:00 EST 2002


savin.gorup at asist-traffic.com(Savin Gorup)  07.12.02 11:54 wrote:


>I came across a problem with rsync-2.5.5 on Cygwin/Win2K while
>rsyncing files whose names contain 'strange' (non-Latin-1) characters.
>The problem is that filenames on the Windows system are
>coded (in our case) in codepage 852, while the server (a Linux system)
>uses ISO-8859-2 for filenames. These two are not fully
>compatible, causing rsync to simply skip copying some files (and whole
>directories!) to the server.

>Samba solves this kind of problem by using 'client code page' and
>'character set' options. 

But rsync does not work on such "funny" chars in Samba dirs either!
At least 2.5.5 on SCO OSR5 failed. I thought it was a SCO problem
(rdist did not work either), so I went back to cpio to make the
remote backup work. Maybe I can find the error messages somewhere.
IIRC rsync simply stops working on the first file with an umlaut
("Ü" (0x9A)) and continues with the next directory...
Unless someone compares the results exactly, every time,
he will not become aware of the problem: one (Samba) user might create
such a filename in the meantime, and from that day on only half of the
directory is backed up...
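
To make the "compare the results every time" step concrete, here is a
minimal sketch in Python. It only compares the sets of relative file
names in two directory trees as raw bytes, so it assumes both trees use
the same filename encoding (the Unix-to-Unix case above). The function
names are made up for illustration and are not part of rsync.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root, as bytes."""
    root = os.fsencode(root)
    found = set()
    # Walking a bytes path yields bytes names, so undecodable names survive.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_from_backup(src_root, dst_root):
    """List files that exist under src_root but not under dst_root."""
    return sorted(relative_files(src_root) - relative_files(dst_root))
```

Running missing_from_backup() on the source and the backup after each
run would flag a directory that was only half copied.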



>I propose a somewhat simpler solution using
>a translation table between the local and remote file systems.

>I have developed a patch to address the problem, 
>which basically does this: 
>- adds a command line option --filename-translation (options.c)
>- builds a two-way character translation lookup table in memory
>  (512 bytes) (utils.c)
>- translates filenames at appropriate places (sender.c, flist.c)
>  if --filename-translation is present
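
The two-way table described in the list above can be sketched in Python
as two 256-byte tables (512 bytes in total). This is not the patch
itself, which is C and is not shown in this thread. The sketch assumes
Python's cp852 and iso8859_2 codecs as the source of the mapping, and
build_tables is a hypothetical name.

```python
def build_tables(local="cp852", remote="iso8859_2"):
    """Build two 256-byte tables: local->remote and remote->local."""
    to_remote = bytearray(range(256))  # start from the identity mapping
    to_local = bytearray(range(256))
    for b in range(128, 256):
        try:
            ch = bytes([b]).decode(local)
            r = ch.encode(remote)
        except UnicodeError:
            # No counterpart in the other charset: the byte is left as-is.
            continue
        to_remote[b] = r[0]
        to_local[r[0]] = b
    return bytes(to_remote), bytes(to_local)


# Usage: translate a filename byte string with bytes.translate().
to_remote, to_local = build_tables()
remote_name = "čšž.txt".encode("cp852").translate(to_remote)
```

Bytes without a counterpart (for example the box-drawing characters of
codepage 852) stay unchanged here and can collide with a translated
byte, so a round trip is only safe for characters present in both
charsets.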

>Note this patch can't handle multibyte encodings. 

That's a problem:

The "normal" NT "_findfirst" translates all(!) Unicode characters > 0xff
to 0x3f "?" AFAIK.
On the Unix box the "?" (= wildcard!) in the file name causes no problem.
But "restore" will be impossible, because "?" is not a legal
character on NTFS/FAT...
Your "mapping" fails too, because all Unicode chars are already
mapped by the time rsync sees them (unless _tfindfirst is used!).
But:
I don't know what the Cygwin API is doing.
Maybe it does better than NT?
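
A small Python sketch of why a byte table cannot repair this. It does
not call the Windows API; Python's errors="replace" is used only as a
stand-in for the lossy conversion described above, and the function
names are made up.

```python
# Characters that Windows rejects in file names (besides control chars).
WINDOWS_RESERVED = set('<>:"/\\|?*')


def narrow_name(name, codepage="cp852"):
    """Mimic a lossy wide-to-narrow conversion: unmappable chars become '?'."""
    return name.encode(codepage, errors="replace")


def can_restore_on_windows(narrow, codepage="cp852"):
    """True if the narrowed name contains no character Windows rejects."""
    return not any(ch in WINDOWS_RESERVED for ch in narrow.decode(codepage))
```

Two different names such as "日本.txt" and "中文.txt" both narrow to
"??.txt", so the information is gone before any translation table is
applied, and the result cannot be written back to NTFS/FAT.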



>There has been some interest in that topic before here
>(http://www.mail-archive.com/rsync@lists.samba.org/msg03306.html) and
>also on some other, local mailing lists. Since inability to copy all
>files renders rsync unusable to non-latin-1 users 

Yepp. I was very disappointed about that, but have had no time to work
on the problem..

>I would like to hear some comments about including the patch 
>into main source tree (or proposing a better solution, of course).

I would be happy if rsync were able to copy Samba shares
between Unixes...

Hasn't that problem already been solved for CD filesystems?
(Rock Ridge extensions?)



Thanks for bringing the problem to the list!




