Filename encodings

David Ayers d.ayers at inode.at
Fri Jul 29 20:36:32 GMT 2005


Wayne Davison wrote:
> 
> I've commented before that I don't like this solution, so this will
> never go onto the trunk.

I admit that I only searched the bug database and the web site in general
and failed to properly research the mailing list.  I also missed bug
2790 until I searched the mailing lists more thoroughly just now.  I've
checked up a bit (see below), but if you have a pointer to a specific
thread where you explain what your requirements are, that would be really
great.

>  If someone would like to work on a solution
> that uses iconv library calls, I'd be glad to consider including that.

No promises, but I think I'll take a stab at it as time permits.

> Complexities include adding an option(s) to indicate the encoding to
> be used on the client and the server side,

I'm unfamiliar with the code, and furthermore I would consider
myself a novice rsync user, so please be gentle...  But did you really
mean to use the terms "client/server", or rather "local/remote" or maybe
"source/destination"?

I would have expected local/remote or source/destination.

> using UTF-8 for the names
> sent over the wire,

OK... I'm not in the code yet, but I suppose I could convert from the
specified encoding to UTF-8 just before sending a name over the wire and
convert back to the specified encoding upon reception.  The internal
representation wouldn't change.  This seems like the least disruptive
approach.
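A minimal sketch of that round trip, assuming the POSIX
iconv_open()/iconv()/iconv_close() interface.  convert_name() is a
hypothetical helper, not existing rsync code, and the encoding names
passed to it would come from the yet-to-be-defined command-line option(s):

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not rsync code): convert the NUL-terminated string
 * `in` from encoding `from` to encoding `to` using POSIX iconv.
 * Returns a malloc'd NUL-terminated result, or NULL on failure; errno is
 * then EILSEQ/EINVAL for bytes that are invalid in `from`, EINVAL if the
 * encoding pair is unsupported, or ENOMEM.  Assumes stateless encodings
 * (e.g. UTF-8, ISO-8859-x); stateful ones would need a final flush call. */
char *convert_name(const char *in, const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(in);
    size_t cap = in_left + 16;          /* initial guess; grown on E2BIG */
    char *out = malloc(cap);
    if (!out) {
        iconv_close(cd);
        errno = ENOMEM;
        return NULL;
    }

    char *in_ptr = (char *)in;          /* iconv() takes a non-const char ** */
    char *out_ptr = out;
    size_t out_left = cap - 1;          /* keep one byte for the NUL */

    while (in_left > 0) {
        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) != (size_t)-1)
            break;                      /* all input consumed */
        if (errno != E2BIG) {           /* EILSEQ or EINVAL: bad input */
            int saved = errno;
            free(out);
            iconv_close(cd);
            errno = saved;
            return NULL;
        }
        /* Output buffer full: double it and continue where iconv stopped. */
        size_t used = (size_t)(out_ptr - out);
        char *bigger = realloc(out, cap * 2);
        if (!bigger) {
            free(out);
            iconv_close(cd);
            errno = ENOMEM;
            return NULL;
        }
        cap *= 2;
        out = bigger;
        out_ptr = out + used;
        out_left = cap - 1 - used;
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

The sender would call convert_name(name, local_enc, "UTF-8") and the
receiver convert_name(name, "UTF-8", local_enc), so only the bytes on the
wire change.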

Still, it isn't clear how failures to encode/decode names should be
handled.  The file system will allow file names in differing encodings
(in theory, even a single file name could contain an arbitrary mixture
of encodings).  Should those files be rsynced "as is" (i.e. without
transformation), or should rsync error out?

I would lean towards issuing a warning and transferring the names without
transformation.  But this could lead to problems if one system simply did
not have the requested locale installed and files existed that actually
matched the "as is" representation.  I guess that could be documented
as either expected or "undefined" behavior.
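The "warn and send as is" policy could look roughly like this on the
sending side.  Again only a sketch: wire_name() is a hypothetical helper,
and treating an iconv_open() failure the same way as a bad name also
covers the case of a system that lacks the requested encoding:

```c
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not rsync code): produce the name to put on the
 * wire.  Converts `name` from `local_enc` to UTF-8; if the name is not
 * valid in `local_enc`, or the encoding is unavailable, it prints a
 * warning and returns an unmodified copy instead of aborting.
 * *converted is set to 1 or 0.  Returns NULL only if malloc fails. */
char *wire_name(const char *name, const char *local_enc, int *converted)
{
    size_t len = strlen(name);
    size_t in_left = len;
    /* UTF-8 needs at most 4 bytes per character and every input
     * character takes at least 1 byte, so 4x (+1 for NUL) is enough. */
    size_t out_cap = len * 4 + 1;
    char *out = malloc(out_cap);
    if (!out)
        return NULL;

    *converted = 0;
    iconv_t cd = iconv_open("UTF-8", local_enc);
    if (cd != (iconv_t)-1) {
        char *in_ptr = (char *)name;    /* iconv() takes a non-const char ** */
        char *out_ptr = out;
        size_t out_left = out_cap - 1;
        if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) != (size_t)-1) {
            *out_ptr = '\0';
            *converted = 1;
        }
        iconv_close(cd);
    }

    if (!*converted) {
        fprintf(stderr, "warning: cannot convert \"%s\" from %s to UTF-8;"
                " sending name as is\n", name, local_enc);
        memcpy(out, name, len + 1);     /* verbatim copy, NUL included */
    }
    return out;
}
```

The fallback keeps a transfer from failing because of one odd name, at
the cost of the ambiguity described above: the receiver cannot tell a
verbatim name from a converted one unless the protocol flags it.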

> and always outputting names in messages using the
> client encoding.

Well, I can look into this also, but I have a feeling it's not really part
of the issue itself.  I hope you would accept this in stages.

>  Also, there would be no need to increment the protocol
> version since an older rsync would not understand the conversion
> option(s) that the client would send to it.
> 

That sounds good, but I will still need to familiarize myself with the
protocol.

Let's see how far I get.

So let me summarize:
- allow encoding conversions as provided by the libiconv interface
- have the encodings specified on the command line
- send the file names as UTF-8 over the wire
- convert and transfer error messages as UTF-8 over the wire if
  encodings were specified


Cheers,
David Ayers

PS: These are the threads I found:

http://www.mail-archive.com/rsync@lists.samba.org/msg10555.html
http://www.mail-archive.com/rsync@lists.samba.org/msg11486.html
http://www.mail-archive.com/rsync@lists.samba.org/msg13122.html
http://www.mail-archive.com/rsync@lists.samba.org/msg13451.html
http://www.mail-archive.com/rsync@lists.samba.org/msg13667.html


