DO NOT REPLY [Bug 2790] Add support for converting filenames into different encodings

samba-bugs at samba.org samba-bugs at samba.org
Tue Oct 30 22:15:03 GMT 2007


https://bugzilla.samba.org/show_bug.cgi?id=2790





------- Comment #10 from matt at mattmccutchen.net  2007-10-30 17:15 CST -------
(In reply to comment #9)
> The current solution appears to be somewhat confused about what it is trying to
> solve.

Rather, you appear to be overcomplicating the problem.

> There are three filename encodings: the one in the client fs, the transfer
> encoding, the one in the server fs.
> Client needs to know client-fs and transfer, server needs to know server-fs and
> transfer.
> Trying to mush up any two of the three leads to pain.

Rsync isn't like MySQL, which tags every string value with its encoding, and I
don't see why we would want to make it that way.  Instead, the rsync sender and
receiver each treat filenames as plain sequences of bytes, in accordance with
the POSIX filesystem API on which rsync relies so heavily.  --iconv merely
allows you to make the sender and receiver byte sequences differ by an encoding
conversion because this is often useful.
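
To illustrate the byte-sequence view, here is a minimal Python sketch (not
rsync code; raw_names is a hypothetical helper).  Passing a bytes path to
os.listdir makes Python return each name as the raw bytes the filesystem
stores, with no encoding attached:

```python
import os
from typing import List


def raw_names(directory: bytes) -> List[bytes]:
    """Return directory entries as raw bytes, undecoded."""
    # A bytes argument makes os.listdir return bytes names, so no
    # encoding is assumed or applied to them.
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    for name in raw_names(b"."):
        print(name)
```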

> -- compatible: The server may not know about iconv.  So the client has to do
> all the conversions.  This is almost supported now, except that the client sends
> an iconv option to the server that it does not understand.

This is the only thing you propose that rsync does not already support, and I
think it is a natural addition to rsync.  Currently, if iconv is enabled, each
process converts strings from its local encoding to UTF-8 before sending them
over the wire and converts strings from UTF-8 to its local encoding after
reading them from the wire.  Rsync should let the user specify another encoding
in place of UTF-8.
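
The round trip through the wire encoding could be sketched in Python roughly
as follows.  This only models the idea: to_wire and from_wire are hypothetical
names, not rsync functions, and Python's codecs stand in for iconv.

```python
def to_wire(name: bytes, local: str, wire: str = "UTF-8") -> bytes:
    """Convert a local filename (bytes) to the wire encoding before sending."""
    return name.decode(local).encode(wire)


def from_wire(name: bytes, local: str, wire: str = "UTF-8") -> bytes:
    """Convert a filename read from the wire back to the local encoding."""
    return name.decode(wire).encode(local)


if __name__ == "__main__":
    # An ISO-8859-1 sender and a UTF-8 receiver, with UTF-8 on the wire.
    on_wire = to_wire(b"caf\xe9", "ISO-8859-1")
    print(on_wire)
    print(from_wire(on_wire, "UTF-8"))
```

Letting the user choose the wire encoding would amount to passing something
other than the default for the wire parameter in this sketch.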

Specifically, I propose two options to specify the conversion, if any, to be
applied on each end: --iconv-client=CLIENT,WIRE and
--iconv-server=WIRE,SERVER .  (There's no reason rsync shouldn't allow the two
values of WIRE to be different, although this would rarely be useful.)
--iconv=CLIENT,SERVER then
stands for --iconv-client=CLIENT,UTF-8 --iconv-server=UTF-8,SERVER .  A
"compatible" copy with a UTF-8 client and an ISO-8859-1 server could be
achieved by --iconv-client=UTF-8,ISO-8859-1 .
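
As a rough Python model of the proposal (the --iconv-client and --iconv-server
options are only proposed here and do not exist in rsync; expand_iconv,
convert and send_name are hypothetical names for illustration), each end holds
an optional (from, to) pair, and --iconv expands to two pairs that meet at
UTF-8:

```python
from typing import Optional, Tuple

Pair = Tuple[str, str]  # (source encoding, destination encoding)


def expand_iconv(spec: str) -> Tuple[Pair, Pair]:
    """Expand --iconv=CLIENT,SERVER into (client pair, server pair)."""
    client, server = spec.split(",", 1)
    return (client, "UTF-8"), ("UTF-8", server)


def convert(name: bytes, pair: Optional[Pair]) -> bytes:
    """Apply one end's conversion; None means that end does no conversion."""
    if pair is None:
        return name
    src, dst = pair
    return name.decode(src).encode(dst)


def send_name(name: bytes,
              iconv_client: Optional[Pair],
              iconv_server: Optional[Pair]) -> bytes:
    """Model a client-to-server transfer of one filename."""
    wire = convert(name, iconv_client)   # client fs -> wire
    return convert(wire, iconv_server)   # wire -> server fs


if __name__ == "__main__":
    utf8_name = "caf\u00e9".encode("UTF-8")
    # "Compatible" copy: only the client converts; the server stores the
    # wire bytes unchanged.
    print(send_name(utf8_name, ("UTF-8", "ISO-8859-1"), None))
    # Same result through the --iconv shorthand, with UTF-8 on the wire.
    client_pair, server_pair = expand_iconv("UTF-8,ISO-8859-1")
    print(send_name(utf8_name, client_pair, server_pair))
```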

> The only switch that needs a single-character form is --encoding-aware, which
> should get part of finger memory like -a for most rsync users.

I think --iconv=. or --encoding-aware is too special-purpose to "need" a
single-character form in the main version of rsync.  If you use it frequently,
you can always define your own popt alias.  This is what Wayne recommended for
my favorite "sane" option, --chmod=ugo=rwX .



