proposal to speed rsync with lots of files

Fabian Cenedese Cenedese at indel.ch
Mon Mar 9 11:20:17 GMT 2009


At 07:58 06.03.2009 -0800, Wayne Davison wrote:
>On Thu, Mar 05, 2009 at 03:27:50PM -0800, Peter Salameh wrote:
>> My proposal is to first send a checksum of the file list for each
>> directory.  If it is found to be identical to the same checksum on the
>> remote side then the list need not be sent for that directory!
>
>My rZync source does something like that for directories:  it treats a
>directory-list transfer like a file transfer.  That means that the
>receiving side sends a set of checksums to the sending side telling it
>what its version of the directory looks like, and then the sender sends
>a normal set of delta data that lets the receiver reconstruct the
>sender's version of the directory (which it compares to its own).  One
>potential drawback is having to deal with false checksum-matches (which
>should be rare, but would require the dir data to be resent).  I hadn't
>optimized it for block size or (possibly) data order to make it more
>efficient, but it is an interesting idea for speeding up a slow
>connection.  I'm not sure if it would really help out that much for a
>more modern, faster connection, because rsync sends the file-list data
>at the same time as it is being scanned, and sometimes the scan is the
>bottle-neck.

To find out whether the scanning or the transferring is the bottleneck,
would it be possible to include a hint in the statistics, e.g. which
threads had to wait longer or which action took more time? Something
that would indicate whether enabling or disabling compression might
give a faster overall transfer. I don't know if this internal data can
be collected, or if the "trial-and-change" method is the only way to
find out.
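
Just to illustrate the kind of accounting I mean (rsync has no such
per-phase report; the phase names below are made up), one could wrap each
phase in a timer and see which one dominates:

    import time
    from collections import defaultdict

    phase_totals = defaultdict(float)   # seconds spent per phase

    def timed_phase(name, func, *args):
        """Run one phase and add its wall-clock time to phase_totals."""
        start = time.monotonic()
        try:
            return func(*args)
        finally:
            phase_totals[name] += time.monotonic() - start

    # e.g. timed_phase("scan", build_file_list, src_dir)
    #      timed_phase("transfer", send_deltas, file_list)
    # Afterwards the largest entry in phase_totals points at the bottleneck.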

Thanks

bye  Fabi


