proposal to speed rsync with lots of files

Wayne Davison wayned at samba.org
Fri Mar 6 15:58:49 GMT 2009


On Thu, Mar 05, 2009 at 03:27:50PM -0800, Peter Salameh wrote:
> My proposal is to first send a checksum of the file list for each
> directory.  If it is found to be identical to the same checksum on the
> remote side then the list need not be sent for that directory!

My rZync source does something like that for directories:  it treats a
directory-list transfer like a file transfer.  That means that the
receiving side sends a set of checksums to the sending side telling it
what its version of the directory looks like, and then the sender sends
a normal set of delta data that lets the receiver reconstruct the
sender's version of the directory (which it compares to its own).  One
potential drawback is having to deal with false checksum-matches (which
should be rare, but would require the dir data to be resent).  I hadn't
optimized it for block size or (possibly) data order to make it more
efficient, but it is an interesting idea for speeding up a slow
connection.  I'm not sure if it would really help out that much for a
more modern, faster connection, because rsync sends the file-list data
at the same time as it is being scanned, and sometimes the scan is the
bottleneck.
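
As a rough illustration of the per-directory checksum idea in the quoted
proposal (not rsync's or rZync's actual code), here is a minimal Python
sketch.  The function name dir_list_digest, the choice of SHA-1, and the
choice of (name, size, mtime) as the hashed fields are all assumptions
made for the example.  Each side would compute the digest for a
directory, and the file list would only need to be sent when the two
digests differ.

```python
import hashlib
import os

def dir_list_digest(path):
    """Hash the sorted (name, size, mtime) entries of one directory.

    Hypothetical helper: the hashed fields and the hash function are
    assumptions for this sketch, not anything rsync defines.
    """
    h = hashlib.sha1()
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        st = entry.stat(follow_symlinks=False)
        # Directory sizes vary by filesystem, so hash 0 for them.
        size = 0 if entry.is_dir(follow_symlinks=False) else st.st_size
        record = "%s\0%d\0%d\n" % (entry.name, size, int(st.st_mtime))
        h.update(record.encode("utf-8", "surrogateescape"))
    return h.hexdigest()

def list_needs_sending(local_dir, remote_digest):
    """True if the remote digest does not match this directory."""
    return dir_list_digest(local_dir) != remote_digest
```

Note that both sides still have to scan the directory to compute the
digest, so this only saves bandwidth, not scan time.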

The best way to optimize sending of really large numbers of files that
are mostly the same is to start to leverage a file-change notification
system, such as inotify.  Using that, it is possible to distill a list
of what files/directories need to be copied, and to just copy what is
needed.
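
The "distill" step might look something like the following Python
sketch.  It assumes some watcher (inotify or otherwise) has already
produced a stream of (path, is_dir) pairs, where is_dir means the whole
directory should be copied recursively; that event format and the
function name distill are hypothetical, chosen only for this example.

```python
import posixpath

def distill(changed):
    """Reduce change events to a minimal sorted list of paths to copy.

    `changed` is an iterable of (path, is_dir) pairs (an assumed
    format).  Duplicates are merged, and anything beneath a directory
    that is already being copied whole is dropped.
    """
    changed = list(changed)
    dirs = sorted({posixpath.normpath(p) for p, is_dir in changed if is_dir})
    files = sorted({posixpath.normpath(p) for p, is_dir in changed if not is_dir})

    kept_dirs = []
    for d in dirs:
        # A parent always sorts before its children, so it is kept first.
        if not any(d.startswith(k + "/") for k in kept_dirs):
            kept_dirs.append(d)

    kept_files = [f for f in files
                  if not any(f.startswith(k + "/") for k in kept_dirs)]
    return sorted(kept_dirs + kept_files)
```

The resulting list could be written out one path per line and handed to
rsync's --files-from option (with -r given explicitly if the listed
directories should be recursed into).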

..wayne..

