Rsync: Re: patch to enable faster mirroring of large filesystems

Keating, Tim TKeating at origin.ea.com
Fri Nov 30 05:41:46 EST 2001


I, too, was disappointed with rsync's performance when no changes were
required (23 minutes to verify that a system of about 3200 files was
identical). I wrote a little client/server Python app which does the
verification and then hands rsync the list of files to update. This reduced
the best-case compare time to under 30 seconds. Here's what it does, and
forgive me if this sounds similar to the stuff you're doing:

 - The client and server cache checksums (MD5, since there is no MD4
implementation conveniently available for Python that I know of) on a
per-directory basis. These are kept in a .checksum file in the directory, so
they persist from session to session. This is especially handy for the
server, where (in my particular case) the files don't change very often.

 - On the initial compare, the client sends the checksum of each .checksum
file; if they match, it's not necessary to send the .checksum file, and we
have just culled an entire directory for the cost of a transfer of about
32 bytes.

 - If there's a mismatch, the client sends over the entire .checksum file.
The server does the compare and sends back a list of files to delete and a
list of files to update. (And now that I think of it, it would probably be
better if the server just sent the client back its list of files and let the
client figure out what it needed, since this would distribute the work
better.)

 - The client deletes the files on the delete list, and uses rsync to update
the files on the update list.
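
To make the caching and compare steps above concrete, here is a minimal
sketch. It is not the app described in this message: the function names, the
"digest  name" line format of the .checksum file, and the use of the hashlib
module (which is newer than this message) are all assumptions made for
illustration.

```python
import hashlib
import os

CACHE_NAME = ".checksum"  # cache file name taken from the description above


def file_md5(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_cache(directory):
    """Checksum every regular file in one directory and write the cache."""
    sums = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name != CACHE_NAME and os.path.isfile(path):
            sums[name] = file_md5(path)
    # Assumed format: one "digest  name" line per file, sorted by name.
    with open(os.path.join(directory, CACHE_NAME), "w") as handle:
        for name, digest in sums.items():
            handle.write(f"{digest}  {name}\n")
    return sums


def cache_digest(directory):
    """Return the 32-character value exchanged on the initial compare."""
    return file_md5(os.path.join(directory, CACHE_NAME))


def compare(source_sums, mirror_sums):
    """Return (to_delete, to_update) for the mirror side."""
    # Files the mirror has but the source does not are deleted.
    to_delete = sorted(set(mirror_sums) - set(source_sums))
    # Files that are missing on the mirror or differ are updated.
    to_update = sorted(name for name, digest in source_sums.items()
                       if mirror_sums.get(name) != digest)
    return to_delete, to_update
```

If cache_digest() returns the same value on both sides, the directory can be
skipped without calling compare() at all. This sketch always recomputes every
checksum; how a real cache decides that an entry is stale is left out.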

The ideal case is when all checksums are up to date. The worst case is when
the checksum cache needs to be built completely -- but this still only takes
a couple of minutes, easily an order of magnitude better than the best case
I experienced with raw rsync.
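
For the last step of the list above (delete, then let rsync update), one
possible sketch follows. How the app actually passes the update list to rsync
is not stated here; this version assumes rsync's --files-from option, which
was added in a later rsync release than the one current when this was
written. The helper names are hypothetical, and the command is only built,
not run.

```python
import os


def delete_files(local_dir, to_delete):
    """Remove files that no longer exist on the server; return those removed."""
    removed = []
    for name in to_delete:
        path = os.path.join(local_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed


def write_update_list(list_path, to_update):
    """Write one path per line, relative to the source directory."""
    with open(list_path, "w") as handle:
        for name in to_update:
            handle.write(name + "\n")


def rsync_command(remote_src, local_dir, list_path):
    """Build an rsync command restricted to the files named in list_path."""
    return ["rsync", "-a", f"--files-from={list_path}", remote_src, local_dir]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Since
rsync only looks at the files named in the list, the slow full-tree scan is
avoided.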

> -----Original Message-----
> From: Alberto Accomazzi [mailto:aaccomazzi at cfa.harvard.edu]
> Sent: Thursday, November 29, 2001 10:02 AM
> To: Dave Dykstra
> Cc: rsync at samba.org
> Subject: Re: Rsync: Re: patch to enable faster mirroring of large
> filesystems 



