Rsync: Re: patch to enable faster mirroring of large filesystems

Dave Dykstra dwd at bell-labs.com
Fri Nov 30 05:52:29 EST 2001


Were you using the -c option of rsync?  It sounds like you were, and -c is
extremely slow because it makes rsync checksum the full contents of the
files it compares.  I knew somebody who once went to extraordinary lengths
to avoid the overhead of -c, making a big patch to rsync to cache checksums,
when all he had to do was not use -c.

- Dave Dykstra
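
To illustrate the cost difference behind the advice above: without -c, rsync skips a file when its size and modification time match on both sides, which needs only a stat call per file, while -c has to read the file contents. The following is a minimal Python sketch of the two tests, not rsync's actual code. The function names are hypothetical, and MD5 stands in for whatever checksum rsync itself uses.

```python
import hashlib
import os


def quick_check_same(src, dst):
    """Approximate the default test: equal size and equal mtime (whole seconds).

    Only metadata is read, so the cost does not depend on file size.
    """
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def checksum_same(src, dst, chunk=1 << 16):
    """Approximate the -c test: hash the full contents of both files.

    Every byte of both files is read, which is what makes this slow.
    MD5 is an assumption made for this sketch.
    """
    def digest(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk), b""):
                h.update(block)
        return h.digest()

    return digest(src) == digest(dst)
```

The quick check can be fooled by a file whose contents changed while its size and timestamp did not; that is the case -c exists to catch, at the price of reading everything.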

On Thu, Nov 29, 2001 at 12:41:46PM -0600, Keating, Tim wrote:
> I, too, was disappointed with rsync's performance when no changes were
> required (23 minutes to verify that a system of about 3200 files was
> identical). I wrote a little client/server Python app which does the
> verification, and then hands rsync the list of files to update. This reduced
> the optimal case compare time to under 30 seconds. Here's what it does, and
> forgive me if these sound similar to the stuff you're doing:
> 
>  - The client and server cache checksums (MD5, since there is no MD4
> implementation conveniently available for Python that I know of) on a
> per-directory basis. These are kept in a .checksum file in the directory, so
> they persist from session to session. This is especially handy for the
> server, where (in my particular case) the files don't change very often.
> 
>  - On the initial compare the client sends the checksum of each .checksum
> file; if they match, it's not necessary to send the .checksum file, and we
> just culled an entire directory for the cost of about a 32-byte transfer.
> 
>  - If there's a mismatch, the client sends over the entire .checksum file.
> The server does the compare and sends back a list of files to delete and a
> list of files to update. (And now that I think of it, it would probably be better
> if the server just sent the client back the list of files and let the client
> figure out what it needed, since this would distribute the work better.)
> 
>  - The client deletes the files on the delete list, and uses rsync to
> update the files on the update list.
> 
> The ideal case is when all checksums are up to date. The worst case is when
> the checksum cache needs to be built completely -- but this still only takes
> a couple of minutes, easily an order of magnitude better than the best case
> I experienced with raw rsync.
> 
> > -----Original Message-----
> > From: Alberto Accomazzi [mailto:aaccomazzi at cfa.harvard.edu]
> > Sent: Thursday, November 29, 2001 10:02 AM
> > To: Dave Dykstra
> > Cc: rsync at samba.org
> > Subject: Re: Rsync: Re: patch to enable faster mirroring of large
> > filesystems 
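
The per-directory checksum scheme described in the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not the Python app Tim wrote: the .checksum file name and the use of MD5 come from the message, while the file format, the function names, and the decision to ignore subdirectories are all hypothetical. How the real cache detects stale entries is not described in the message, so this sketch simply recomputes everything.

```python
import hashlib
import os

CACHE_NAME = ".checksum"  # name from the message; the format below is assumed


def file_md5(path, chunk=1 << 16):
    """Hex MD5 of one file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_cache(directory):
    """Map each regular file directly inside `directory` to its MD5."""
    sums = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name != CACHE_NAME and os.path.isfile(path):
            sums[name] = file_md5(path)
    return sums


def dump_cache(sums):
    """Serialize as sorted 'digest name' lines so equal caches give equal text."""
    return "".join(f"{digest} {name}\n" for name, digest in sorted(sums.items()))


def cache_digest(sums):
    """Checksum of the cache itself: 32 hex characters sent per directory."""
    return hashlib.md5(dump_cache(sums).encode("utf-8")).hexdigest()


def diff_caches(client, server):
    """Return (to_delete, to_update) for the client, given both caches."""
    to_delete = sorted(n for n in client if n not in server)
    to_update = sorted(n for n, d in server.items() if client.get(n) != d)
    return to_delete, to_update
```

In this sketch, the two sides would first compare cache_digest() values for a directory and skip it entirely on a match. Only on a mismatch would the full cache be exchanged and diff_caches() run, after which the client removes the files in to_delete and hands the to_update list to rsync.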



