Possibility to run rsync without hash table lookups

Michael Lynch michaellynch511 at gmail.com
Wed Jul 20 00:40:05 MDT 2011


Hi All

I am using rsync to do a local network copy of 10 files of roughly 8 GB each.

The source is an Atom-based NAS running an rsync server, and the destination
is a cygdrive path, i.e. on the same computer that is running the rsync client.

I am using --inplace. The 8 GB files usually have data changed within them,
but their overall structure does not change. They are Firebird databases.
(The total daily binary difference, i.e. what is sent over the wire, is
usually about 300 MB per 8 GB file.)

I am looking at ways to improve the performance.

What I have noticed is the following:
1) rsync checks that the file needs updating.
2) It runs through the file on the client (an 8 GB read on the client, which
is quite quick).
3) Client and server now both start running through the file. This is slow:
the CPU limits it to about 10 MB/s.
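For scale, assuming the 10 MB/s rate holds for the whole file and taking 8 GB as roughly 8000 MB, step 3 alone costs about:

$$
t_{\text{file}} \approx \frac{8000\ \text{MB}}{10\ \text{MB/s}} = 800\ \text{s} \approx 13\ \text{min},
\qquad
t_{\text{total}} \approx 10 \times 800\ \text{s} = 8000\ \text{s} \approx 2.2\ \text{h}
$$

for the 10 files, even though only about 300 MB of each file actually changes.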

Presumably, the CPU is getting thrashed because it is performing hash table
lookups.
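For context on where those lookups come from, here is a simplified Python sketch of the delta algorithm described in the rsync technical report. The receiver indexes the blocks of its old copy by a cheap rolling weak checksum plus a strong digest (MD5 here, matching this post). The sender then slides a window over its copy, doing a hash table lookup at every offset that does not match. This is an illustration only, not rsync's actual code: the function names (`weak_checksum`, `block_signatures`, `delta`, `patch`) are hypothetical, the short tail block of the old file is ignored, and the real implementation differs in many details.

```python
import hashlib
from collections import defaultdict

M = 1 << 16  # each half of the weak checksum is kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return (a, b): a = sum of bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def block_signatures(basis: bytes, block_size: int):
    """Receiver side: hash table of weak checksum -> [(block index, MD5)]."""
    table = defaultdict(list)
    for index, start in enumerate(range(0, len(basis), block_size)):
        block = basis[start:start + block_size]
        if len(block) < block_size:
            break  # simplification: the short tail block is not indexed
        a, b = weak_checksum(block)
        table[a + (b << 16)].append((index, hashlib.md5(block).digest()))
    return table


def delta(new: bytes, table, block_size: int):
    """Sender side: returns ('copy', index) and ('literal', bytes) ops."""
    ops, literal = [], bytearray()
    pos, n = 0, len(new)
    a = b = None
    while pos + block_size <= n:
        window = new[pos:pos + block_size]
        if a is None:
            a, b = weak_checksum(window)  # full recompute after a match
        match = None
        # The hash table lookup, done at every offset without a match:
        for index, strong in table.get(a + (b << 16), ()):
            if hashlib.md5(window).digest() == strong:
                match = index
                break
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal.clear()
            ops.append(("copy", match))
            pos += block_size  # jump a whole block ahead
            a = None
        else:
            # No match: emit one literal byte and roll the checksum by one byte.
            out_byte = new[pos]
            literal.append(out_byte)
            if pos + block_size < n:
                in_byte = new[pos + block_size]
                a = (a - out_byte + in_byte) % M
                b = (b - block_size * out_byte + a) % M
            pos += 1
    literal.extend(new[pos:])  # trailing bytes shorter than a block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(basis: bytes, ops, block_size: int) -> bytes:
    """Receiver side: rebuild the new file from the old one plus the ops."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += basis[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

When the window matches, the sender skips a whole block. In changed regions it pays one checksum roll and one table lookup per byte, plus an MD5 whenever the weak checksum hits.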

Is it possible to disable hash table lookups but still have the MD5 block
comparison?
That is, in step 3, both the client and server would run through the file,
send each other MD5 block checksums, and simply transfer the whole block if
they differ, instead of doing a hash table lookup.
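The proposed mode could look roughly like the following sketch. Both sides compute an MD5 per fixed-offset block, and the source compares its block i only against the destination's digest for block i, with no rolling window and no hash table. Changed blocks are then written back at the same offset, as --inplace does. The function names (`block_digests`, `changed_blocks`, `apply_in_place`) are hypothetical and illustrate the idea only, not an rsync API.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """MD5 of each fixed-offset block (the last block may be short)."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def changed_blocks(source: bytes, dest_digests: list[bytes], block_size: int):
    """Source side: compare block i only with the destination's block i.

    One MD5 per block and a list index, no hash table lookups. Blocks past
    the end of the destination are always sent.
    """
    changes = []
    for index, start in enumerate(range(0, len(source), block_size)):
        block = source[start:start + block_size]
        if index >= len(dest_digests) or hashlib.md5(block).digest() != dest_digests[index]:
            changes.append((start, block))
    return changes


def apply_in_place(dest: bytearray, changes, new_length: int) -> None:
    """Destination side: overwrite changed blocks at the same offsets.

    Changes must be applied in offset order so that a growing file is
    extended contiguously.
    """
    for offset, block in changes:
        dest[offset:offset + len(block)] = block
    del dest[new_length:]  # truncate if the source file shrank
```

The tradeoff is that an insertion or deletion that shifts data makes every later block differ, so all of them get resent. The rolling checksum and hash table in normal rsync exist to avoid exactly that, which is why this mode would only pay off for files like these, whose layout stays fixed.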

I have tried --append and --append-verify, but my data is not strictly
append-only.
When I used --append-verify, it confirmed that the Atom server is quite
capable of chewing through a large file and generating checksums.

If I am correct, I think it would be a nice feature for rsync to just do
block comparison, combined with the --inplace feature.
Fantastic stuff.

Cheers
Michael.