checksum feature request

Bill Wichser bill at princeton.edu
Tue Oct 1 14:46:30 UTC 2019


Back in the spring, we started using rsync for a disk-to-disk backup 
system maintaining close to 10 PB of data.  I am not here to debate 
which tool is the right one, only to discuss a problem we found with 
rsync when using it this way.

We traced the various processes hoping to find the culprit that was 
slowing things down so much and determined pretty easily that it was the 
checksum components in rsync.  Once we found that and tested against the 
--checksum option, it was glaringly obvious that this was slowing us down.

We next tested the MD5 vs MD4 checksums and found little difference in 
speed.  So we went out in search of a better checksum algorithm and 
found xxhash, using the one from the CentOS release.
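
For anyone wanting to try this kind of comparison outside rsync, here is 
a minimal Python sketch.  It is not the test described above and says 
nothing about rsync's own code paths; it only measures raw hashing 
throughput over an in-memory buffer.  The function names are made up for 
illustration, MD4 is included only if the local OpenSSL build still 
offers it, and xxhash is included only if the third-party "xxhash" 
Python package happens to be installed.

```python
import hashlib
import time


def throughput_mb_s(make_hasher, payload, rounds=20):
    """Hash `payload` `rounds` times and return the rate in MB/s."""
    hasher = make_hasher()
    start = time.perf_counter()
    for _ in range(rounds):
        hasher.update(payload)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return (len(payload) * rounds) / (1024 * 1024) / elapsed


def candidate_hashers():
    """Collect whichever hash constructors are usable on this machine."""
    found = {"md5": hashlib.md5}
    try:
        # MD4 is often disabled in newer OpenSSL builds.
        hashlib.new("md4")
        found["md4"] = lambda: hashlib.new("md4")
    except ValueError:
        pass
    try:
        # Assumption: the third-party "xxhash" package may not be present.
        import xxhash
        found["xxh64"] = xxhash.xxh64
    except ImportError:
        pass
    return found


def compare(payload_size=4 * 1024 * 1024):
    """Return {algorithm name: MB/s} for a zero-filled buffer."""
    payload = bytes(payload_size)
    return {
        name: throughput_mb_s(make, payload)
        for name, make in candidate_hashers().items()
    }


if __name__ == "__main__":
    for name, rate in compare().items():
        print(f"{name}: {rate:.0f} MB/s")
```

The numbers will vary by CPU and library build, so treat them only as a 
relative comparison on one machine.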

Thanks to the way the source is written, it was a fairly easy patch to 
get this into the src RPM.  We have been using this in production now 
for a while and see about a 3x speedup over the MD5/MD4 checksum 
algorithms, which brings it pretty close to the --checksum speed.

Attached is the patch we applied.  Since xxhash is packaged in the 
distro, this RPM would need to declare a dependency on it.  If nothing 
else, perhaps the developers could take a look, as this could benefit 
many.

Thanks,
Bill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxhash.patch
Type: text/x-patch
Size: 4030 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/rsync/attachments/20191001/9edcb332/xxhash.bin>

