checksum feature request
Bill Wichser
bill at princeton.edu
Tue Oct 1 14:46:30 UTC 2019
Back in the spring, we started using rsync for a disk-to-disk backup
system maintaining close to 10PB of data. I am not here to debate
which tool is the right one, only to describe a problem we found with
rsync when using it this way.
We traced the various processes hoping to find the culprit that was
slowing things down so much, and determined pretty easily that it was
the checksum components in rsync. Once we found that and tested against
the --checksum option, it was glaring that this was slowing us down.
We next tested the MD5 vs MD4 checksums and found little difference in
speed. So we went out in search of a faster checksum algorithm and
found xxhash, using the version packaged in the CentOS release.
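For anyone who wants to reproduce this kind of comparison outside of
rsync, here is a minimal sketch of a throughput benchmark. It is not the
test we ran and not part of the patch; the helper names are hypothetical.
It assumes Python's hashlib for MD5 (and MD4, where the local OpenSSL
build still provides it) and the third-party "xxhash" Python package for
xxHash. The choice of xxh64 is also an assumption. Algorithms that are
not available are simply skipped.

```python
import hashlib
import time


def throughput_mb_s(make_hasher, data, rounds=5):
    """Return the best observed hashing throughput in MB/s."""
    best = float("inf")
    for _ in range(rounds):
        hasher = make_hasher()
        start = time.perf_counter()
        hasher.update(data)
        hasher.digest()
        best = min(best, time.perf_counter() - start)
    # Guard against a zero interval on very small inputs.
    return len(data) / (1024 * 1024) / max(best, 1e-9)


def candidate_hashers():
    """Collect the hash constructors available on this machine."""
    hashers = {"md5": hashlib.md5}
    try:
        # MD4 is often disabled in newer OpenSSL builds.
        hashlib.new("md4")
        hashers["md4"] = lambda: hashlib.new("md4")
    except ValueError:
        pass
    try:
        # Assumption: the third-party "xxhash" package is installed.
        import xxhash
        hashers["xxh64"] = xxhash.xxh64
    except ImportError:
        pass
    return hashers


def run_benchmark(size_bytes=64 * 1024 * 1024):
    """Hash one zero-filled in-memory buffer with every available algorithm."""
    data = bytes(size_bytes)
    return {
        name: throughput_mb_s(make_hasher, data)
        for name, make_hasher in candidate_hashers().items()
    }


if __name__ == "__main__":
    for name, rate in run_benchmark().items():
        print(f"{name}: {rate:.0f} MB/s")
```

This only measures raw hashing of an in-memory buffer, so it says
nothing about disk or network overhead; the numbers will vary by CPU
and library build.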
Thanks to the way the source is written, it was a fairly easy patch to
get this into the src RPM. We have been using this in production now
for a while and see about a 3x speedup over the MD5/MD4 checksum
algorithms, which brings it pretty close to the --checksum speed.
Attached is the patch we applied. Since xxhash is in the distro, the
RPM would need a dependency on it. If nothing else, perhaps the
developers could take a look, as this could benefit many.
Thanks,
Bill
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xxhash.patch
Type: text/x-patch
Size: 4030 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/rsync/attachments/20191001/9edcb332/xxhash.bin>