Caching {filePath,mtime64,checksum} values to speed up execution-time

Doug Robinson doug.robinson at wandisco.com
Tue Mar 11 16:11:24 MDT 2014


Folks:

When using rsync to copy huge amounts of data, I've found that a significant
amount of time is spent computing the checksums: sometimes hours, sometimes
days, depending on the total amount of data checked.  And after all that,
sometimes only a few files actually need to be updated.

I've pulled the latest git (rsync-3.1.1pre1) and didn't see anything that
addresses this (unless I missed it).

I was wondering what folks think of a proposal to enhance rsync so that it
can create and maintain a cache of {filePath, 64-bit mtime, checksum}
entries ahead of time on both the source and target systems, and then use
that cache later when asked to sync the two systems.  Validating a cache
entry would then be a quick stat64() to confirm that the 64-bit mtime
hasn't changed before sending the cached checksum over the wire for
comparison.
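
To make the idea concrete, here is a rough sketch of the lookup in Python
rather than in rsync's C.  Nothing here is existing rsync code: the names
cached_checksum and file_checksum are hypothetical, the cache is just an
in-memory dict, and MD5 is only a stand-in for whatever checksum rsync would
actually send.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Hash the whole file in chunks (MD5 is only a stand-in here)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_checksum(cache, path):
    """Return the checksum for path, re-reading the file only if its mtime changed.

    cache maps filePath -> (mtime in nanoseconds, checksum); hypothetical layout.
    """
    mtime_ns = os.stat(path).st_mtime_ns  # the cheap stat() validation step
    entry = cache.get(path)
    if entry is not None and entry[0] == mtime_ns:
        return entry[1]  # cache hit: the file contents are never read
    digest = file_checksum(path)  # cache miss or stale entry: hash and store
    cache[path] = (mtime_ns, digest)
    return digest
```

The first call for a file pays the full hashing cost; later calls cost one
stat() as long as the mtime is unchanged.  Note that a file rewritten without
its mtime changing would go undetected by this check.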

Clearly the cache would need to be completely invalidated (or re-created)
if the file system became corrupt.  That could be handled via an "rm -rf"
of the cache.
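
As a sketch of how cheap that invalidation could be, assume the cache lives
in its own directory as a single JSON file.  The file name checksums.json and
the functions load_cache and invalidate_cache are made up for illustration and
are not part of rsync.

```python
import json
import os
import shutil

CACHE_FILE = "checksums.json"  # hypothetical file name inside the cache directory


def load_cache(cache_dir):
    """Load the cache, treating a missing or unreadable file as an empty cache."""
    path = os.path.join(cache_dir, CACHE_FILE)
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}  # nothing usable: checksums get recomputed from scratch


def invalidate_cache(cache_dir):
    """Equivalent of 'rm -rf' on the cache directory."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```

After invalidate_cache(), the next load_cache() returns an empty dict, so
every file is simply checksummed again and the cache is rebuilt.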

Thoughts?

Thank you.

Doug