Caching {filePath,mtime64,checksum} values to speed up execution-time

Kevin Korb kmk at sanitarium.net
Tue Mar 11 16:18:30 MDT 2014



--checksum should not be used during normal rsync operations.  It is
for special cases only.
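For context: by default rsync uses a "quick check" that treats a file as
up to date when its size and modification time match on both sides.
--checksum instead makes rsync compare a 128-bit checksum for every file
whose size matches, which means reading all of that file's data on both
the sender and the receiver.  The Python sketch below models only that
per-file decision.  needs_transfer() and file_md5() are hypothetical
names, MD5 stands in for whatever checksum rsync actually negotiates,
and the sketch compares nanosecond mtimes even though rsync's own
granularity depends on its version and on options such as
--modify-window.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Read the whole file and return its MD5 hex digest (the expensive part)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src, dst, use_checksum=False):
    """Hypothetical model of rsync's per-file "does this need sending?" test.

    Quick check (default): transfer unless size and mtime both match.
    Checksum mode: for files whose sizes match, compare full-file
    checksums instead of mtimes, which reads every byte on both sides.
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # sizes differ: no file data needs to be read
    if use_checksum:
        return file_md5(src) != file_md5(dst)
    return s.st_mtime_ns != d.st_mtime_ns  # metadata only, no data read
```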

Rsync can still incur a lot of overhead collecting the timestamps via
stat(), but that can't really be helped.

I don't really understand how file mtimes would be cached.  How would
rsync know which mtimes don't match the cache without stat()ing the
files?  And once it has done that, the job is already done, so the
cache wouldn't accomplish anything.

On 03/11/2014 06:11 PM, Doug Robinson wrote:
> Folks:
> 
> When using rsync to copy huge amounts of data, I've found that a
> significant amount of time is spent computing the checksums:
> sometimes hours, sometimes days, depending on the total amount of
> data checked!  And after all that, sometimes only a few files
> actually need to be updated.
> 
> I've pulled the latest git (rsync-3.1.1pre1) and didn't see
> anything to address this (or I missed it?).
> 
> I was wondering what folks thought of a proposal to enhance rsync
> so that it can create and maintain a cache of {filePath, 64-bit
> mtime, checksum} beforehand on both the source and target systems,
> and then use that cache later when asked to sync the two systems.
> Cache entry validation would then be a quick stat64() to make sure
> that the 64-bit mtime hasn't changed before sending the cached
> checksum over the wire for comparison.
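A minimal sketch of the proposed cache, assuming a single JSON file that
maps each path to [mtime_ns, checksum], where mtime_ns is the 64-bit
nanosecond mtime that Python exposes as os.stat().st_mtime_ns and the
checksum is MD5.  ChecksumCache and its on-disk format are hypothetical,
not an existing rsync feature.  Every lookup still costs one stat()
call, but the file's data is read and hashed only when the recorded
mtime no longer matches.

```python
import hashlib
import json
import os


class ChecksumCache:
    """Hypothetical {path: [mtime_ns, checksum]} cache, validated by stat()."""

    def __init__(self, cache_file):
        self.cache_file = cache_file
        self.entries = {}
        if os.path.exists(cache_file):
            with open(cache_file) as f:
                self.entries = json.load(f)
        self.hits = 0
        self.misses = 0

    def checksum(self, path):
        mtime_ns = os.stat(path).st_mtime_ns  # cheap: metadata only
        entry = self.entries.get(path)
        if entry is not None and entry[0] == mtime_ns:
            self.hits += 1
            return entry[1]  # mtime unchanged: reuse the stored checksum
        self.misses += 1
        h = hashlib.md5()
        with open(path, "rb") as f:  # expensive: reads the whole file
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        self.entries[path] = [mtime_ns, h.hexdigest()]
        return self.entries[path][1]

    def save(self):
        """Persist the cache so a later sync can reuse it."""
        with open(self.cache_file, "w") as f:
            json.dump(self.entries, f)
```

As proposed, entries are keyed on mtime alone, so a file whose contents
change while its mtime is preserved or reset (for example with
touch -r) would be served a stale checksum; also recording the size
would catch some, but not all, such cases.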
> 
> Clearly the cache would need to be completely invalidated (or 
> re-created) if the file system became corrupt.  That could be
> handled via an "rm -rf" of the cache.
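Invalidating the cache as described can be as blunt as deleting it.  A
minimal sketch, assuming the cache lives in its own directory
(invalidate_cache() and that directory layout are hypothetical); the
next run then finds nothing cached and recomputes every checksum.

```python
import os
import shutil


def invalidate_cache(cache_dir):
    """Discard every cached entry, equivalent to `rm -rf` on the cache dir.

    Hypothetical helper: afterwards the cache directory exists but is
    empty, so every checksum is recomputed and re-recorded on the next run.
    """
    shutil.rmtree(cache_dir, ignore_errors=True)  # tolerate a missing dir
    os.makedirs(cache_dir, exist_ok=True)  # start again with an empty cache
```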
> 
> Thoughts?
> 
> Thank you.
> 
> Doug -- WANdisco // /Non-Stop Data/
> 
> t. 925-396-1125 e. doug.robinson at wandisco.com

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~