Caching {filePath,mtime64,checksum} values to speed up execution-time

Doug Robinson doug.robinson at wandisco.com
Fri Mar 14 13:06:42 MDT 2014


Wayne:

Thank you for responding.

On Thu, Mar 13, 2014 at 8:37 PM, Wayne Davison <wayned at samba.org> wrote:

> On Tue, Mar 11, 2014 at 3:11 PM, Doug Robinson <doug.robinson at wandisco.com> wrote:
>
>> I was wondering what folks thought of a proposal to enhance rsync to be
>> able to create and maintain a cache of {filePath, 64-bit mtime, checksum}
>> beforehand on both source and target systems and then use that cache later
>> on when asked to sync the two systems together?
>
>
> See patches (in order of recommendation): db.diff, checksum-updating.diff,
> checksum-xattr.diff.
>

Will do.


> I personally use db.diff in one situation at work combined with a sqlite
> DB on the source and destination machines.  You just need to periodically
> weed out any old inode values (via rsyncdb --clean /dirs) if things start
> to bloat.  In the future I'd like to see the db.diff code included by
> default as loadable libraries, which would allow someone to install plain
> rsync and only also install sqlite-using rsync and/or mysql-using modules
> if they want the extra functionality.  There is also a plan to eventually
> have the db code map the inodes in the db to paths for things like rename
> optimizations.
>

Nice.


> That said, all these patches currently do is cache checksums.  The db
> patch's default strict checking only uses a cached inode's info if the
> size+mtime+ctime all match what we knew about the file when it was cached
> (which makes it pretty safe).
>

Seems fine unless xattrs are in play.  I saw your comment in a prior
posting about ctime, and it makes a lot of sense.  Thank you.


>  If you switch to a more lax algorithm (no ctime) you need to be extra
> sure the files don't get updated in some way as to leave the file matching
> the laxer inode info (e.g. only let rsync make changes to the files and/or
> make sure that modify timestamps always increase so that there is no chance
> of accidentally matching an older inode record).
>

We're not dealing with xattr so I plan on using the size+mtime+ctime match.
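
For anyone following along, here is a minimal sketch of the strict versus
lax check as I understand it from the description above.  It is not the
db.diff code: the CachedEntry record and the cache_entry_is_valid() helper
are hypothetical names made up for illustration.

```python
from typing import NamedTuple


class CachedEntry(NamedTuple):
    """Hypothetical cached record; not the actual db.diff schema."""
    size: int
    mtime_ns: int
    ctime_ns: int
    checksum: str


def cache_entry_is_valid(entry, st, strict=True):
    """Return True if the cached checksum may be reused for this file.

    `st` is an os.stat_result (or any object with st_size, st_mtime_ns
    and st_ctime_ns).  Strict mode requires size+mtime+ctime to match;
    lax mode skips the ctime comparison.
    """
    if entry.size != st.st_size or entry.mtime_ns != st.st_mtime_ns:
        return False
    if strict and entry.ctime_ns != st.st_ctime_ns:
        return False
    return True
```

In lax mode a file that was rewritten with the same size and had its mtime
restored would still match, which is the risk described above.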


> If you're wondering how an mtime-using algorithm helps your use case, keep
> in mind that the mtimes don't need to match between hosts, just between
> each host's files and its db cache (and any non-matching or missing ones
> get (re)computed to the new checksum).
>

That was the way that I understood it - glad to read I'm on the right path.
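
To spell out my understanding, here is a rough sketch of a per-host lookup
that reuses a cached checksum only when the local file still matches its
local cache record, and recomputes otherwise.  Everything here is an
assumption for illustration: the in-memory dict stands in for the db, the
(device, inode) key and the cached_checksum() and file_checksum() helpers
are hypothetical, and MD5 is just a stand-in for whatever checksum is
configured.

```python
import hashlib
import os


def file_checksum(path):
    """Checksum the whole file in 1 MiB chunks (MD5 as a stand-in)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_checksum(path, cache):
    """Return the file's checksum, consulting this host's cache first.

    `cache` is a plain dict standing in for the per-host db.  Timestamps
    are only compared against this host's own earlier record, so they
    never need to agree with the other host.
    """
    st = os.stat(path)
    key = (st.st_dev, st.st_ino)
    stamp = (st.st_size, st.st_mtime_ns, st.st_ctime_ns)
    hit = cache.get(key)
    if hit is not None and hit[0] == stamp:
        return hit[1]              # record still matches: reuse it
    digest = file_checksum(path)   # missing or stale: (re)compute
    cache[key] = (stamp, digest)
    return digest
```

Each host would run this against its own files and its own cache, and only
the resulting checksums would be compared between the two sides.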


> I'll also point out that if you want to use sqlite, I recommend you use
> the very latest db.diff (from the git patches repo) since it has a change
> that alleviates locking contention between the multiple rsync processes in
> a single copy (you can't really share the db between simultaneous rsync
> copies due to sqlite's poor multi-process locking -- use mysql for that).
>

I'll have to consider my use case(s) to determine the right choice between
sqlite and mysql.  Thank you for the heads-up.


> The rsyncdb manpage has info on initializing the db, noting mounts,
> maintenance, etc.
>
> The other patches might also be useful to you, so feel free to check them
> out:  https://git.samba.org/?p=rsync-patches.git
>

Excellent.  Thank you.

Doug
-- 
Doug Robinson

WANdisco // *Non-Stop Data*

t. 925-396-1125
e. doug.robinson at wandisco.com
