Need for a partial checksums patch?

Wayne Davison wayned at samba.org
Wed Dec 28 14:56:51 MST 2011


On Wed, Dec 28, 2011 at 4:04 AM, Simo Melenius <simo.melenius at iki.fi> wrote:

> However, checksumming big files (even dozens of gigabytes) takes time.
> Now, I observed that my files change only a little and only in some
> parts. Also, undetected corruption is not an issue here: I can survive
> that by other means.
>

Check out the various checksum* and db patches in the patches distribution.
They provide a way to cache the checksum for files that haven't changed.
They work on the assumption that the ctime for a file will change even if
the mtime gets set back to an older value.  When the ctime changes, rsync
recomputes the checksum.  For all other files, the cached checksum suffices.
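
To illustrate the ctime idea outside of rsync, here is a minimal Python
sketch.  It is not the code from the patches: the function names, the
in-memory dict used as a cache, and the choice of MD5 are all assumptions
made for the example.  It also assumes a POSIX system, where st_ctime is
the inode change time.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash the whole file in chunks (the expensive step we want to skip)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_checksum(path, cache):
    """Return the checksum for path, re-hashing only if its stat data changed.

    `cache` is a hypothetical dict mapping path -> (stat_key, digest).
    """
    st = os.stat(path)
    # ctime is part of the key, so resetting mtime to an older value
    # still invalidates the entry (the ctime moves forward regardless).
    key = (st.st_size, st.st_mtime_ns, st.st_ctime_ns)
    entry = cache.get(path)
    if entry is not None and entry[0] == key:
        return entry[1]  # unchanged since last run: reuse cached digest
    digest = file_md5(path)
    cache[path] = (key, digest)
    return digest
```

Calling cached_checksum() twice on an untouched file hashes it only once;
any write, or any mtime adjustment, changes the key and forces a re-hash.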

Ignore the checksum-xattr.diff patch, as that just provides a way for a
server/mirror host to cache the checksums for files -- it doesn't provide a
safe way to detect when a new checksum is needed.

The checksum-updating.diff patch is a reasonable solution, as long as you
don't mind a bunch of .rsyncsums files getting sprinkled about (it requires
the checksum-reading.diff patch).

Finally, the db.diff patch stores its checksums in a database.  It supports
MySQL and SQLite3 (though the write speed on the latter needs to be
improved).  The db patch doesn't yet handle expiring orphaned checksum
entries from the db, though.
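
For a rough picture of what a checksum store with orphan expiry could
look like, here is a small Python/SQLite3 sketch.  The table layout and
function names are hypothetical and are not taken from db.diff.  Batching
the writes into one transaction is a common way to cut per-row commit
overhead in SQLite; whether that applies to the patch is not something
this sketch claims.

```python
import os
import sqlite3


def open_db(db_path=":memory:"):
    """Open (or create) a checksum database with a hypothetical schema."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER,"
        " ctime_ns INTEGER, digest TEXT)"
    )
    return conn


def store_many(conn, rows):
    """Insert or update many (path, size, mtime_ns, ctime_ns, digest) rows.

    `with conn:` wraps the whole batch in a single transaction.
    """
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO checksums VALUES (?, ?, ?, ?, ?)", rows
        )


def prune_orphans(conn):
    """Delete entries whose file no longer exists; return how many went."""
    paths = [row[0] for row in conn.execute("SELECT path FROM checksums")]
    gone = [(p,) for p in paths if not os.path.exists(p)]
    with conn:
        conn.executemany("DELETE FROM checksums WHERE path = ?", gone)
    return len(gone)
```

Running prune_orphans() after a transfer would drop rows for files that
have been deleted or renamed since their checksum was stored.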

As for the sparse checksumming, feel free to send me a patch -- I'll
consider putting it into the patches release.

..wayne..