rsync checksums

Wayne Davison wayned at samba.org
Mon Apr 28 14:41:47 GMT 2008


On Sun, Apr 27, 2008 at 11:06:00PM +0200, Karl Kashofer wrote:
> rsync calculates an MD4 checksum of every file transferred.  Would it
> be possible to store this checksum for future use, i.e. to recheck the
> files of each rsync snapshot at any later time?

The "db.diff" patch in the latest source has the beginnings of checksum
caching.  It stores checksums in a DB (currently either MySQL or SQLite)
and supports storing/fetching the checksums during the rsync transfer
and/or updating and checking them via a provided perl script.  The data
is kept by inode, so a checksum is only computed once per hard-linked
file (though the perl script does not currently optimize verification to
avoid testing the same hard-linked file more than once).
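To illustrate the by-inode idea, here is a minimal sketch of such a
cache.  It is not the db.diff patch's code or schema: the table layout
and the names open_cache, file_digest, and cached_digest are made up for
this example, it uses SQLite only, and it hashes with MD5 as a stand-in
because MD4 is not reliably available in Python's hashlib.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path=":memory:"):
    """Open the cache DB; this table layout is hypothetical, not the patch's."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sums ("
        " dev TEXT, ino TEXT, size INTEGER,"
        " mtime_ns INTEGER, ctime_ns INTEGER, digest TEXT,"
        " PRIMARY KEY (dev, ino))"
    )
    return conn


def file_digest(path):
    """Hash the file contents (MD5 here, standing in for rsync's checksum)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_digest(conn, path):
    """Return (digest, was_cached) for path, keyed by device and inode.

    The digest is recomputed only when no row exists for the inode or
    when the stored size, mtime, or ctime no longer matches the file.
    """
    st = os.stat(path)
    key = (str(st.st_dev), str(st.st_ino))
    row = conn.execute(
        "SELECT size, mtime_ns, ctime_ns, digest FROM sums"
        " WHERE dev = ? AND ino = ?",
        key,
    ).fetchone()
    if row and row[:3] == (st.st_size, st.st_mtime_ns, st.st_ctime_ns):
        return row[3], True
    digest = file_digest(path)
    conn.execute(
        "INSERT OR REPLACE INTO sums VALUES (?, ?, ?, ?, ?, ?)",
        key + (st.st_size, st.st_mtime_ns, st.st_ctime_ns, digest),
    )
    conn.commit()
    return digest, False
```

Because the key is the inode rather than the path, a second hard link to
an already-hashed file comes back from the cache without being re-read.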

> What happens if --checksum finds CRC differences in a file which has
> the same size and modification time as the copy in the archive?

Rsync just updates the file.  This happens regularly with certain kinds
of files, such as MP3s where an editing program tweaked some tags but
set the mtime back to its old value, or the __db.* files inside the
/var/lib/rpm dir (for reasons unknown).  If you itemize the transfer
(-i) with checksums (-c), a program can analyze the output to find any
files where the checksum differed but the time did not.  The really
worrisome case would be a file whose ctime had not changed, but whose
checksum was no longer correct.  Regular rsync can't help with that,
but the db patch stores enough data to check for such an error.
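The -i/-c analysis mentioned above could look something like the sketch
below.  It assumes the itemized string layout documented in the rsync
man page ("YXcstpoguax" in 3.x, a shorter string in older releases) and
only looks at the first five positions: update type, file type, checksum,
size, and time.  The function name checksum_only_changes is made up for
this example.

```python
def checksum_only_changes(itemized_lines):
    """Yield paths of transferred regular files whose itemized flags show
    a checksum difference ('c') but no size ('s') or mtime ('t'/'T') change.
    """
    for line in itemized_lines:
        flags, sep, path = line.rstrip("\n").partition(" ")
        if not sep or len(flags) < 5:
            continue  # not an itemized line
        if flags[0] not in "<>":
            continue  # file was not transferred (e.g. '.', 'c', 'h', '*')
        if flags[1] != "f":
            continue  # only regular files carry a content checksum
        if flags[2] == "c" and flags[3] != "s" and flags[4] not in "tT":
            yield path
```

It could be fed rsync's stdout line by line from a run such as
"rsync -aic src/ dest/"; status lines and new files (">f+++++++++") are
skipped because they do not match the pattern.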

..wayne..

