New rsync option checksum-path

Wayne Davison wayned at samba.org
Tue Mar 11 00:41:17 GMT 2008


On Mon, Mar 10, 2008 at 01:37:25PM -0300, Ruy Exel wrote:
> Suspecting that the changes were made only to ID3 tags, a very common
> situation, one could write a shell script, say SHORTSUM, which would
> calculate a checksum only on the first 10K bytes, for example, where
> ID3 tags reside.

One thing you can do is to use the existing patch, checksum-reading,
which allows the exchange of pre-computed checksum information that
is matched against a stricter set of stat values than the normal
"quick check" (i.e. it also includes ctime and inode).  You could
modify the included perl script (support/rsyncsums) so that it
generates its checksum by some means other than the normal full-file
checksum (which is cached as both an MD4 and an MD5 checksum, to
cover both old and new protocols -- so just set both values to the
same alternate sum).
As long as you regularly run the rsyncsums script on both systems
(and make sure that both the sender and the receiver rsync get the
--sumfiles=strict option), then rsync will figure out which files
have really changed without having to read all the files.  (Even the
rsyncsums run itself is efficient, as it only recomputes sums for
files whose mtime, ctime, size, or inode has changed since they were
last cached.)
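
To make the idea concrete, here is a rough sketch in Python (not the
perl from support/rsyncsums, and not its cache-file format) of a
cache that only re-reads a file when its strict stat values change,
with a prefix-only sum plugged in.  The names short_sum, stat_key,
update_cache and the 10K PREFIX_BYTES limit are all made up for
illustration.

```python
import hashlib
import os

PREFIX_BYTES = 10 * 1024  # hypothetical limit: only the first 10K bytes are summed


def short_sum(path, limit=PREFIX_BYTES):
    """Return the MD5 hex digest of at most the first `limit` bytes of a file."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(limit)).hexdigest()


def stat_key(path):
    """Return the strict stat values: mtime, ctime, size and inode."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_ctime_ns, st.st_size, st.st_ino)


def update_cache(cache, paths, checksum=short_sum):
    """Recompute sums only for files whose stat values have changed.

    `cache` is a dict mapping path -> (stat_key, checksum) and is
    updated in place.  Returns the list of paths that had to be re-read.
    """
    reread = []
    for path in paths:
        key = stat_key(path)
        entry = cache.get(path)
        if entry is None or entry[0] != key:
            cache[path] = (key, checksum(path))
            reread.append(path)
    return reread
```

A second call to update_cache() with nothing changed on disk reads no
file data at all; it only stats each path.  Note that a prefix-only
sum will, by design, miss changes made past the first 10K bytes.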

Note that the above solution doesn't require that you alter the
checksum method in order to make things fast, so you could just
leave it alone if you are willing to make regular runs of the
rsyncsums script.

You do need to be careful not to use the rsyncsums script on files
that are being actively written, as there is a small window where a
race condition can bite you.  To trigger it, all of the following
must happen: rsync stats a file whose mtime and ctime match the
current time, reads the file, and re-stats it to find the stat info
unchanged (including the size); a concurrent writer changes the
file's data after rsync has read it but before the current second
ends; and no further changes are made to the file once that second
is over.  The rsyncsums script could be changed to not cache the
checksum of a file that has a "now" ctime value, which would
avoid this, but it does not do that at the moment.
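
For illustration, the "now" ctime guard could look something like the
following Python sketch.  This is not what rsyncsums does today; the
function names and the injectable clock are invented here, and it
assumes timestamps are compared at whole-second granularity.

```python
import os
import time


def strict_key(st):
    """The stat values that must stay unchanged across the read."""
    return (st.st_mtime_ns, st.st_ctime_ns, st.st_size, st.st_ino)


def ctime_is_now(st, now):
    """True when the file's ctime is in the same second as `now`, or later."""
    return int(st.st_ctime) >= int(now)


def checksum_if_stable(path, checksum, clock=time.time):
    """Checksum `path` and report whether the result is safe to cache.

    Returns (digest, cacheable).  The clock is sampled before the first
    stat, so a write later in that same second cannot go unnoticed.
    """
    now = clock()
    before = os.stat(path)
    digest = checksum(path)
    after = os.stat(path)
    unchanged = strict_key(before) == strict_key(after)
    return digest, unchanged and not ctime_is_now(after, now)
```

A file that is refused here just gets summed again on the next run,
by which time its ctime is in the past and the sum can be cached.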

..wayne..

