checksum-xattr.diff [CVS update: rsync/patches]

Mon Jul 2 02:35:57 GMT 2007

On Sat, Jun 30, 2007 at 04:17:29PM -0400, Matt McCutchen wrote:
> First, setting the xattr hits the file's ctime.

Yeah, I realize that, and that's why none of the xattr values cache the
ctime.  This does mean that this method isn't good for updating checksum
values on existing files (since a general-purpose trusting/updating of
checksums based on size and mtime would be no better than a non-checksum
quick check).  It is still useful for allowing a server to cache the
checksum values without requiring any extra files.  As long as it is
used on files that aren't being actively updated, it works great.  I
might make this patch capable of creating the cached checksum values
when rsync creates a file, but I don't plan to make rsync ever update
an xattr checksum on an existing file.
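
As a rough illustration of the caching idea above (this is not the code
from the patch), a cached value can be trusted only while the file's
size and mtime still match what was recorded.  The record layout and
the names make_record and cached_checksum are assumptions made for this
sketch; actually storing the record in an xattr is left out.

```python
import hashlib
import os


def make_record(path):
    """Build a cache record: size, mtime (whole seconds), MD5 hex digest."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    # ctime is deliberately left out: storing the xattr would change it.
    return "%d %d %s" % (st.st_size, int(st.st_mtime), digest)


def cached_checksum(path, record):
    """Return the cached digest if size and mtime still match, else None."""
    size, mtime, digest = record.split()
    st = os.stat(path)
    if int(size) == st.st_size and int(mtime) == int(st.st_mtime):
        return digest
    return None  # stale: the caller must recompute the checksum
```

A match on size and mtime is exactly the information a quick check
uses, which is why this only pays off for files that are not being
actively updated.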

> Second, it is impossible to make xattr-based checksum caching
> foolproof against same-second modification.

Not really.  The git algorithm only works if nothing modifies the files
while the checksum operation is running.  So the algorithm protects
against stale checksums for sequential operations, but not for parallel
operations.  A paranoid checksummer could notice if the mtime of a file
was "now"(*) and delay checksumming that file until later in the run.
It could also compare the mtime of a file from before and after it was
read to ensure that it wasn't modified during the read phase (assuming
that it never starts to read a file with an mtime of "now").

*Note that "now" for a particular disk may not be the same as time() if
the disk is remote, so network filesystems can be rather complicated.
Also, being off by a second might still be "now" if the value of the
seconds field rolled over during the check.  The Perl script in my patch
that creates/updates these xattr checksums doesn't try to deal with any
of these complications.
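
A minimal sketch of the paranoid checksummer described above, under
stated assumptions: "now" is taken from the local time(), which (as
noted) may be wrong for a remote disk, and a one-second slack stands in
for the rollover case.  The names is_now and paranoid_checksum are
hypothetical and do not come from the patch.

```python
import hashlib
import os
import time


def is_now(mtime, now=None, slack=1):
    """True if mtime is within `slack` seconds of the current time."""
    if now is None:
        now = time.time()
    return int(mtime) >= int(now) - slack


def paranoid_checksum(path, now=None):
    """Return an MD5 hex digest, or None if the file should be retried later."""
    before = os.stat(path)
    if is_now(before.st_mtime, now):
        return None  # too fresh: a same-second write could go unnoticed
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    after = os.stat(path)
    if after.st_mtime != before.st_mtime or after.st_size != before.st_size:
        return None  # modified while it was being read
    return digest
```

A caller would queue every file that returns None and try it again
later in the run.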

..wayne..