checksum-xattr.diff [CVS update: rsync/patches]

Matt McCutchen hashproduct+rsync at gmail.com
Mon Jul 2 12:43:39 GMT 2007


On 7/1/07, Wayne Davison <wayned at samba.org> wrote:
> [...]  It is still useful for allowing a server to cache the
> checksum values without requiring any extra files.  As long as it is
> used on files that aren't being actively updated, it works great.
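A minimal sketch of the idea of caching a checksum in an extended attribute, assuming Linux user xattrs and Python's `os.setxattr`/`os.getxattr`. The attribute name `user.checksum_cache`, the record layout, and all function names are hypothetical and are not taken from checksum-xattr.diff. MD5 is only a stand-in for whatever checksum is cached. Like the simple approach described in the thread, it ignores the same-second modification problem.

```python
import hashlib
import os
import struct
from typing import Optional

# Hypothetical attribute name; the actual patch may use a different one.
XATTR_NAME = "user.checksum_cache"

# Assumed record layout: whole-second mtime and size as big-endian
# unsigned 64-bit integers, followed by the raw digest bytes.
_HEADER = struct.Struct(">QQ")


def file_digest(path: str) -> bytes:
    """MD5 of the file contents (a stand-in for whatever checksum is cached)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.digest()


def encode_record(mtime: int, size: int, digest: bytes) -> bytes:
    """Pack the stat fields the digest was computed against, plus the digest."""
    return _HEADER.pack(mtime, size) + digest


def lookup_cached(record: bytes, st: os.stat_result) -> Optional[bytes]:
    """Return the cached digest if the record still matches st, else None."""
    if len(record) < _HEADER.size:
        return None  # missing or truncated attribute value
    mtime, size = _HEADER.unpack_from(record)
    if mtime != int(st.st_mtime) or size != st.st_size:
        return None  # the file changed after the digest was stored
    return record[_HEADER.size:]


def store_checksum(path: str) -> None:
    """Linux only: compute the digest and save it in a user xattr."""
    st = os.stat(path)  # stat before reading, so a concurrent write invalidates it
    record = encode_record(int(st.st_mtime), st.st_size, file_digest(path))
    os.setxattr(path, XATTR_NAME, record)


def load_checksum(path: str) -> Optional[bytes]:
    """Linux only: return the cached digest if still valid, else None."""
    try:
        record = os.getxattr(path, XATTR_NAME)
    except OSError:
        return None  # no attribute, or xattrs unsupported on this filesystem
    return lookup_cached(record, os.stat(path))
```

No extra files are needed: the cached value travels with the file, and any change to its mtime or size makes `load_checksum` fall back to recomputing.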

OK, that's reasonable.

> > Second, it is impossible to make xattr-based checksum caching
> > foolproof against same-second modification.
>
> Not really.

What do you mean?  There's no way to fix the example I gave with
xattrs, whereas...

> The git algorithm only works if nothing modifies the files
> while the checksum operation is running.  So, the algorithm protects
> against bad things for sequential operations, but not parallel
> operations.

...I proposed a small change to the git algorithm that makes it
protect against parallel operations too:

http://marc.info/?l=git&m=118323680215966&w=2
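For reference, git's "racy git" rule roughly says that cached stat data only proves a file is unchanged when the file's mtime is older than the time the cache (the index) was written; otherwise git rechecks the contents. The following is a simplified sketch of that rule applied to a checksum cache. It is not the modification proposed in the linked message, and `CacheEntry`, `cache_time` and `entry_is_trustworthy` are hypothetical names.

```python
from typing import NamedTuple


class CacheEntry(NamedTuple):
    mtime: int       # file mtime (whole seconds) when the checksum was made
    size: int        # file size when the checksum was made
    digest: bytes    # the cached checksum
    cache_time: int  # filesystem "now" (whole seconds) when the entry was written


def entry_is_trustworthy(entry: CacheEntry, cur_mtime: int, cur_size: int) -> bool:
    """Simplified racy-git style check.

    Matching mtime and size only prove the file is unchanged if the file's
    mtime is strictly older than the moment the entry was written. If both
    fall in the same second, a second write within that second would not
    change the mtime (and need not change the size), so the entry is not
    trusted and the caller should recompute the checksum.
    """
    if (cur_mtime, cur_size) != (entry.mtime, entry.size):
        return False
    return entry.mtime < entry.cache_time
```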

> A paranoid checksummer could notice if the mtime of a file
> was "now"(*) and delay checksumming that file until later in the run.

That would be especially smart.  Git doesn't attempt to save reusable
checksums for files whose mtimes are "now".
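The "delay files whose mtime is now" idea can be sketched as a partition step, assuming whole-second mtimes and a caller-supplied filesystem "now" (`fs_now`, in seconds). The function name `split_by_now` is hypothetical.

```python
import os
from typing import Iterable, List, Tuple


def split_by_now(paths: Iterable[str], fs_now: int) -> Tuple[List[str], List[str]]:
    """Partition paths into (ready, deferred) using whole-second mtimes.

    A file whose mtime is >= fs_now may be written again within the current
    second without its mtime changing, so a checksum saved for it now could
    silently go stale. Such files are deferred to later in the run.
    """
    ready: List[str] = []
    deferred: List[str] = []
    for path in paths:
        mtime = int(os.stat(path).st_mtime)
        if mtime >= fs_now:
            deferred.append(path)
        else:
            ready.append(path)
    return ready, deferred
```

A caller would checksum the `ready` list immediately and rerun `split_by_now` on the deferred list once the filesystem clock has moved past `fs_now`; anything still "now" at the end of the run can be checksummed without saving the result.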

> It could also compare the mtime of a file from before and after it was
> read to ensure that it wasn't modified during the read phase (assuming
> that it never starts to read a file with an mtime of "now").

Or it could just use the "before" mtime in the cache so that, if the
file is modified during reading, the cached checksum would already be
invalid.  I think git does this.
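A minimal sketch of recording the "before" stat, assuming the cache stores nanosecond mtime and size alongside the digest. The function name is hypothetical and MD5 is only a stand-in checksum. The file is stat'ed exactly once, before reading.

```python
import hashlib
import os
from typing import Tuple


def checksum_with_before_stat(path: str) -> Tuple[int, int, bytes]:
    """Stat once, then read; return (mtime_ns, size, digest) from that stat.

    Because the recorded mtime comes from the stat taken *before* reading,
    a modification during the read leaves the file with a newer mtime than
    the recorded one (timestamp granularity permitting), so a later lookup
    sees a mismatch and discards the entry. Writes in the same timestamp
    tick still need the mtime-equals-"now" precaution.
    """
    st = os.stat(path)  # the only stat; its fields go into the cache entry
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return st.st_mtime_ns, st.st_size, h.digest()
```

Compared with checking the mtime both before and after the read, this needs one fewer stat and never trusts a value taken after the data was read.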

> *Note that "now" for a particular disk may not be the same as time() if
> the disk is remote, so network filesystems can be rather complicated.

That's easy to fix: get your "now" by touching a file on the
filesystem and reading the resulting mtime.
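A sketch of getting the filesystem's "now" by creating and writing a scratch file in the target directory and reading back its mtime. The function name is hypothetical. On network filesystems, client-side attribute caching may still need attention, and this sketch does not address it.

```python
import os
import tempfile


def filesystem_now_ns(directory: str) -> int:
    """Return the filesystem's idea of "now" in nanoseconds.

    Creates a scratch file in `directory`, writes one byte so the
    filesystem (possibly a remote server) stamps a fresh mtime, and reads
    that mtime back, so the result is on the same clock as the mtimes of
    the files being checksummed.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"x")
        os.close(fd)  # close before stat so the write has been flushed
        fd = -1
        return os.stat(path).st_mtime_ns
    finally:
        if fd >= 0:
            os.close(fd)
        os.unlink(path)
```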

> Also, being off by a second might still be "now" if the value of the
> seconds field rolled over during the check.

I don't think this is a problem if you stat the file just once before
reading it.

> The perl script in my patch
> that creates/updates these xattr checksums doesn't try to deal with any
> of these complications.

And that's probably fine for rsync's purposes.  However, I still think
it might be cool if I made a foolproof checksum-caching library and
rsync used it...

Matt

