Factor out .rsyncsums logic into a separate checksum-caching library?

Sat Jun 30 19:47:04 GMT 2007

On 6/30/07, Wayne Davison <wayned at samba.org> wrote:
> On Sun, Jun 24, 2007 at 01:03:03PM -0400, Matt McCutchen wrote:
> > Specifically, it has protection against being fooled when a file's
> > checksum is cached and the file is modified again in the same second;
> > .rsyncsums could use this.
>
> I tried to find a description for this algorithm, but didn't see it
> mentioned in any of the web searches I made.  Is the algorithm described
> anywhere?  Or is my only choice to dig into the source and try to find
> it?

Try here:

http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD

> > The git index has been heavily used and tested, so you might find it
> > helpful when implementing a checksum cache for rsync.
>
> The problem with this is that the git cache is SHA1, and rsync needs
> both MD4 and MD5, depending on what protocol version is in effect.
> It should be possible to adapt their code for rsync's purpose, but it's
> probably overkill.  The idea behind the new checksum patch is mainly to
> allow servers to provide cached checksums for their files, especially
> servers whose content is slow to change.

I wasn't suggesting that you reuse any code from the git cache, only
its ideas.  You're already storing mtimes and ctimes; as far as I can
tell, the only relevant things that git does and you don't are storing
the size and inode number and protecting against same-second
modification.  Maybe adding those is overkill for rsync's purposes.
But if I wrote a library implementing a completely foolproof checksum
cache that could be used with MD4, MD5, or any other checksum
algorithm, would you be likely to adopt it for rsync?
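
To make the idea concrete, here is a rough Python sketch of the kind of
cache I have in mind.  It is only an illustration under my own
assumptions, not git's or rsync's actual code: the ChecksumCache class,
its method names, and the in-memory dict are all hypothetical.  Each
entry records the size, inode number, mtime, and ctime, and an entry is
distrusted whenever the file's mtime falls in the same second as (or
later than) the moment the entry was cached, which is the racy-git rule
as I understand it.

```python
import hashlib
import os
import time


class ChecksumCache:
    """Hypothetical in-memory checksum cache (illustration only)."""

    def __init__(self, algorithm="md5", clock=time.time):
        self.algorithm = algorithm  # any name hashlib.new() accepts locally
        self.clock = clock          # injectable so the race can be exercised
        self.entries = {}           # path -> (signature, cached_at, digest)

    @staticmethod
    def _signature(st):
        # Stat fields that must all match for a cached entry to be reused.
        return (st.st_size, st.st_ino, st.st_mtime_ns, st.st_ctime_ns)

    def _hash_file(self, path):
        h = hashlib.new(self.algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def checksum(self, path):
        # Take the clock reading before the stat, so that any later
        # modification in the same second leaves the entry marked racy.
        now = int(self.clock())
        st = os.stat(path)
        sig = self._signature(st)
        entry = self.entries.get(path)
        if entry is not None:
            cached_sig, cached_at, digest = entry
            # Racy: the file was modified in the second the entry was
            # cached (or later), so an unchanged mtime proves nothing.
            racy = int(st.st_mtime) >= cached_at
            if cached_sig == sig and not racy:
                return digest
        digest = self._hash_file(path)
        self.entries[path] = (sig, now, digest)
        return digest
```

Calling ChecksumCache("md5").checksum(path) twice on a file that was
last modified well in the past hashes it once and answers the second
call from the cache; a file modified within the current second is
rehashed on every call until the clock moves on.  The sketch does not
persist anything to disk and does not guard against a file changing
while it is being read, so a real library would need more than this.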

Matt