Factor out .rsyncsums logic into a separate checksum-caching library?

Matt McCutchen hashproduct+rsync at gmail.com
Sat Jun 30 19:47:04 GMT 2007


On 6/30/07, Wayne Davison <wayned at samba.org> wrote:
> On Sun, Jun 24, 2007 at 01:03:03PM -0400, Matt McCutchen wrote:
> > Specifically, it has protection against being fooled when a file's
> > checksum is cached and the file is modified again in the same second;
> > .rsyncsums could use this.
>
> I tried to find a description for this algorithm, but didn't see it
> mentioned in any of the web searches I made.  Is the algorithm described
> anywhere?  Or is my only choice to dig into the source and try to find
> it?

Try here:

http://git.kernel.org/?p=git/git.git;a=blob;f=Documentation/technical/racy-git.txt;hb=HEAD

> > The git index has been heavily used and tested, so you might find it
> > helpful when implementing a checksum cache for rsync.
>
> The problem with this is that the git cache is SHA1, and rsync needs
> both MD4 and MD5, depending on what protocol version is in effect.
> It should be possible to adapt their code for rsync's purpose, but it's
> probably overkill.  The idea behind the new checksum patch is mainly to
> allow servers to provide cached checksums for their files, especially
> servers whose content is slow to change.

I didn't necessarily think you should reuse any code from the git
cache, just ideas.  You're already storing mtimes and ctimes; it
appears to me that the only relevant things git does and you don't
are storing the size and inode number and protecting against
same-second modification.  Maybe adding those is overkill for rsync's
purposes.
But if I wrote a library implementing a completely foolproof checksum
cache that could be used with MD4, MD5, or any other checksum
algorithm, would you be likely to adopt it for rsync?
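
To make the idea concrete, here is a rough Python sketch of such a
cache.  It is only an illustration under my own assumptions: the class
name ChecksumCache, the injectable clock parameter, and the in-memory
dict are all hypothetical, and the racy check is a simplification in
the spirit of racy-git.txt, not git's actual code or anything in the
.rsyncsums patch.  An entry is trusted only if the size, inode number,
mtime, and ctime all match and the file's mtime is strictly older than
the second in which the checksum was cached.

```python
import hashlib
import os
import time


class ChecksumCache:
    """Hypothetical in-memory checksum cache keyed by path (illustration only)."""

    def __init__(self, algorithm="md5", clock=time.time):
        self.algorithm = algorithm  # any name accepted by hashlib.new()
        self.clock = clock          # injectable so the race can be simulated
        self.entries = {}           # path -> (signature, cached_at, digest)

    @staticmethod
    def _signature(st):
        # Whole-second timestamps model a filesystem with 1-second granularity.
        return (st.st_size, st.st_ino, int(st.st_mtime), int(st.st_ctime))

    def _digest(self, path):
        h = hashlib.new(self.algorithm)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def checksum(self, path):
        # Read the clock before stat() and before hashing, so that any
        # modification made while we hash has mtime >= now and is later
        # treated as racy.
        now = int(self.clock())
        sig = self._signature(os.stat(path))
        entry = self.entries.get(path)
        if entry is not None:
            old_sig, cached_at, digest = entry
            mtime = sig[2]
            # Trust the entry only if nothing in the signature changed AND
            # the file was last modified strictly before the second in
            # which the checksum was cached.
            if old_sig == sig and mtime < cached_at:
                return digest
        digest = self._digest(path)
        self.entries[path] = (sig, now, digest)
        return digest
```

The cost of this choice is that a file checksummed in the same second
it was written gets hashed once more on the next lookup; after that the
entry is trusted.  The sketch also assumes the system clock and the
filesystem's timestamps agree, which may not hold on network
filesystems.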

Matt