[librsync-devel] Re: state of the rsync nation? (revisited 6/2003 from 11/2000)

Donovan Baarda abo at minkirri.apana.org.au
Wed Jun 11 14:29:35 EST 2003


On Wed, 2003-06-11 at 13:59, Martin Pool wrote:
> On 11 Jun 2003, Donovan Baarda <abo at minkirri.apana.org.au> wrote:
> 
> > The vcdiff standard is available as RFC3284, and Josh is listed as one
> > of the authors. 
> 
> Yes, I've just been reading that.
> 
> I seem to remember that it was around as an Internet-Draft when I
> started, but it didn't seem clear that it would become standard so I
> didn't use it.

I'm not sure if this is the same one... I vaguely recall something like
this too, but I think it was an attempt to add delta support to http and
had the significant flaw of not supporting rsync's
"delta-from-signature". It may have come out of the early xdelta http
proxy project. IMHO rproxy's http extensions for delta support were
better because they were more general.

There was also another thing I saw which was a compact delta
representation spec that I think librsync uses (perhaps it was you who
had some discussion about it in the old librsync TODO?), and may have
influenced the vcdiff RFC, but AFAIK was never "official" in any way.

> > I also had some correspondence with Josh ages ago where he talked about
> > how self-referencing deltas can directly do compression of the miss
> > data without using things like zlib, and by default give you the
> > benefits of rsync's "context compression" without the overheads (rsync
> > runs a decompressor _and_ a compressor on the receiving end just to
> > regenerate the compressed "hit" context data).
> 
> Something possibly similar is mentioned in tridge's thesis.  I was
> talking to him a while ago and (iirc) he thought it would be good to
> try it again, since it does well with the large amounts of memory and
> CPU time that are available on modern machines.

I forget if I saw this in Tridge's thesis, but I definitely noticed that
librsync uses a modified zlib to make feeding data to the compressor and
throwing away the compressed output more efficient. I have implemented
this in pysync too, though I don't use a modified zlib... I just throw
the compressed output away.
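
A minimal sketch of the "compress the hits and throw the output away" idea, using only Python's standard zlib module. This is not pysync's or librsync's actual code: the class names ContextSender and ContextReceiver are hypothetical, and the sketch assumes both ends run the same zlib version at the same compression level so that their compressors produce identical output.

```python
import zlib


class ContextSender:
    """Hypothetical sender: compresses misses, primes on hits."""

    def __init__(self, level=6):
        self._comp = zlib.compressobj(level)

    def miss(self, data):
        # Literal data: the compressed bytes go on the wire.
        return self._comp.compress(data) + self._comp.flush(zlib.Z_SYNC_FLUSH)

    def hit(self, data):
        # Matched data: feed it through the compressor so later misses
        # can refer back to it, then throw the compressed output away.
        self._comp.compress(data)
        self._comp.flush(zlib.Z_SYNC_FLUSH)


class ContextReceiver:
    """Hypothetical receiver: needs a compressor and a decompressor."""

    def __init__(self, level=6):
        self._comp = zlib.compressobj(level)
        self._decomp = zlib.decompressobj()

    def hit(self, data):
        # Regenerate the compressed bytes the sender discarded and feed
        # them to the decompressor so its history matches the sender's.
        regenerated = self._comp.compress(data)
        regenerated += self._comp.flush(zlib.Z_SYNC_FLUSH)
        self._decomp.decompress(regenerated)

    def miss(self, wire):
        data = self._decomp.decompress(wire)
        # Keep the local compressor in step with the sender's compressor.
        self._comp.compress(data)
        self._comp.flush(zlib.Z_SYNC_FLUSH)
        return data
```

Both ends call hit() with the same matched block and miss() with the literal data (sender) or the wire bytes (receiver), in the same order. The receiver-side compressor in hit() is the overhead mentioned in the quoted text above.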

The self referencing compression idea is neat but would be a...
challenge to implement. For it to be effective, the self-referenced
matches would need to be non-block aligned like xdelta, which tends to
suggest using xdelta to do the self-reference matches on top of rsync
for the block aligned remote matches. Fortunately xdelta and rsync have
heaps in common, so implementing both in one library would be easy (see
pysync for an example).
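
To make "self-referencing" concrete, here is a minimal sketch of applying a delta whose copy instructions can address either the old file or the output generated so far, in the spirit of the combined source-plus-target addressing in RFC 3284. The instruction tuples and the function name apply_delta are assumptions made up for illustration; this is not the vcdiff, xdelta, librsync, or pysync format.

```python
def apply_delta(source, instructions):
    """Rebuild a target from ("add", data) and ("copy", offset, length) ops.

    Copy offsets index into `source` followed by the target produced so
    far, so a copy can reference earlier output (a self-reference).
    """
    target = bytearray()
    for op in instructions:
        if op[0] == "add":
            # Miss data carried literally in the delta.
            target += op[1]
        elif op[0] == "copy":
            _, offset, length = op
            # Copy one byte at a time so that a self-referencing copy may
            # overlap the bytes it is currently producing.
            for i in range(offset, offset + length):
                if i < len(source):
                    target.append(source[i])
                else:
                    target.append(target[i - len(source)])
        else:
            raise ValueError("unknown instruction: %r" % (op[0],))
    return bytes(target)


old = b"abcdef"
delta = [
    ("copy", 0, 6),        # match against the old file
    ("add", b"XY"),        # miss data, sent once
    ("copy", 6 + 6, 6),    # self-reference: repeats "XY" from the output
]
new = apply_delta(old, delta)
```

Here the repeated "XY" data is sent only once and then referenced, which is how such a delta compresses miss data without a separate zlib pass. Finding those matches at arbitrary byte offsets is the hard part and is not shown.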

If I didn't have paid work I would be prototyping it in pysync right
now. If anyone wanted to fund something like this I could make myself
available :-)

> I strongly agree with what you said a while ago about code simplicity
> being more valuable than squeezing out every last bit.

Yeah, my big complaint about librsync at the moment is that it is messy.
Cleaning up the code alone would be a big improvement. I would guess that
at least 30% of the code could be trimmed away, leaving a cleaner and
more extensible core, and because "messy" leads to "inefficient", it
would be faster too.

-- 
Donovan Baarda <abo at minkirri.apana.org.au>
http://minkirri.apana.org.au/~abo/



