benchmarking rsync's -z compression utility

Donovan Baarda abo at minkirri.apana.org.au
Mon May 12 12:14:08 EST 2003


On Sun, 2003-05-11 at 19:31, Leaw, Chern Jian wrote:
> Donovan,
> Yes, I'm referring to librsync's -z compression vs running an external
> compression tool on the files to be transferred.
>  
> When you mentioned the following:
> "rsync _should_ be able to do better with -z because it uses
> "context-compression" by "priming" the compressor with hits and
> discarding the compressed output. This means the compressor and
> de-compressor see the whole file, even though only the compressed miss
> data is transmitted."
> 
> Could you kindly elaborate further on context-compression? What did you
> mean by "priming the compressor with hits and discarding the compressed
> output"? I'd like to gain a further understanding of this concept.

Probably the best way to get an understanding of how rsync works is
IMNSHO to look at pysync. But in any case, here is a brief description
of what I call "context compression".

The sending end compares the "new file" against a signature of the
"basis file" sent from the client (the receiving end). As it goes
through the "new file", it sends the receiving end a sequence of "hit
references" and compressed "miss data".
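
As a rough illustration of the hit/miss stream described above, here is
a minimal sketch. It is not rsync's or pysync's actual code: the names
(BLOCK, signature, delta, patch) are made up for this example, and it
hashes every window with MD5 instead of using a rolling checksum, so it
is slow but easy to follow. No compression is applied here.

```python
import hashlib

BLOCK = 8  # hypothetical block size, chosen small for the example


def signature(basis):
    """Map the hash of each whole basis block to its offset."""
    sig = {}
    for off in range(0, len(basis) - BLOCK + 1, BLOCK):
        digest = hashlib.md5(basis[off:off + BLOCK]).digest()
        sig.setdefault(digest, off)  # keep the first offset for duplicates
    return sig


def delta(new, sig):
    """Walk the new file and emit ("hit", offset, length) or ("miss", data)."""
    ops, miss, i = [], bytearray(), 0
    while i < len(new):
        block = new[i:i + BLOCK]
        off = None
        if len(block) == BLOCK:
            off = sig.get(hashlib.md5(block).digest())
        if off is None:
            # No matching basis block starts here: one more byte of miss data.
            miss.append(new[i])
            i += 1
        else:
            if miss:
                ops.append(("miss", bytes(miss)))
                miss = bytearray()
            ops.append(("hit", off, BLOCK))
            i += BLOCK
    if miss:
        ops.append(("miss", bytes(miss)))
    return ops


def patch(basis, ops):
    """Receiving end: rebuild the new file from the basis file and the ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "hit":
            out += basis[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

The receiving end would call signature() on its basis file and send the
result over; the sending end would call delta() and send back the ops,
which patch() applies.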

With "context compression", you don't just compress the data sent, you
also compress the data _not_ sent. This means the sending end also feeds
all the "hit data" that is not sent into the compressor. The compressed
output corresponding to the "hit data" is discarded. The client
re-constructs the unsent compressed "hit data" by running a compressor
that gets fed all the decompressed sent "miss data" and the locally
sourced "hit data", and feeds the result to its decompressor so that
the decompressor sees the whole stream.
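
The scheme can be sketched with Python's zlib module. This is only an
illustration of the idea as described above, not the code rsync or
pysync actually uses. The function names and the ops format
(("hit", offset, length) and ("miss", data)) are assumptions made for
the example. It relies on both ends running the same zlib with the same
settings and flushing at the same points, so that both compressors
produce identical output.

```python
import zlib


def encode(new, ops):
    """Sending end: compress everything, but only transmit the miss output."""
    comp = zlib.compressobj()
    wire, pos = [], 0
    for op in ops:
        if op[0] == "hit":
            _, offset, length = op
            data = new[pos:pos + length]
        else:
            data = op[1]
        # Z_SYNC_FLUSH ends each chunk on a byte boundary so it can be
        # handed to a decompressor on its own.
        out = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
        if op[0] == "hit":
            wire.append(("hit", offset, length))  # compressed output discarded
        else:
            wire.append(("miss", out))
        pos += len(data)
    return wire


def decode(basis, wire):
    """Receiving end: regenerate the discarded output from local hit data."""
    comp = zlib.compressobj()
    decomp = zlib.decompressobj()
    out = bytearray()
    for msg in wire:
        if msg[0] == "hit":
            _, offset, length = msg
            data = basis[offset:offset + length]
            regenerated = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
            decomp.decompress(regenerated)  # primes the decompressor
        else:
            data = decomp.decompress(msg[1])
            # Keep the local compressor in step with the sender's.
            comp.compress(data)
            comp.flush(zlib.Z_SYNC_FLUSH)
        out += data
    return bytes(out)
```

For example, with a basis file held by the receiver and a new file that
inserts a few bytes in the middle, decode(basis, encode(new, ops))
should give back the new file, with only the compressed miss chunk and
the hit references crossing the wire.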

Hope that helps... have a look at pysync for an easy to read example
implementation.

-- 
----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------


