benchmarking rsync's -z compression utility

Leaw, Chern Jian chern.jian.leaw at intel.com
Sun May 11 19:31:24 EST 2003


Donovan,
Yes, I'm referring to librsync's -z compression vs running an external
compression tool on the files to be transferred.
 
When you mentioned the following:
"rsync _should_ be able to do better with -z because it uses
"context-compression" by "priming" the compressor with hits and
discarding the compressed output. This means the compressor and
de-compressor see the whole file, even though only the compressed miss
data is transmitted."

Could you kindly elaborate further on context-compression? What do you mean
by "priming the compressor with hits and discarding the compressed
output"? I'd like to gain a further understanding of this concept.

Thanks
 


-----Original Message-----
From: Donovan Baarda [mailto:abo at minkirri.apana.org.au]
Sent: Sunday, May 11, 2003 1:45 PM
To: Leaw, Chern Jian
Cc: rsync-request at lists.samba.org; rsync at lists.samba.org
Subject: Re: benchmarking rsync's -z compression utility


On Sat, 2003-05-10 at 19:58, Leaw, Chern Jian wrote:
> Hi,
> Is there a way in which rsync's -z compression (zlib) utility can be
> benchmarked? 
> 
> I'm trying to compare the compression ratio between rsync and external
> compression tools like gzip and bzip2. 
> 
> Are there any advantages to using rsync's internal compression mechanism
> specified with the -z option compared to solely applying external
> compression i.e. bzip2 to the files and invoking rsync to transfer these
> files without the -z option?

I'm assuming here you are talking about using librsync's -z vs running
librsync without it through a compressed pipe, and are aware that rsync
does delta-compression to update a basis file in both cases.

rsync _should_ be able to do better with -z because it uses
"context-compression" by "priming" the compressor with hits and
discarding the compressed output. This means the compressor and
de-compressor see the whole file, even though only the compressed miss
data is transmitted.

My experiments with pysync confirmed that this does make a measurable
difference (see the comments with pysync) on real-world compressible
data.
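
The priming idea described above can be sketched with Python's stock zlib
module. This is only an illustration under stated assumptions, not the code
rsync or pysync actually uses: the helper names (make_codec, feed, send,
receive), the token layout, and the sample data are all hypothetical. The
sender compresses every token but only transmits the output for misses. The
receiver regenerates the discarded hit output by running an identical
compressor over its own copy of the hit data, so the decompressor still sees
the whole stream.

```python
import hashlib
import zlib


def make_codec():
    """Return a fresh compressor; both ends must create theirs identically."""
    return zlib.compressobj(9)


def feed(comp, data):
    """Compress data and sync-flush so the output ends on a byte boundary."""
    return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)


def send(tokens):
    """Sender side. tokens is a list of (kind, data), kind is 'hit' or 'miss'.

    Every token goes through the compressor, but the compressed output
    for hits is discarded (sent as None) and only miss output is kept.
    """
    comp = make_codec()
    wire = []
    for kind, data in tokens:
        out = feed(comp, data)
        wire.append((kind, out if kind == "miss" else None))
    return wire


def receive(wire, basis_hits):
    """Receiver side. basis_hits yields the hit data found in the basis file."""
    comp = make_codec()
    decomp = zlib.decompressobj()
    hits = iter(basis_hits)
    out = []
    for kind, payload in wire:
        if kind == "hit":
            data = next(hits)
            # Regenerate the bytes the sender discarded and prime the
            # decompressor with them; its output is just the hit data again.
            decomp.decompress(feed(comp, data))
        else:
            data = decomp.decompress(payload)
            # Keep the local compressor in step with the sender's.
            feed(comp, data)
        out.append(data)
    return b"".join(out)


# Hypothetical sample data: a hit block, then a miss that resembles it.
hit = "\n".join(
    hashlib.sha256(str(i).encode()).hexdigest() for i in range(40)
).encode()
miss = hit[500:700] + b"CHANGED" + hit[700:900]

wire = send([("hit", hit), ("miss", miss)])
rebuilt = receive(wire, [hit])

primed_size = len(wire[1][1])            # miss compressed with hit context
plain_size = len(zlib.compress(miss, 9)) # miss compressed on its own
print(primed_size, plain_size)
```

The two printed sizes compare the miss data compressed with and without the
hit context. The sketch relies on both ends running the same zlib build and
compression level, since the receiver has to reproduce the sender's discarded
output byte for byte.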

A similar benefit could be achieved with self-referencing deltas, as
supported by the vcdiff format (soon to be) used by xdelta.

-- 
----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------


