Compressed backup

Donovan Baarda abo at minkirri.apana.org.au
Thu May 30 18:50:02 EST 2002


On Thu, May 30, 2002 at 03:35:05PM -0700, jw schultz wrote:
[...]
> > There is a patch available to gzip to add an option --rsyncable that's
> > supposed to make it work better with rsync.  It's been put into the
> > "patches" directory for the next release of rsync, or you can get it at
> > 
> >     http://rsync.samba.org/ftp/unpacked/rsync/patches/gzip-rsyncable.diff
> 
> I took a quick look at this patch and i think it does what i expected.
> It resets the compression algorithm after each 4KB of
> compresstext.  This means that if you change 1 byte early in
> the file it might or might not affect the blocks later on.
> The reason for the equivocation is that if the change alters
> the compression ratio the savings are gone.

If that is how it works, and I think you are right, then it would only work
for the smallest of cases, rendering the gzip-rsyncable patch worse than
useless for the vast majority of cases.

Regular resets hurt the compression ratio. For the resultant compresstext to
be unchanged, resets must occur at the same begin/end boundary points of an
unchanged sequence of uncompresstext. With a reset after each 4KB of
compresstext, the resets only land on the same boundary points for the
unchanged text following a change if the change alters the size of the
compresstext before it by zero or by an exact multiple of 4KB. Any other
insertion/deletion/replacement shifts every reset point after the change.
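A rough way to see the boundary problem is the Python sketch below. It is a
simplified model of the behaviour described above, not the patch itself: it
assumes resets at fixed 4KB offsets of the input (rather than of the
compresstext) and models each reset by compressing every slice independently
with zlib. The names BLOCK, compress_with_resets and reusable_blocks are made
up for the illustration.

```python
import random
import zlib

BLOCK = 4096  # assumed reset interval, measured here in input bytes


def compress_with_resets(data, block=BLOCK):
    """Compress each fixed-size slice on its own, modelling one reset per block."""
    return [zlib.compress(data[i:i + block]) for i in range(0, len(data), block)]


def reusable_blocks(old, new):
    """Count compressed blocks of `new` that already exist somewhere in `old`."""
    old_blocks = set(compress_with_resets(old))
    return sum(1 for b in compress_with_resets(new) if b in old_blocks)


# Deterministic pseudo-random "text", 16 blocks long.
rng = random.Random(0)
original = bytes(rng.choice(b"abcdefgh ") for _ in range(16 * BLOCK))

replaced = original[:10] + b"Z" + original[11:]  # one byte changed, same length
inserted = original[:10] + b"Z" + original[10:]  # one byte inserted

print(reusable_blocks(original, replaced))  # 15: only the first block differs
print(reusable_blocks(original, inserted))  # 0: every later boundary has shifted
```

The same-length replacement leaves the boundaries where they were, so only the
block containing the change differs. The one-byte insertion moves every later
boundary relative to the text, so no compressed block can be reused.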

I would guess that the number of changes meeting this criterion would be
almost non-existent. I suspect that the gzip-rsyncable patch does nearly
nothing except produce worse compression. It _might_ slightly increase the
rsyncability up to the point where the first change in the uncompresstext
occurs, but the chance of it re-syncing after that point would be extremely
low.
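The cost in compression ratio can be sketched the same way. Again this is only
a model under assumptions: resets are taken at fixed 4KB offsets of the input,
each reset is modelled as an independent zlib stream (which also adds a small
header per block), and the test data is deliberately built with long-range
repetition so that a single stream has something to exploit. BLOCK is a
hypothetical name for the reset interval.

```python
import random
import zlib

BLOCK = 4096  # assumed reset interval, measured here in input bytes

# A 3000-byte pseudo-random phrase repeated 20 times: the repeats sit further
# apart than most of a 4KB block, but well inside zlib's 32KB window.
rng = random.Random(1)
phrase = bytes(rng.choice(b"abcdefgh ") for _ in range(3000))
data = phrase * 20

# One continuous stream versus a fresh stream for every block.
whole = len(zlib.compress(data))
with_resets = sum(len(zlib.compress(data[i:i + BLOCK]))
                  for i in range(0, len(data), BLOCK))

print("single stream:", whole, "bytes")
print("with resets:  ", with_resets, "bytes")
```

Each reset throws away the history the compressor has built up, so the
per-block total comes out larger than the single stream. How much larger
depends heavily on the data and on the reset interval.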

I tried to think of a way of doing this so that it would eventually re-sync,
with things like resets every <some-prime> bytes so that the reset window
moves, but the problem is the source and target reset windows must move
together for it to work, so any scheme that moves the reset window into sync
will also move the window _out_ of sync. 

I don't think it is possible to come up with a scheme where the reset
windows could re-sync after a change and then stay sync'ed until the next
change, unless you dynamically alter the compression at sync time... at which
point you may as well rsync the decompressed files.

-- 
----------------------------------------------------------------------
ABO: finger abo at minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------