How to emulate rdiff behaviour

Matt McCutchen matt at mattmccutchen.net
Mon Jun 1 03:11:19 GMT 2009


On Mon, 2009-06-01 at 07:10 +0800, Daniel.Li wrote:
> On Sun, 2009-05-31 at 14:34 -0400, Matt McCutchen wrote:
> > > And one more thing here:
> > > If you are going to prepare this batch file, it seems there will be
> > > double the workload of network, see below statements? Is that right?
> > > 
> > > > rsync -av --only-write-batch=/batches/$DATE bhost:/backup2/ /backup/
> > > > rsync -av /backup/ bhost:/backup2/
> > 
> > Not if you put /backup and /backup2 on the same machine by dropping the
> > "bhost:" from those commands, as Wayne mentioned.  He included it just
> > to point out the possibility of having those dirs on different
> > machines.
> > 
> 
> Humm... I have a NAS storage device, which is bhost,

Does it have to be that way?  If you run Wayne's entire command sequence
on the NAS, you won't have a "bhost" and you'll avoid doubling the
network usage.

> and I found that
> rsync will NOT simply transfer exactly the diff size of the contents,
> see my explanation below.
> 
> I have done a simple test:
> 1) backup a file;
> 2) remove 256 bytes in this file;
> 3) backup again, this algorithm will transfer 1144 bytes of diff data,
> which is about 4 times the size of original removed data block.

> Well, I think I don't have much detailed knowledge on rsync diff
> algorithm. So I raised this question here. Maybe the transfer size will
> be different depending on the offset and size of removed data block.
> 
> Can anyone here help to explain it? Or is there any info that I have
> missed or any report/document/evaluation? Please help to point me to
> these URLs. Thanks.

Rsync breaks the existing destination file into fixed-size blocks.  If
an entire block is found anywhere in the source file, the receiver
reuses its own copy of that block; source data that is not part of any
matched block is sent literally.  So any change to a block means
resending everything that is left of that block, even if only a few
bytes changed.  For details, see the technical report at
http://rsync.samba.org/tech_report/ , which is linked from the
"resources" page of the rsync Web site.
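
To make the effect concrete, here is a rough Python sketch of the
block-matching idea.  It is not rsync's actual code: real rsync uses a
rolling weak checksum plus a strong checksum instead of comparing raw
blocks, and the bytes on the wire also include checksums and protocol
overhead.  The function name, the 7000-byte test file and the removal
offsets are made up for illustration.  The 700-byte block size is what
I believe rsync uses by default for small files, so treat that as an
assumption too.

```python
import random

def literal_bytes(old: bytes, new: bytes, block_size: int) -> int:
    """Count the bytes of `new` that are not covered by any block of `old`."""
    # Blocks of the old (destination) file, cut at fixed offsets.
    blocks = {old[i:i + block_size] for i in range(0, len(old), block_size)}
    literal = 0
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        if window in blocks:
            pos += len(window)      # whole block reused, nothing resent
        else:
            literal += 1            # this byte has to be sent as-is
            pos += 1                # slide the window by one byte
    return literal

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(7000))   # ten 700-byte blocks

inside = old[:100] + old[356:]      # 256 bytes removed inside block 0
straddle = old[:600] + old[856:]    # 256 bytes removed across blocks 0 and 1

print(literal_bytes(old, inside, 700))    # 444
print(literal_bytes(old, straddle, 700))  # 1144
```

If the removed range falls inside one block, what is left of that block
(700 - 256 = 444 bytes) is sent.  If it straddles two blocks, what is
left of both is sent (2 * 700 - 256 = 1144 bytes).  That happens to
equal the figure in your test, though I can't tell from here whether
your file was actually split that way.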

You can set a block size to be used for all files with the --block-size
option.  By default, rsync picks a block size of roughly the square root
of each file's size, which balances the network traffic sent in the two
directions (block checksums one way, changed blocks the other) under the
assumption that there are relatively few changes to each transferred
file.  You may find that a different block size works better depending
on the pattern of changes and the asymmetry (if any) of the network
capacity.
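
A back-of-the-envelope model shows where the square root comes from.
This is only a sketch: the 20 bytes of checksum per block, the file
size and the number of changed blocks are hypothetical values, and it
assumes the number of blocks touched by changes stays the same whatever
the block size, which is only roughly true for small, scattered
changes.

```python
import math

def estimated_traffic(file_size, block_size, changed_blocks, checksum_bytes=20):
    """Rough bytes on the wire: block checksums one way, changed blocks the other."""
    checksums = math.ceil(file_size / block_size) * checksum_bytes
    resent = changed_blocks * block_size
    return checksums + resent

file_size = 100_000_000   # hypothetical 100 MB file
changed = 20              # hypothetical number of blocks touched by changes

for block_size in (2_500, 5_000, 10_000, 20_000, 40_000):
    print(block_size, estimated_traffic(file_size, block_size, changed))
```

In this model the total is checksum_bytes * file_size / block_size +
changed_blocks * block_size, which is smallest when block_size =
sqrt(checksum_bytes * file_size / changed_blocks), i.e. proportional to
the square root of the file size.  With the numbers above that is
10,000 bytes.  More changed blocks push the best block size down, and
fewer push it up.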

-- 
Matt