rsync in-place (was Re: rsync 1tb+ each day)

jw schultz jw at pegasys.ws
Wed Feb 5 13:19:56 EST 2003


On Wed, Feb 05, 2003 at 12:47:49PM +1100, Martin Pool wrote:
> On  4 Feb 2003, jw schultz <jw at pegasys.ws> wrote:
> 
> > The reason in-place updating is difficult is that rsync
> > expects that unchanged blocks in the old file may be
> > relocated.  Data inserted into or removed from the file does
> > not require the rest of the file to be retransmitted.
> > Unchanged blocks will be copied from the old locations in
> > the old file to new locations in the new file.
> > 
> > In-place updating requires that blocks not be relocated.
> > It may be possible by disallowing matches having differing
> > offsets.  That would require deeper investigation.
> 
> Of course the other place where people want this is for transfers of
> block devices, where the rename is just not possible.
> 
> I looked a little at doing this in librsync.  The naive solution is to
> merely prohibit the delta from referring to blocks that have been
> already overwritten.  I will probably eventually add at least this
> option.
> 
> You might try this in rsync.  A lot of other code to do with
> e.g. setting permissions makes the assumption of the rename model,
> though.  It would take a fair amount of testing.

I certainly am not interested in coding it.  The target use
is too small.

> Of course this model really falls down in some cases.  Consider the
> case of one block inserted at the beginning.  Then with the naive "no
> backreferences" approach every block will be overwritten just before
> it's needed. :( 
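
To make that failure case concrete, here is a small block-level
model of the naive approach.  It is only an illustration, not
librsync or rsync code: the function name usable_matches and the
one-letter "blocks" are made up for the example, and it assumes
the output is written sequentially, one block at a time.

```python
def usable_matches(old_blocks, new_blocks):
    """Count new blocks that could still be copied from the old file.

    Model: output block i is written over old block i, so by the time
    block i is produced, old blocks 0..i-1 are already gone.  Old block
    i itself can still be read just before it is overwritten.
    """
    positions = {}
    for j, block in enumerate(old_blocks):
        positions.setdefault(block, []).append(j)

    copied = 0
    for i, block in enumerate(new_blocks):
        # A match is usable only if some old copy sits at index >= i.
        if any(j >= i for j in positions.get(block, [])):
            copied += 1
    return copied


old = list("ABCDEFGH")
inserted = ["X"] + old   # one block inserted at the beginning
removed = old[1:]        # one block removed from the beginning

print(usable_matches(old, inserted))  # 0: every block is sent as literal data
print(usable_matches(old, removed))   # 7: data only ever moves toward the start
```

In this model an insertion at the front defeats every match,
while a removal at the front costs nothing.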

I was thinking more in terms of no block relocation at all.
Checksums only match if at the same offset.  The receiver simply
discards (or never gets) info about blocks that are
unchanged.  It would just lseek and write with a possible
truncate at the end.
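
A rough sketch of that idea, for anyone who wants to experiment.
This is not rsync code.  The function names, the 4096-byte block
size and the use of SHA-256 as the block checksum are all
assumptions made for the example, and both files are local here
where a real tool would send the checksums and delta over the wire.

```python
import hashlib
import os

BLOCK = 4096  # assumed block size


def block_sums(path, block=BLOCK):
    """Receiver: one checksum per fixed-offset block of the old file."""
    sums = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            sums.append(hashlib.sha256(chunk).digest())
    return sums


def make_delta(src_path, old_sums, block=BLOCK):
    """Sender: collect (offset, data) for blocks that differ at the same offset.

    Returns the list of changed blocks and the size of the new file.
    Unchanged blocks produce no entry at all.
    """
    delta = []
    with open(src_path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            changed = (index >= len(old_sums)
                       or hashlib.sha256(chunk).digest() != old_sums[index])
            if changed:
                delta.append((index * block, chunk))
            index += 1
    return delta, os.path.getsize(src_path)


def apply_in_place(dst_path, delta, new_size):
    """Receiver: seek and write each changed block, then truncate."""
    with open(dst_path, "r+b") as f:
        for offset, data in delta:
            f.seek(offset)
            f.write(data)
        f.truncate(new_size)
```

Usage would be along the lines of
delta, size = make_delta(src, block_sums(dst)) followed by
apply_in_place(dst, delta, size).  No temporary file and no rename
are involved, which is why it would also work on a block device
(minus the truncate).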

> You can imagine a smarter algorithm that does non-sequential writes to
> the output so as to avoid writing over blocks that will be needed
> later.  Alternatively, if you assume some amount of temporary storage,
> then it might be possible to still produce output as a stream.

I really doubt this is worth doing in rsync.  It applies
principally to block-oriented files such as devices and
database files.  For the most part rsync handles these
fine.

If someone really does feel they must have this, I'd suggest
creating a different tool just for this job.  It could
operate on just one file at a time and either be smart
enough to suss the optimal block size by knowing the file
type or accept a block size from the command-line.  Block
sizes for this would clearly be of the power-of-two variety.
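
As a sketch of what the command-line side of such a tool might
look like: the program name "blocksync", the --block-size option
and the 4096 default are all hypothetical, and no such tool
exists as far as this message is concerned.

```python
import argparse


def power_of_two(text):
    """argparse type: accept only positive power-of-two integers."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"not an integer: {text!r}")
    # A power of two has exactly one bit set, so value & (value - 1) is 0.
    if value <= 0 or value & (value - 1):
        raise argparse.ArgumentTypeError(
            f"block size must be a power of two: {text!r}")
    return value


def build_parser():
    """Parser for a hypothetical one-file-at-a-time in-place sync tool."""
    parser = argparse.ArgumentParser(
        prog="blocksync",
        description="Update a single file or device in place, block by block.")
    parser.add_argument("source", help="file to read new contents from")
    parser.add_argument("target", help="file or device to update in place")
    parser.add_argument("--block-size", type=power_of_two, default=4096,
                        help="block size in bytes, a power of two")
    return parser
```

For example, build_parser().parse_args(["new.db", "old.db",
"--block-size", "8192"]) yields block_size 8192, while a value
such as 3000 is rejected with a usage error.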

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

