rsync in-place (was Re: rsync 1tb+ each day)

Wed Feb 5 12:47:49 EST 2003

On  4 Feb 2003, jw schultz <jw at pegasys.ws> wrote:

> The reason why in-place updating is difficult is that
> rsync expects that unchanged blocks in the old file may be
> relocated.  Data inserted into or removed from the file does
> not require the rest of the file to be retransmitted.
> Unchanged blocks will be copied from the old locations in
> the old file to new locations in the new file.
> 
> In-place updating requires that blocks not relocate.
> It may be possible by disallowing matches having differing
> offsets.  That would require deeper investigation.
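To make the difference concrete, here is a minimal Python sketch. It assumes a hypothetical, simplified delta format: a list of ("copy", src_offset, length) and ("literal", data) instructions that build the new file front to back. This is not rsync's actual token stream. apply_delta_to_new_file follows the rename model. copies_keep_offsets checks jw's proposed restriction that no match may change offset. Both function names are illustrative.

```python
# Hypothetical, simplified delta format (not rsync's real token stream):
# instructions that build the new file front to back.
#   ("copy", src_offset, length)  copy bytes from the OLD file
#   ("literal", data)             bytes sent over the network


def apply_delta_to_new_file(old: bytes, delta: list) -> bytes:
    """Rename model: build the result in a separate buffer (the temp file),
    so a copy may read from any offset of the untouched old file."""
    out = bytearray()
    for instr in delta:
        if instr[0] == "copy":
            _, src, length = instr
            out += old[src:src + length]
        else:
            out += instr[1]
    return bytes(out)


def copies_keep_offsets(delta: list) -> bool:
    """jw's suggested restriction: every copy lands at the same offset it
    was read from, so unchanged blocks never move."""
    pos = 0  # offset in the new file written by the next instruction
    for instr in delta:
        if instr[0] == "copy":
            _, src, length = instr
            if src != pos:
                return False
            pos += length
        else:
            pos += len(instr[1])
    return True
```

A delta for data inserted at the front is fine under the rename model but fails the same-offset check. A delta that only patches a block where it sits passes both.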

Of course the other place where people want this is for transfers of
block devices, where the rename is just not possible.

I looked a little at doing this in librsync.  The naive solution is to
merely prohibit the delta from referring to blocks that have already
been overwritten.  I will probably eventually add at least this
option.
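Here is one way the naive rule might look, as a minimal Python sketch. It uses the same hypothetical ("copy", src, length) / ("literal", data) instruction list, not librsync's actual delta format or API. When the receiver writes front to back, the old bytes before the current write position are gone. A copy is therefore only safe if its source offset is at or after that position. The sender has the new file, so it can turn every other copy into a literal. forbid_backreferences and apply_delta_in_place are hypothetical names.

```python
def forbid_backreferences(new: bytes, delta: list) -> list:
    """Sender side: rewrite any copy whose source starts before the current
    write position (old bytes an in-place receiver has already overwritten)
    as a literal taken from the new file."""
    out = []
    pos = 0  # offset in the new file written by the next instruction
    for instr in delta:
        if instr[0] == "copy":
            _, src, length = instr
            if src < pos:
                out.append(("literal", new[pos:pos + length]))
            else:
                out.append(instr)
            pos += length
        else:
            out.append(instr)
            pos += len(instr[1])
    return out


def apply_delta_in_place(buf: bytearray, delta: list) -> None:
    """Receiver side: write the new file over the old one, front to back."""
    pos = 0
    for instr in delta:
        if instr[0] == "copy":
            _, src, length = instr
            # Bytes at or after pos have not been written yet, so they still
            # hold old data; anything before pos has been overwritten.
            if src < pos:
                raise ValueError("delta refers to an already overwritten block")
            data = bytes(buf[src:src + length])
        else:
            data = instr[1]
        buf[pos:pos + len(data)] = data
        pos += len(data)
    del buf[pos:]  # drop any tail left over from a longer old file
```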

You might try this in rsync.  Note, though, that a lot of other code,
e.g. the code that sets permissions, assumes the rename model.  It
would take a fair amount of testing.

Of course this model really falls down in some cases.  Consider the
case of a single block inserted at the beginning of the file.  With the
naive "no backreferences" approach, every old block is overwritten just
before it's needed, so none of them can be copied. :(
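A quick count shows why. This is a minimal sketch that assumes a hypothetical 4-byte block size and expresses matches as (src, dst, length) triples. A naive in-place receiver writing front to back can only use a match whose source starts at or after its destination. After a one-block insertion, every match points backwards. After a one-block deletion, none does.

```python
BLOCK = 4  # hypothetical tiny block size, for illustration only


def backreference_bytes(matches):
    """Bytes a naive in-place receiver cannot copy: matches whose source
    starts before their destination, i.e. old data that has already been
    overwritten by the time the receiver writes that destination."""
    return sum(length for src, dst, length in matches if src < dst)


n_blocks = 1000
# One block inserted at the front: new block k+1 is old block k.
inserted = [(k * BLOCK, (k + 1) * BLOCK, BLOCK) for k in range(n_blocks)]
# One block deleted from the front: new block k is old block k+1.
deleted = [((k + 1) * BLOCK, k * BLOCK, BLOCK) for k in range(n_blocks - 1)]

print(backreference_bytes(inserted), "of", n_blocks * BLOCK)  # 4000 of 4000
print(backreference_bytes(deleted))                           # 0
```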

You can imagine a smarter algorithm that does non-sequential writes to
the output so as to avoid writing over blocks that will be needed
later.  Alternatively, if you assume some amount of temporary storage,
then it might be possible to still produce output as a stream.
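A minimal Python sketch of the non-sequential idea follows, with temporary storage used only to break cycles. It assumes the delta has already been turned into copy triples (src, dst, length) and literal pairs (dst, data) with explicit destinations. The function and argument names are hypothetical. The greedy search is quadratic per step, which is fine for illustration but not for a 1TB file. A copy runs only once no other pending copy still needs to read its destination. When every pending copy blocks another (e.g. two blocks that swapped places), one copy's source bytes are saved in temporary storage. Literals read nothing from the old file, so they are written last.

```python
def apply_copies_out_of_order(buf: bytearray, copies, literals, new_len):
    """Apply copies (src, dst, length) and literals (dst, data) in place,
    in an order where no write clobbers bytes a pending copy still needs."""
    if new_len > len(buf):
        buf.extend(bytes(new_len - len(buf)))  # room for a growing file
    pending = list(copies)
    stashed = []  # (dst, data): copies whose source went to temp storage

    def overlaps(a, alen, b, blen):
        return a < b + blen and b < a + alen

    while pending:
        for i, (src, dst, length) in enumerate(pending):
            # Safe if no *other* pending copy reads from our destination.
            if not any(overlaps(dst, length, s2, l2)
                       for j, (s2, _, l2) in enumerate(pending) if j != i):
                buf[dst:dst + length] = bytes(buf[src:src + length])
                pending.pop(i)
                break
        else:
            # Every pending copy blocks another: a cycle.  Save one copy's
            # source bytes in temporary storage so it no longer reads buf.
            src, dst, length = pending.pop()
            stashed.append((dst, bytes(buf[src:src + length])))
    # Stashed copies and literals read nothing from buf: write them last.
    for dst, data in stashed + list(literals):
        buf[dst:dst + len(data)] = data
    del buf[new_len:]
```

With one block inserted at the front, the copies simply run back to front, so no temporary storage is needed at all. Only genuine cycles cost extra space.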

Really, for your problem the practical solution is just to dump the
whole file, perhaps allowing for sparse blocks.  As other people have
observed, by design rsync does a lot more disk IO than network IO.
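Dumping the whole file while allowing for sparse blocks might look like the minimal Python sketch below. The name copy_sparse and the 4096-byte block size are illustrative choices. All-zero blocks are skipped with a seek, so a filesystem that supports holes can leave them unallocated. Whether holes are actually created depends on the filesystem. The sketch assumes the destination is a regular file.

```python
import os


def copy_sparse(src_path: str, dst_path: str, block_size: int = 4096) -> None:
    """Copy a whole file, seeking over all-zero blocks instead of writing
    them so that the destination can be left sparse."""
    zero = bytes(block_size)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            if block == zero[:len(block)]:
                dst.seek(len(block), os.SEEK_CUR)  # leave a hole
            else:
                dst.write(block)
        # If the file ends in a hole, seeking alone does not set its size.
        dst.truncate()
```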

-- 
Martin