[patch] read-devices

Eran Tromer eran at tromer.org
Mon Aug 5 21:07:08 EST 2002


Donovan Baarda wrote:
> On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:

> I'm not that familiar with rsync code at the lowest level, but I'm guessing
> this patch adds support for in-place patching, possibly without reverse
> seeks in the basis.
> 
> How does this work? Does it have an adverse effect on the delta size? Have I
> got this wrong and it just allows the source to be a device, still requiring
> a temporary file for target creation?

My patch only allows reading from devices and FIFOs; it's business as
usual on the receiving end. The disk-space saving that I mentioned
applies only to the sender.

It seems to me that a simple form of in-place update would be fairly
easy to implement. Sender: in hash_search(), test only blocks whose
beginning is not before the current offset. Receiver: don't create a
tempfile; write into the existing file and do an ftruncate() at the
end.

This will work very well for big record-based files, hard disk block
devices, or any other source that doesn't shift data around. It will
fail horribly if data is shifted forward (i.e., insertion at the
beginning or in the middle of the file), so for stuff that involves
many shifts, e.g.,

  % tar cf - / | rsync -L --read-devices /dev/stdin remote:backup.tar

on average about half the matches will be lost.
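
To make the sender/receiver split above concrete, here is a rough
sketch in Python rather than rsync's C. It is not rsync code: the names
make_inplace_delta and apply_inplace, the tuple-based operation format,
and the use of raw block contents instead of rolling/strong checksums
are all assumptions made for illustration.

```python
def make_inplace_delta(old: bytes, new: bytes, block: int = 4):
    """Sender side (hypothetical): emit ("copy", src, n) / ("lit", data) ops."""
    # Index every full, block-aligned chunk of the old file by its content.
    # rsync itself matches on checksums; raw bytes keep the sketch short.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], []).append(off)

    ops, lit, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        # The receiver's write offset equals i, so accept only an old block
        # that starts at or after i; earlier data may already be overwritten.
        src = next((o for o in index.get(chunk, ()) if o >= i), None)
        if src is None:
            lit.append(new[i])
            i += 1
            continue
        if lit:
            ops.append(("lit", bytes(lit)))
            lit.clear()
        ops.append(("copy", src, block))
        i += block
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def apply_inplace(buf: bytearray, ops):
    """Receiver side (hypothetical): patch buf in place, no temporary copy."""
    pos = 0
    for op in ops:
        if op[0] == "copy":
            _, src, n = op
            data = bytes(buf[src:src + n])  # read the source before overwriting
        else:
            data = op[1]
        buf[pos:pos + len(data)] = data
        pos += len(data)
    del buf[pos:]  # stands in for the final ftruncate()
    return buf
```

With old = b"AAAABBBBCCCCDDDD", changing one record in place or
deleting the first block still yields copy operations, but inserting
two bytes at the front makes every candidate block start before the
write offset, so the whole file goes out as literal data.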

This can be alleviated using buffering: keep a copy of the last X
overwritten bytes (either in memory or in a tempfile), so that matches
to that data remain possible. This allows insertions of up to X bytes
to be handled efficiently. On "average" you'll need to keep a buffer of
size X = O(sqrt(file_size)), which is much more economical than the
current method (a full temporary copy) if you don't care about a backup
copy.
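
A possible shape for the buffered variant, again as a Python sketch
under assumptions rather than anything from the rsync source: the
sender relaxes its test to "block starts no more than `window` bytes
before the current offset", and the receiver keeps the last `window`
overwritten bytes in memory. The names make_delta, apply_buffered and
window are hypothetical, and blocks are matched on raw content instead
of checksums.

```python
def make_delta(old: bytes, new: bytes, window: int = 0, block: int = 4):
    """Sender (hypothetical): like plain in-place, but may reach back `window` bytes."""
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], []).append(off)
    ops, lit, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + block]
        # Accept blocks starting at most `window` bytes behind the write offset.
        src = next((o for o in index.get(chunk, ()) if o >= i - window), None)
        if src is None:
            lit.append(new[i])
            i += 1
            continue
        if lit:
            ops.append(("lit", bytes(lit)))
            lit.clear()
        ops.append(("copy", src, block))
        i += block
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def apply_buffered(buf: bytearray, ops, window: int):
    """Receiver (hypothetical): in-place patch keeping the last `window` old bytes."""
    orig_len = len(buf)
    pos = 0
    saved = bytearray()  # original bytes ending at min(pos, orig_len)
    for op in ops:
        if op[0] == "copy":
            _, src, n = op
            end = min(pos, orig_len)   # original data below `end` is overwritten
            start = end - len(saved)   # oldest original offset still buffered
            if src < start:
                raise ValueError("source fell out of the buffer window")
            head = saved[src - start:src - start + n] if src < end else b""
            tail = buf[max(src, end):src + n]  # part not yet overwritten
            data = bytes(head) + bytes(tail)
        else:
            data = op[1]
        # Remember what is about to be overwritten, then trim to the window.
        saved += buf[pos:pos + len(data)]
        if len(saved) > window:
            del saved[:len(saved) - window]
        buf[pos:pos + len(data)] = data
        pos += len(data)
    del buf[pos:]  # stands in for the final ftruncate()
    return buf
```

For old = b"AAAABBBBCCCCDDDD" and a two-byte insertion at the front, a
window of 0 sends all 18 bytes as literals, while a window of 2 sends
only the 2 inserted bytes; a three-byte insertion with the same window
falls back to all literals again.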

A fancier method would be for the sender to tell the receiver something 
like "save a copy of the next N bytes and call it foo", later say "write 
a copy of foo at the current offset" and finally "forget foo". Simpler, 
but perhaps nearly as effective, would be to have a variable buffer 
size, controlled by the sender. I'm not sure either can be done with a 
single-pass sender, though.
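
For what it's worth, the receiver half of the "save / write / forget"
idea could look roughly like the Python sketch below. The command names
and tuple layout are made up for illustration, and the sketch says
nothing about how a sender would decide what to save, which is exactly
the single-pass question left open above.

```python
def apply_named(buf: bytearray, ops):
    """Receiver (hypothetical): in-place patch with sender-controlled stashes.

    ops is a sequence of tuples:
      ("save", name, n)   remember the next n bytes at the current offset
      ("write", name)     write the remembered bytes at the current offset
      ("forget", name)    drop the remembered bytes
      ("copy", src, n)    copy n not-yet-overwritten bytes from offset src
      ("lit", data)       write literal data
    """
    pos = 0
    stash = {}
    for op in ops:
        kind = op[0]
        if kind == "save":
            _, name, n = op
            stash[name] = bytes(buf[pos:pos + n])  # does not advance pos
            continue
        if kind == "forget":
            del stash[op[1]]
            continue
        if kind == "write":
            data = stash[op[1]]
        elif kind == "copy":
            _, src, n = op
            data = bytes(buf[src:src + n])  # read before overwriting
        else:  # "lit"
            data = op[1]
        buf[pos:pos + len(data)] = data
        pos += len(data)
    del buf[pos:]  # stands in for the final ftruncate()
    return buf
```

As an example, turning b"AAAABBBBCCCC" into b"BBBBAAAACCCC" moves the
first block forward, which plain in-place update cannot match. Here it
takes ("save", "foo", 4), ("copy", 4, 4), ("write", "foo"),
("forget", "foo"), ("copy", 8, 4) and no literal data at all.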

   Eran




