[patch] read-devices
Eran Tromer
eran at tromer.org
Mon Aug 5 21:07:08 EST 2002
Donovan Baarda wrote:
> On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:
> I'm not that familiar with rsync code at the lowest level, but I'm guessing
> this patch adds support for in-place patching, possibly without reverse
> seeks in the basis.
>
> How does this work? Does it have an adverse effect on the delta size? Have I
> got this wrong and it just allows the source to be a device, still requiring
> a temporary file for target creation?
My patch only allows reading devices/fifos; it's business as usual on
the receiving end. The disk-space saving that I mentioned applies only
to the sender.
It seems to me that a simple form of in-place update would be fairly
easy to implement. Sender: in hash_search(), consider only blocks that
begin at or after the current offset. Receiver: don't create a tempfile;
write into the destination file directly and ftruncate() it at the end.
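A minimal Python sketch of both halves, for illustration only and not rsync code: a naive block comparison stands in for hash_search() and its rolling checksums, a bytearray stands in for the destination file, and BLOCK, make_delta() and apply_inplace() are hypothetical names. The sender accepts a basis block only if it starts at or after the current output offset p; the receiver overwrites the basis in place and truncates it at the end.

```python
BLOCK = 4  # hypothetical block size, tiny for illustration


def make_delta(basis: bytes, target: bytes) -> list:
    """Sender side: naive stand-in for hash_search(). A basis block is
    used only if it starts at or after the current output offset p,
    because everything before p will already have been overwritten."""
    ops, p = [], 0
    while p < len(target):
        chunk = target[p:p + BLOCK]
        match = None
        if len(chunk) == BLOCK:
            b = -(-p // BLOCK) * BLOCK  # first aligned block starting >= p
            while b + BLOCK <= len(basis):
                if basis[b:b + BLOCK] == chunk:
                    match = b
                    break
                b += BLOCK
        if match is None:
            ops.append(("literal", target[p:p + 1]))
            p += 1
        else:
            ops.append(("copy", match))
            p += BLOCK
    return ops


def apply_inplace(buf: bytearray, ops: list) -> None:
    """Receiver side: rewrite buf (initially the basis) with no temp file."""
    p = 0
    for kind, arg in ops:
        if kind == "copy":
            # arg >= p, so the source bytes are still original; the slice
            # is copied before assignment, so overlap is harmless.
            buf[p:p + BLOCK] = buf[arg:arg + BLOCK]
            p += BLOCK
        else:
            buf[p:p + 1] = arg
            p += 1
    del buf[p:]  # stands in for ftruncate() to the new length


# Deletion at the front shifts data backward: all four blocks still match.
basis, target = b"XXXXaaaabbbbccccdddd", b"aaaabbbbccccdddd"
ops = make_delta(basis, target)
buf = bytearray(basis)
apply_inplace(buf, ops)
print(bytes(buf) == target, sum(k == "copy" for k, _ in ops))  # True 4
```

Swapping the two strings (inserting XXXX at the front instead) still reconstructs the target correctly, but make_delta() then finds no usable block at all and sends everything as literals: the forward-shift case.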
This will work very well for big record-based files, hard disk block
devices, or any other source that doesn't shift data around. It will
fail horribly if data is shifted forward (i.e., an insertion at the
beginning or middle of the file), so for data that involves many
shifts, such as a tar stream sent with
% tar cf - / | rsync -L --read-devices /dev/stdin remote:backup.tar
about half the matches will be lost on average.
This can be alleviated using buffering: keep a copy of the last X
overwritten bytes (either in memory or in a tempfile), so that matches
to that data remain possible. This allows insertions of up to X bytes
to be handled efficiently. On "average" you'll need to keep a buffer of
size X = O(sqrt(file_size)), which is much more economical than the
current method (a full-size temporary copy of the file) if you don't
care about a backup copy.
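A sketch of the buffered variant, again hypothetical Python rather than rsync code: the receiver keeps the original contents of the last X positions it overwrote, so the sender may use any aligned basis block starting at or after p - X. X = 8, BLOCK = 4 and the names make_delta() and BufferedReceiver are illustrative assumptions, and the block search is a naive stand-in for rsync's checksum matching.

```python
BLOCK = 4  # hypothetical block size
X = 8      # hypothetical save-buffer size in bytes


def make_delta(basis: bytes, target: bytes, window: int = X) -> list:
    """Sender: like the plain in-place scheme, but blocks may start as far
    back as p - window, since the receiver still holds those bytes."""
    ops, p = [], 0
    while p < len(target):
        chunk = target[p:p + BLOCK]
        match = None
        if len(chunk) == BLOCK:
            lo = max(0, p - window)
            b = -(-lo // BLOCK) * BLOCK  # first aligned block starting >= lo
            while b + BLOCK <= len(basis):
                if basis[b:b + BLOCK] == chunk:
                    match = b
                    break
                b += BLOCK
        if match is None:
            ops.append(("literal", target[p:p + 1]))
            p += 1
        else:
            ops.append(("copy", match))
            p += BLOCK
    return ops


class BufferedReceiver:
    """Overwrites the basis in place, keeping the original contents of the
    last `window` overwritten positions so that copies can still reach them."""

    def __init__(self, basis: bytes, window: int = X):
        self.buf = bytearray(basis)
        self.basis_len = len(basis)
        self.window = window
        self.p = 0                # next output offset
        self.saved = bytearray()  # original bytes [saved_start, min(p, basis_len))
        self.saved_start = 0

    def _read_basis(self, off: int, n: int) -> bytes:
        # Positions >= p are untouched; earlier ones come from the save buffer.
        return bytes(self.buf[i] if i >= self.p else self.saved[i - self.saved_start]
                     for i in range(off, off + n))

    def _write(self, data: bytes) -> None:
        end = self.p + len(data)
        # Stash the original basis bytes about to be overwritten, then trim
        # the save buffer to its last `window` bytes.
        self.saved += self.buf[self.p:min(end, self.basis_len)]
        excess = len(self.saved) - self.window
        if excess > 0:
            del self.saved[:excess]
            self.saved_start += excess
        self.buf[self.p:end] = data
        self.p = end

    def apply(self, ops: list) -> bytes:
        for kind, arg in ops:
            self._write(self._read_basis(arg, BLOCK) if kind == "copy" else arg)
        del self.buf[self.p:]  # stands in for ftruncate()
        return bytes(self.buf)


# A 4-byte insertion at the front fits within X = 8: all blocks still match.
basis, target = b"aaaabbbbccccdddd", b"XXXXaaaabbbbccccdddd"
ops = make_delta(basis, target)
print(sum(k == "copy" for k, _ in ops))              # 4
print(BufferedReceiver(basis).apply(ops) == target)  # True
```

With window=0 this reduces to the plain in-place scheme, and an insertion longer than X (12 bytes, say) again loses every match, which is the trade-off the buffer size controls.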
A fancier method would be for the sender to tell the receiver something
like "save a copy of the next N bytes and call it foo", later say "write
a copy of foo at the current offset" and finally "forget foo". Simpler,
but perhaps nearly as effective, would be to have a variable buffer
size, controlled by the sender. I'm not sure either can be done with a
single-pass sender, though.
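The sender-directed variant can be sketched from the receiver's side only. The command names save, put and forget mirror the wording above, but they, the extra copy and literal commands, and the tuple encoding are all hypothetical; the commands are written by hand here, because how a (possibly single-pass) sender would generate them is exactly the open question. The example swaps two blocks, where the plain in-place scheme would have to send the second block as literals.

```python
def apply_commands(basis: bytes, commands: list) -> bytes:
    """Receiver for a hypothetical save/put/forget command stream."""
    buf = bytearray(basis)
    p = 0       # current output offset
    named = {}  # saved copies, keyed by name
    for cmd, *args in commands:
        if cmd == "save":        # save a copy of the next n bytes as `name`
            name, n = args
            named[name] = bytes(buf[p:p + n])
        elif cmd == "put":       # write a copy of `name` at the current offset
            data = named[args[0]]
            buf[p:p + len(data)] = data
            p += len(data)
        elif cmd == "forget":    # drop `name`, freeing its memory
            del named[args[0]]
        elif cmd == "copy":      # copy basis bytes that are not yet overwritten
            off, n = args
            if off < p:
                raise ValueError("copy source already overwritten")
            buf[p:p + n] = buf[off:off + n]
            p += n
        elif cmd == "literal":
            data = args[0]
            buf[p:p + len(data)] = data
            p += len(data)
        else:
            raise ValueError(f"unknown command {cmd!r}")
    del buf[p:]  # stands in for ftruncate()
    return bytes(buf)


swap = [("save", "foo", 4), ("copy", 4, 4), ("put", "foo"), ("forget", "foo")]
print(apply_commands(b"AAAABBBB", swap))  # b'BBBBAAAA'
```

Only the named copy ("foo") costs memory, and only until it is forgotten, which is what would let the sender keep the receiver's buffering close to what the data actually needs.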
Eran
More information about the rsync mailing list