sn at ParlaNet.de
Mon Aug 5 06:59:02 EST 2002
On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:
> I'd like to propose a new option to rsync, which causes it to read
> device files as if they were regular files. This includes pipes,
> character devices and block devices (I'm not sure about sockets). The
> main motivation is cases where you need to synchronize a large amount of
> data that is not available as regular files, as in the following scenarios:
> * Keep a copy of a block device
> * Make backups using a huge tar file instead of actual files, to prevent
> need to be root in order to preserve attributes:
> * Keep an SQL-formatted dump of database tables (i.e., output of
> mysqldump) for backup purposes
> In all three cases, using a temporary file may require unreasonable
> amounts of additional diskspace, and there is no good reason to mandate it.
> Attached is a largish patch against rsync-2.5.5 that adds this feature.
> The primary issues are as follows:
> 1. Addition of a "--read-devices" option for activating the device
> reading behavior, and acting accordingly in a few locations.
> 2. The code and usage of map_ptr() required the file size to be known in
> advance. This is impossible for devices and FIFOs.
The same holds for decompressed data. I have started to modify rsync and to
rewrite the sender so that it does only sequential reads.
But at the moment I am stuck, and the weather was fine..... :-)
> To address this, I
> changed the interface of map_ptr. It now returns void rather than char*,
> and updates two new members of map_struct:
> - m_ptr contains the pointer to the new data
> - m_len contains the length of the new data
> (at most the length requested, but less if EOF encountered)
> Also, the map_struct->file_size field is now updated by map_ptr() when
> it encounters an EOF.
> I updated all invocations of map_ptr() to use the new interface, and act
> correctly when a premature EOF is encountered. Apart from supporting the
> new feature, this makes the code more robust to EOFs (no more zeroing
> buffer memory and hoping for the best!) and, arguably, more elegant. The
> change affected many locations, but is pretty straightforward.
> I tried to do the minimal changes rather than eliminate all traces of
> the predetermined length assumption, since that would greatly inflate
> the patch. Also, this shouldn't prevent map_ptr() from being a wrapper
> to mmap() in the future, though the map_ptr() code will need to contain
> a fallback to read().
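To make the quoted interface change concrete, here is a minimal sketch of a read()-based map_ptr() that reports short reads instead of assuming a known size. It is not the code from the patch: the names map_sketch and map_ptr_sketch, the pos/buf/buf_cap fields, and the restriction to strictly sequential offsets are assumptions made for illustration. Only m_ptr, m_len and file_size come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for rsync's map_struct. */
struct map_sketch {
    int    fd;         /* descriptor to read; may be a pipe or a device */
    off_t  file_size;  /* -1 until EOF has actually been seen */
    off_t  pos;        /* stream offset of the next unread byte */
    char  *buf;        /* backing buffer, grown on demand */
    size_t buf_cap;    /* current capacity of buf */
    char  *m_ptr;      /* out: start of the data from the last call */
    size_t m_len;      /* out: bytes available, at most the length requested */
};

/* Returns void; the caller inspects m_ptr and m_len afterwards.
 * Assumption: only sequential access (offset == pos) is supported here. */
void map_ptr_sketch(struct map_sketch *map, off_t offset, size_t len)
{
    size_t got = 0;

    assert(offset == map->pos);

    if (len > map->buf_cap) {
        char *nbuf = realloc(map->buf, len);
        if (nbuf == NULL) {           /* out of memory: report no data */
            map->m_ptr = NULL;
            map->m_len = 0;
            return;
        }
        map->buf = nbuf;
        map->buf_cap = len;
    }

    while (got < len) {
        ssize_t n = read(map->fd, map->buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted, just retry */
            break;                    /* real error: hand back what we have */
        }
        if (n == 0) {                 /* EOF: the size is only known now */
            map->file_size = map->pos + (off_t)got;
            break;
        }
        got += (size_t)n;
    }

    map->pos  += (off_t)got;
    map->m_ptr = map->buf;
    map->m_len = got;
}
```

A caller would compare m_len with the length it asked for and treat a shorter result as end of data, rather than working on a zero-padded buffer.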
My intention was to run the rsync checksum algorithm on a decoded (gzip)
data stream. For this I had to get rid of all possible calls of seek().
I will have a look at your code before continuing with my attempt.
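As an illustration of what "no seek()" means for the checksum pass, here is a rough sketch that slides a weak rolling checksum over a stream using nothing but read(). It is not rsync's sender code: weak_sum() and find_block() are hypothetical names, and the checksum is a simplified one in the style of rsync's weak checksum (a byte sum in the low 16 bits, a sum of running sums in the high 16 bits). The decoded gzip stream would simply be the descriptor passed in.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Weak checksum of one whole block, computed from scratch. */
uint32_t weak_sum(const unsigned char *p, size_t n)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; i++) {
        s1 += p[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | (s2 << 16);
}

/* Scan fd front to back and return the stream offset of the first
 * window of `block` bytes whose weak checksum equals `target`.
 * Returns -1 if no window matches, -2 on read or allocation failure. */
long find_block(int fd, size_t block, uint32_t target)
{
    unsigned char in[4096];
    unsigned char *win;            /* ring buffer holding the current window */
    uint32_t s1 = 0, s2 = 0;
    size_t filled = 0, head = 0;   /* head: index of the oldest byte */
    long offset = 0;               /* stream offset of the window start */
    long result = -1;
    int done = 0;

    assert(block > 0);
    win = malloc(block);
    if (win == NULL)
        return -2;

    while (!done) {
        ssize_t n = read(fd, in, sizeof in);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            result = -2;
            break;
        }
        if (n == 0)
            break;                 /* EOF: every window has been checked */

        for (ssize_t i = 0; i < n && !done; i++) {
            unsigned char c = in[i];
            if (filled < block) {  /* still filling the first window */
                win[filled++] = c;
                s1 += c;
                s2 += s1;
                if (filled < block)
                    continue;
            } else {               /* roll: drop the oldest byte, add c */
                unsigned char old = win[head];
                win[head] = c;
                head = (head + 1) % block;
                s1 = s1 - old + c;
                s2 = s2 - (uint32_t)block * old + s1;
                offset++;
            }
            if (((s1 & 0xffff) | (s2 << 16)) == target) {
                result = offset;
                done = 1;
            }
        }
    }

    free(win);
    return result;
}
```

The window lives in a ring buffer of one block, so memory use does not depend on the stream length and the descriptor can be a pipe. A real sender would confirm every weak match with the strong checksum, which this sketch leaves out.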
> 4. FIFOs and character devices do not support the seek operation. It so
> happens that the combination of code in map_ptr() and the way it's used
> guarantees that the transmitter always reads the underlying file
> sequentially without seeks, for reasonable block sizes. This seems to be
> the case both from looking at the code and from testing it. Alas, this
> property is fragile and may break with changes in constants or code. I
> don't see how this can be fixed without pulling out all the map_ptr() code.
I think you are right, and for this reason I started to rewrite the sender.
If I hadn't written the generator patch, the modifications to the generator
to work with files of unknown size would have been trivial. I think solving
both problems will lead to a protocol change.
Stefan Nehlsen | ParlaNet Administration | sn at parlanet.de | +49 431 988-1260