[patch] read-devices

Stefan Nehlsen sn at ParlaNet.de
Mon Aug 5 06:59:02 EST 2002


On Mon, Aug 05, 2002 at 10:28:38AM +0300, Eran Tromer wrote:
> Greetings,
> 
> I'd like to propose a new option to rsync, which causes it to read 
> device files as if they were regular files. This includes pipes, 
> character devices and block devices (I'm not sure about sockets). The 
> main motivation is cases where you need to synchronize a large amount of 
> data that is not available as regular files, as in the following scenarios:
> 
> * Keep a copy of a block device
> http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=87
> 
> * Make backups using a huge tar file instead of actual files, to prevent 
> need to be root in order to preserve attributes:
> http://rsync.samba.org/cgi-bin/rsync.fom?auth=ckdf646d354e73&editCmds=compact&file=228
> 
> * Keep an SQL-formatted dump of database tables (i.e., output of 
> mysqldump) for backup purposes
> 
> In all three cases, using a temporary file may require unreasonable 
> amounts of additional diskspace, and there is no good reason to mandate it.
> 
> Attached is a largish patch against rsync-2.5.5 that adds this feature.
> The primary issues are as follows:
> 
> 1. Addition of a "--read-devices" option for activating the device 
> reading behavior, and acting accordingly in a few locations.
> 
> 2. The code and usage of map_ptr() required the file size to be known in 
> advance. This is impossible for devices and FIFOs.

The same holds for decompressed data. I have started modifying rsync and
rewriting the sender so that it does only sequential reads.

But at the moment I am stuck, and the weather was fine... :-)

> To address this, I 
> changed the interface of map_ptr. It now returns void rather than char*, 
> and updates two new members of map_struct:
> - m_ptr contains the pointer to the new data
> - m_len contains the length of the new data
>    (at most the length requested, but less if EOF encountered)
> Also, the map_struct->file_size field is now updated by map_ptr() when 
> it encounters an EOF.
> I updated all invocations of map_ptr() to use the new interface, and act 
> correctly when a premature EOF is encountered. Apart from supporting the 
> new feature, this makes the code more robust to EOFs (no more zeroing 
> buffer memory and hoping for the best!) and, arguably, more elegant. The 
> change affected many locations, but is pretty straightforward.
> I tried to do the minimal changes rather than eliminate all traces of 
> the predetermined length assumption, since that would greatly inflate 
> the patch. Also, this shouldn't prevent map_ptr() from being a wrapper 
> to mmap() in the future, though the map_ptr() code will need to contain 
> a fallback to read().
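
The interface described above can be sketched roughly as follows. This is
not the code from the patch: only m_ptr, m_len and file_size are named in
the message, while the other fields, the read()-only implementation and the
count_bytes() helper are assumptions made for illustration. The sketch also
assumes the caller asks for offsets in increasing order.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Simplified, hypothetical map_struct. Only m_ptr, m_len and file_size
 * come from the message; fd, buf and buf_size are assumed. */
struct map_struct {
    int fd;           /* descriptor being read (assumed) */
    char *buf;        /* internal buffer (assumed) */
    size_t buf_size;  /* allocated size of buf (assumed) */
    off_t file_size;  /* -1 until EOF has been seen */
    char *m_ptr;      /* data produced by the last map_ptr() call */
    size_t m_len;     /* bytes valid at m_ptr; may be less than requested */
};

/* Returns void and reports its result through m_ptr and m_len.
 * 'offset' is only used to record file_size when EOF is reached,
 * because this sketch never seeks. */
static void map_ptr(struct map_struct *map, off_t offset, size_t len)
{
    size_t got = 0;

    assert(map != NULL);
    if (len > map->buf_size) {
        char *nb = realloc(map->buf, len);
        if (nb == NULL)
            abort();
        map->buf = nb;
        map->buf_size = len;
    }
    while (got < len) {
        ssize_t n = read(map->fd, map->buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            break;  /* sketch only: an error is reported as a short read */
        }
        if (n == 0) {  /* EOF: the size is now known */
            map->file_size = offset + (off_t)got;
            break;
        }
        got += (size_t)n;
    }
    map->m_ptr = map->buf;
    map->m_len = got;
}

/* Hypothetical caller: walks the input block by block and stops on the
 * first short block instead of assuming a size known in advance. */
static off_t count_bytes(int fd, size_t block)
{
    struct map_struct map = { fd, NULL, 0, -1, NULL, 0 };
    off_t offset = 0;

    assert(block > 0);
    do {
        map_ptr(&map, offset, block);
        offset += (off_t)map.m_len;
    } while (map.m_len == block);
    free(map.buf);
    return offset;
}
```

A caller compares m_len with the length it asked for; a shorter result
means EOF was reached (or, in this sketch, that read() failed), and
file_size is then set if it was EOF.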

My intention was to run the rsync checksum algorithm on a decoded (gzip)
data stream. For this I had to get rid of all possible calls to seek()
in map_ptr.

I will have a look at your code before continuing with my attempt.

> 4. FIFOs and character devices do not support the seek operation. It so 
> happens that the combination of code in map_ptr() and the way it's used 
> guarantees that the transmitter always reads the underlying file 
> sequentially without seeks, for reasonable block sizes. This seems to be 
> the case both from looking at the code and from testing it. Alas, this 
> property is fragile and may break with changes in constants or code. I 
> don't see how this can be fixed without pulling out all the map_ptr() code.

I think you are right, and for this reason I started to rewrite the sender.
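
One way to make the "sequential reads only" property explicit rather than
accidental is a forward-only window that refuses any request that would
need a backward seek. The sketch below is an illustration under that
assumption, not code from either patch; struct seq_window and
seq_window_get() are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical forward-only window over a non-seekable descriptor. */
struct seq_window {
    int fd;
    char *buf;
    size_t cap;   /* allocated bytes in buf */
    off_t start;  /* stream offset of buf[0] */
    size_t len;   /* valid bytes in buf */
    int eof;      /* set once read() has returned 0 */
};

/* read() wrapper that retries on EINTR. */
static ssize_t read_some(int fd, char *dst, size_t n)
{
    ssize_t r;
    do {
        r = read(fd, dst, n);
    } while (r < 0 && errno == EINTR);
    return r;
}

/* Makes up to 'want' bytes starting at 'offset' available at *out.
 * Returns the number of bytes available (less than 'want' at EOF),
 * or -1 on error or if the request lies before the window start. */
static ssize_t seq_window_get(struct seq_window *w, off_t offset,
                              size_t want, char **out)
{
    size_t need = want ? want : 1;

    if (offset < w->start) {  /* would require seeking backwards */
        errno = ESPIPE;
        return -1;
    }
    if (need > w->cap) {
        char *nb = realloc(w->buf, need);
        if (nb == NULL)
            return -1;
        w->buf = nb;
        w->cap = need;
    }

    /* Drop buffered bytes that lie before 'offset'. */
    off_t gap = offset - w->start;
    size_t drop = gap < (off_t)w->len ? (size_t)gap : w->len;
    memmove(w->buf, w->buf + drop, w->len - drop);
    w->start += (off_t)drop;
    w->len -= drop;

    /* Skip forward by reading and discarding; no lseek() is used. */
    while (w->start < offset && !w->eof) {
        gap = offset - w->start;
        size_t chunk = gap < (off_t)w->cap ? (size_t)gap : w->cap;
        ssize_t n = read_some(w->fd, w->buf, chunk);
        if (n < 0)
            return -1;
        if (n == 0)
            w->eof = 1;
        else
            w->start += n;
    }

    /* Fill the window up to 'want' bytes. */
    while (w->len < want && !w->eof) {
        ssize_t n = read_some(w->fd, w->buf + w->len, want - w->len);
        if (n < 0)
            return -1;
        if (n == 0)
            w->eof = 1;
        else
            w->len += (size_t)n;
    }

    *out = w->buf;
    return (ssize_t)(w->len < want ? w->len : want);
}
```

With such a window, a change in block sizes or constants that breaks the
sequential access pattern shows up as an explicit ESPIPE failure instead
of a failed seek on a FIFO or character device.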

If I hadn't written the generator patch, the modifications to the generator
needed to work with files of unknown size would have been trivial. I think
solving both problems will lead to a protocol change.


cu, Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | sn at parlanet.de | +49 431 988-1260



