[patch] read-devices

Fri Aug 30 02:49:01 EST 2002

Hi,

Thanks for your explanation. Before answering, I'll note that (as 
mentioned in a followup) my patch has some unintended debugging cruft 
and I'll gladly provide a clean patch if anyone's interested.

Well, here's my personal motivation. I have a remotely located server 
with a 6GB disk, and also a newer machine with an 80GB disk. I wish to 
keep a full backup of the server filesystem on the other machine. A fairly 
common setting, I imagine. Originally I rsynced the server's root 
directory into a directory on the backup machine. This has two main 
disadvantages:

1. To get permissions and special files right, I had to run the receiver 
side as root. From a security viewpoint, this is already nasty in the 
extreme, but it's much worse than that, because now the backup machine 
contained all sorts of spooky device files and suid root binaries and 
whatnot, created by rsync. If the server was compromised, in all 
likelihood the backup machine would go down next. Trusting permissions 
on the parent directory of the rsync destination directory is not quite 
enough for such stuff -- these things *have* been bypassed before.

2. The target machine has a different filesystem. In my case it was only 
a larger block size, which caused the backup to consume *much* more 
diskspace than needed. In other cases, it could mean losing ACLs or 
resource forks or whatever.

Now the beautiful thing is that tar handles all those things. All I 
really want to do is to have an up-to-date "server.tar" file on the 
backup machine, containing the output of running "tar cf - /" on the 
server. But to do this with unpatched rsync, I needed to first create 
the tar file on the server, which would *double* its diskspace requirements.

Nowadays, I pipe the output of "tar cf - /" into a pipe that's given as 
a filename to a patched sender rsync; the patched receiver rsync then 
updates the corresponding huge file on the client.  I have a rotten ADSL 
connection with terrible upstream speed so I increased the block size to 
10K, but the daily updates of these 6GB still take about 10 minutes (the 
bottleneck is the server disk I/O time). Pretty fast, for something 
completely stateless that's perfectly immune to wrong timestamps and 
suchlike.
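
For readers who want to see the plumbing, here is a minimal sketch of the producer side only. It is not the patch: the reader loop merely stands in for a sender that accepts a pipe as its input file, and the function name and the 10K default are illustrative assumptions. It shows a tar stream passing through a named pipe in fixed-size blocks, so the archive never exists as a file on the source disk.

```python
import os
import tarfile
import threading

def stream_tar_through_fifo(src_dir, fifo_path, block_size=10 * 1024):
    """Write a tar stream of src_dir into a FIFO and consume it in blocks.

    Returns the number of bytes that passed through the pipe.
    The reader loop is a stand-in for whatever program reads the pipe.
    """
    os.mkfifo(fifo_path)

    def producer():
        # "w|" is tarfile's non-seekable stream mode, comparable to `tar cf -`.
        with open(fifo_path, "wb") as sink:
            with tarfile.open(fileobj=sink, mode="w|") as tar:
                tar.add(src_dir, arcname=".")

    writer = threading.Thread(target=producer)
    writer.start()

    total = 0
    with open(fifo_path, "rb") as source:
        while True:
            block = source.read(block_size)
            if not block:  # EOF: the producer closed its end
                break
            total += len(block)
    writer.join()
    return total
```

Only one block at a time is held in memory, which is the property that avoids doubling the disk space on the server.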

The SQL thing I mentioned is related, but here there isn't even a "just 
rsync the files directly" alternative. Again, say you have a server 
running some database server, and you want to take snapshots for backup 
purposes, or whatever. Rsyncing the actual database files, if at all 
possible for your RDBMS, will give you a corrupted database. However, if 
you have a utility to dump the database into a flat file (e.g., 
mysqldump of MySQL) then you can use that to get a consistent snapshot, 
and rsync that snapshot. As before, to prevent the need for a huge 
temporary file it's really nice to be able to pipe the output of the 
dump utility directly into rsync.
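
A rough sketch of the same idea for a dump utility follows. It assumes nothing about rsync itself: `stream_command` is a hypothetical helper that reads a child process's stdout in blocks, and the consumer of those blocks is left to the caller. The database name in the usage comment is made up.

```python
import subprocess
import sys

def stream_command(argv, block_size=10 * 1024):
    """Yield the stdout of `argv` in blocks, without a temporary file."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE) as proc:
        while True:
            block = proc.stdout.read(block_size)
            if not block:  # EOF: the dump finished
                break
            yield block
    # Leaving the `with` block waits for the child, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)

# Hypothetical usage ("mydb" is an illustrative database name):
#   for block in stream_command(["mysqldump", "mydb"]):
#       consume(block)
```

The dump is consistent because the dump utility produces it, and it is never stored on the database server because each block is handed on as soon as it is read.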

Someone else had a different scenario, involving backing up a complete 
disk partition at the block level. This also sounds quite useful, and 
again quite impossible with unpatched rsync without a lot of extra 
diskspace outside that partition.

As you can see, all of these backup-related scenarios make sense even 
without the ability to update special files (which is obviously 
problematic, though as I suggested a while ago, perhaps many common 
cases *can* be efficiently handled pretty easily, especially when data is 
frequently updated but seldom shifted --- think /dev/hda).
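
To make the "updated but seldom shifted" case concrete, here is a small sketch of one way it could be handled. This is an illustration of the idea only, not rsync's algorithm and not part of the patch: blocks are compared at identical offsets and only the differing ones are rewritten in place. The function name and block size are assumptions, and the target is assumed to be at least as large as the incoming data.

```python
def update_blocks_in_place(source, target_path, block_size=4096):
    """Overwrite only the blocks of target_path that differ from `source`.

    `source` is any binary stream (it may be a pipe). Blocks are compared
    at the same offset on both sides, so shifted data is NOT detected.
    Returns the number of blocks rewritten.
    """
    rewritten = 0
    offset = 0
    with open(target_path, "r+b") as target:
        while True:
            new_block = source.read(block_size)
            if not new_block:  # end of the incoming stream
                break
            target.seek(offset)
            old_block = target.read(len(new_block))
            if old_block != new_block:
                target.seek(offset)
                target.write(new_block)
                rewritten += 1
            offset += len(new_block)
    return rewritten
```

No temporary copy is needed, which is why this restricted case could work for targets that cannot be replaced by renaming a new file over them.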

Last but not least: this is the Unix way. It provides much power and 
flexibility to the user, hence it is a Good Thing. I find my local 
patched rsync very useful, and would experience eternal merriment if 
everybody got that feature.

"device files" come into the picture simply because a pipe into stdin 
(or any other FD) is seen as a special file (on Linux, at least), and the 
"--devices" option to rsync affects all special files, so 
"--read-devices" appears appropriate. "--read-special" would be more 
accurate, but less consistent.
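
The point about pipes being special files can be checked with a few lines. This is only a sketch of the classification by stat mode; `describe_fd` and its labels are hypothetical and are not taken from rsync's source.

```python
import os
import stat

def describe_fd(fd):
    """Classify an open file descriptor by its stat mode."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe/FIFO"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device"
    return "other"

# A pipe is not a regular file, so code that only handles
# regular files will refuse to read from it.
read_end, write_end = os.pipe()
print(describe_fd(read_end))
os.close(read_end)
os.close(write_end)
```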

BTW, I noticed that statistics reporting isn't working very well when 
special files are read using my --read-devices patch, despite my efforts 
to get some reasonable behavior (the problem is that the file size isn't 
known until the whole file has been read and sent). Probably fixable, and not 
a major issue when someone needs the feature, methinks.

   Regards,
     Eran

Dave Dykstra wrote:
> Similar patches have been submitted before but they've always been
> rejected.  I'm sorry that you spent so much time on this one, although
> perhaps it's been useful to you so far.
> 
> As you know, rsync won't be able to write to such devices because it needs
> to work on a temporary copy; that limits the usefulness of reading from
> them greatly.  I'm not at all convinced that it's worths supporting it in
> rsync.
> 
> I can't read your examples in the rsync.fom right now because it's broken,
> so I don't understand your motivating examples.  How can you rsync into a
> tar file, and what do SQL databases have to do with device files?
> 
> - Dave Dykstra