[patch] read-devices
Eran Tromer
eran at tromer.org
Fri Aug 30 02:49:01 EST 2002
Hi,
Thanks for your explanation. Before answering, I'll note that (as
mentioned in a followup) my patch has some unintended debugging cruft
and I'll gladly provide a clean patch if anyone's interested.
Well, here's my personal motivation. I have a remotely located server
with a 6GB disk, and also a newer machine with an 80GB disk. I wish to keep a
full backup of the server filesystem on the other machine. A fairly
common setting, I imagine. Originally I rsynced the server's root
directory into a directory on the backup machine. This has two main
disadvantages:
1. To get permissions and special files right, I had to run the receiver
side as root. From a security viewpoint, this is already nasty in the
extreme, but it's much worse than that, because now the backup machine
contained all sorts of spooky device files and suid root binaries and
whatnot, created by rsync. If the server was compromised, in all
likelihood the backup machine would go down next. Trusting permissions
on the parent directory of the rsync destination directory is not quite
enough for such stuff; these things *have* been bypassed before.
2. The target machine has a different filesystem. In my case it was only
a larger block size, which caused the backup to consume *much* more
disk space than needed. In other cases, it could mean losing ACLs or
resource forks or whatever.
Now the beautiful thing is that tar handles all those things. All I
really want to do is to have an up-to-date "server.tar" file on the
backup machine, containing the output of running "tar cf - /" on the
server. But to do this with unpatched rsync, I needed to first create
the tar file on the server, which would *double* its disk space requirements.
Nowadays, I pipe the output of "tar cf - /" into a pipe that's given as
a filename to a patched sender rsync; the patched receiver rsync then
updates the corresponding huge file on the client. I have a rotten ADSL
connection with terrible upstream speed, so I increased the block size to
10K, but the daily updates of these 6GB still take about 10 minutes (the
bottleneck is the server's disk I/O time). Pretty fast, for something
completely stateless that's perfectly immune to wrong timestamps and
suchlike.
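A minimal Python sketch of the "pipe given as a filename" setup, assuming a POSIX system with `os.mkfifo`. The producer chunks stand in for `tar cf - /` and the consumer callable stands in for the patched sender rsync; `stream_via_fifo`, `read_all` and the FIFO name are hypothetical. The point is that the consumer only ever receives a path, and no intermediate file is written to disk.

```python
import os
import stat
import tempfile
import threading


def stream_via_fifo(producer_chunks, consumer, name="server.tar"):
    """Pipe producer output through a named FIFO and hand only its path to consumer.

    The consumer must open the path for reading; otherwise the writer thread
    blocks forever in open(), since opening a FIFO for writing waits for a reader.
    """
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, name)  # hypothetical FIFO name
    os.mkfifo(path)

    def writer():
        # Stand-in for `tar cf - /` writing its archive into the FIFO.
        with open(path, "wb") as out:
            for chunk in producer_chunks:
                out.write(chunk)

    t = threading.Thread(target=writer)
    t.start()
    try:
        return consumer(path)
    finally:
        t.join()
        os.unlink(path)
        os.rmdir(workdir)


def read_all(path):
    """Stand-in consumer: the path looks like a file name but is a FIFO."""
    is_fifo = stat.S_ISFIFO(os.stat(path).st_mode)
    with open(path, "rb") as f:
        return is_fifo, f.read()
```

In the real setup the FIFO would typically be created with the `mkfifo` command, and the reader would be the patched sender rsync rather than `read_all`. The data flows through the kernel's pipe buffer, so disk usage on the server does not grow with the archive size.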
The SQL thing I mentioned is related, but here there isn't even a "just
rsync the files directly" alternative. Again, say you have a server
running some database server, and you want to take snapshots for backup
purposes, or whatever. Rsyncing the actual database files, where that is
possible at all for your RDBMS, will give you a corrupted database. However, if
you have a utility to dump the database into a flat file (e.g.,
mysqldump of MySQL) then you can use that to get a consistent snapshot,
and rsync that snapshot. As before, to avoid the need for a huge
temporary file it's really nice to be able to pipe the output of the
dump utility directly into rsync.
Someone else had a different scenario, involving backing up a complete
disk partition at the block level. This also sounds quite useful, and
again quite impossible with unpatched rsync without a lot of extra
disk space outside that partition.
As you can see, all of these backup-related scenarios make sense even
without the ability to update special files (which is obviously
problematic, though as I suggested a while ago, perhaps many common
cases *can* be handled efficiently and fairly easily, especially when data is
frequently updated but seldom shifted; think /dev/hda).
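One way to read the "seldom shifted" case, sketched in Python under stated assumptions: if data is only overwritten in place and never inserted or deleted, block i of the old contents lines up with block i of the new contents, so per-block digests at fixed offsets are enough to find what changed, and the changed blocks can be written straight into the target without a temporary copy. This is not rsync's rolling-checksum algorithm; the function names are hypothetical, MD5 is used purely for illustration, and `io.BytesIO` would stand in for a device such as /dev/hda.

```python
import hashlib


def block_digests(data, block_size):
    """Receiver side: one digest per fixed-offset block of the current contents."""
    return [hashlib.md5(data[off:off + block_size]).digest()
            for off in range(0, len(data), block_size)]


def changed_blocks(new_data, old_digests, block_size):
    """Sender side: indices of fixed-offset blocks whose digest differs.

    Blocks beyond the end of the old contents count as changed.
    """
    changed = []
    for i, off in enumerate(range(0, len(new_data), block_size)):
        digest = hashlib.md5(new_data[off:off + block_size]).digest()
        if i >= len(old_digests) or old_digests[i] != digest:
            changed.append(i)
    return changed


def apply_in_place(target, new_data, indices, block_size):
    """Receiver side: overwrite only the changed blocks of a seekable target."""
    for i in indices:
        off = i * block_size
        target.seek(off)
        target.write(new_data[off:off + block_size])
```

The shortcut's weakness is exactly the "shifted" case: a single inserted byte moves everything after it, so every later block mismatches and the whole tail is resent. Finding matches at arbitrary offsets is what rsync's rolling checksum is for.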
Last but not least: this is the Unix way. It provides much power and
flexibility to the user, hence it is a Good Thing. I find my local
patched rsync very useful, and would experience eternal merriment if
everybody got that feature.
"Device files" come into the picture simply because a pipe into stdin
(or any other FD) is seen as a special file (on Linux, at least), and the
"--devices" option to rsync affects all special files, so
"--read-devices" appears appropriate. "--read-special" would be more
accurate, but less consistent.
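The observation that a pipe on a file descriptor shows up as a special file can be checked with `fstat`. A minimal Python sketch follows (the helper name `describe_fd` is hypothetical). On Linux, `/dev/stdin` and `/proc/self/fd/N` give such descriptors a path, which is one way a pipe can be handed to a program as a filename.

```python
import os
import stat


def describe_fd(fd):
    """Classify an open descriptor by the file-type bits of its st_mode."""
    mode = os.fstat(fd).st_mode
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "pipe/FIFO"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "other"
```

With stdin redirected from a file (`< file`), `describe_fd(0)` reports a regular file; with stdin fed by a shell pipeline (`cat file |`), it reports a pipe/FIFO, i.e. one of the non-regular types that rsync's "--devices" handling is concerned with.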
BTW, I noticed that statistics reporting doesn't work very well when
special files are read using my --read-devices patch, despite my efforts
to get some reasonable behavior (the problem is that the file size isn't
known until the whole file has been read and sent). Probably fixable, and not
a major issue when someone needs the feature, methinks.
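A sketch of why the statistics suffer, assuming a simple chunked sender loop (all names are hypothetical; this is not rsync's actual code). A regular file's size comes from `fstat` before any data moves, but a pipe has no meaningful `st_size`, so a progress display can only show a running byte count, and the total exists only once EOF is reached.

```python
import os
import stat

CHUNK_SIZE = 64 * 1024


def expected_size(fd):
    """Known up front only for regular files; None for pipes and devices."""
    st = os.fstat(fd)
    return st.st_size if stat.S_ISREG(st.st_mode) else None


def format_progress(sent, total):
    if total is None:
        return f"{sent} bytes (total unknown)"
    percent = 100 * sent // total if total else 100
    return f"{sent}/{total} bytes ({percent}%)"


def send_stream(fd, sink, report):
    """Relay fd to sink in chunks, reporting progress after each chunk.

    For a pipe the total only becomes known at EOF, so it is filled in last.
    """
    total = expected_size(fd)
    sent = 0
    while True:
        chunk = os.read(fd, CHUNK_SIZE)
        if not chunk:
            break
        sink(chunk)
        sent += len(chunk)
        report(format_progress(sent, total))
    return {"bytes_sent": sent,
            "size_known_upfront": total is not None,
            "total_size": sent}
```

Any statistic that needs the size before the transfer starts (a percentage, an estimated time remaining, a pre-sized receive buffer) has to be deferred or omitted for streams, which matches the reporting glitch described above.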
Regards,
Eran
Dave Dykstra wrote:
> Similar patches have been submitted before but they've always been
> rejected. I'm sorry that you spent so much time on this one, although
> perhaps it's been useful to you so far.
>
> As you know, rsync won't be able to write to such devices because it needs
> to work on a temporary copy; that limits the usefulness of reading from
> them greatly. I'm not at all convinced that it's worth supporting it in
> rsync.
>
> I can't read your examples in the rsync.fom right now because it's broken,
> so I don't understand your motivating examples. How can you rsync into a
> tar file, and what do SQL databases have to do with device files?
>
> - Dave Dykstra