[clug] using rsync for backups

Michael James michael at james.st
Fri May 23 02:21:17 GMT 2008

From my experience backing up 3/4 of a terabyte using rsync,
 I've had some thoughts for possible improvements.

Using prepared filelists:

A normal rsync job sets up a sender and a receiver process
 which each generate filelists. Then the lists are compared,
 a task list is created, and the 2 processes get to work
 transferring new and changed files.

When the receiver is a backup, no changes happen between rsyncs.
So the filelist is (theoretically at least) the same as it was last time.

Can rsync (or could it be modified to) use the pre-existing filelist?
So the sender process could load the filelist from last time,
 prepare the task list and save the receiver all that preparation.
For my 3/4 of a terabyte system that's ~3 hours of work.
Occasionally the receiver could re-scan the backup,
 just to make sure the filelist reflects reality.
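
To make the filelist idea concrete, here is a rough sketch in Python.
It is not how rsync works internally; it only illustrates comparing a
fresh scan of the source against a list saved after the previous run,
so the backup side is never walked. The function names and the JSON
manifest format are my own assumptions for illustration.

```python
import json
import os


def scan_tree(root):
    """Walk root and return {relative_path: (size, mtime)} for every non-directory entry."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: do not follow symlinks
            manifest[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return manifest


def save_manifest(manifest, path):
    """Store the list so the next run can skip scanning the backup (hypothetical format)."""
    with open(path, "w") as fh:
        json.dump(manifest, fh)


def load_manifest(path):
    """Load a saved list; JSON turns tuples into lists, so convert them back."""
    with open(path) as fh:
        return {name: tuple(meta) for name, meta in json.load(fh).items()}


def plan_transfer(source_manifest, saved_backup_manifest):
    """Return (paths to send, paths to delete) using only the saved backup list."""
    to_send = sorted(
        path for path, meta in source_manifest.items()
        if saved_backup_manifest.get(path) != meta
    )
    to_delete = sorted(
        path for path in saved_backup_manifest if path not in source_manifest
    )
    return to_send, to_delete
```

The to_send list could be written out one path per line and handed to
rsync with --files-from, which takes paths relative to the source
directory. The occasional full re-scan of the backup would simply
rebuild the saved manifest from the backup side.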

Coping with dormant branches by tarring, zipping and archiving:

A lot of the space on my disks is taken up by files
 that mustn't be lost, but probably won't be needed again.
As long as they can be easily recovered,
 making people wait a day to get them back wouldn't be a problem.
So they are candidates for shipping to a tape-based data silo.

So how about having a cron job that "find"s unread branches,
 tars and zips them up, puts the .tgz file into a separate archive area,
 and replaces the head directory with a special link.
The archive area is rsync-ed to the silo without --delete
 so once the .tgz file has been transferred
 it can be deleted from the source.

I'd prefer to use tar as this would cut our use of inodes in the silo,
 which has also been an issue.
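
A rough sketch of what the cron job might do, in Python rather than
find and tar, just to pin the idea down. The function names, the
idle threshold and the archive layout are assumptions of mine.
Removing the original directory and putting the "special link" in
its place is left out, since I haven't decided what that link
should be.

```python
import os
import tarfile
import time


def is_dormant(branch, max_idle_days, now=None):
    """True if no file under branch has an access time within max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(branch):
        for name in filenames:
            # lstat reads metadata only, so the check itself does not touch atime
            if os.lstat(os.path.join(dirpath, name)).st_atime > cutoff:
                return False
    return True


def archive_branch(branch, archive_dir):
    """Pack branch into archive_dir/<name>.tgz and return the archive path."""
    branch = os.path.abspath(branch)
    name = os.path.basename(branch)
    tgz_path = os.path.join(archive_dir, name + ".tgz")
    with tarfile.open(tgz_path, "w:gz") as tar:
        tar.add(branch, arcname=name)  # store paths relative to the branch head
    return tgz_path
```

This relies on access times being recorded, so it won't work on
filesystems mounted with noatime. An empty branch counts as dormant
here.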

So much for the backing up, any fool can back up, but what about recovery?

The source archive area is mounted using some fuse magic
 so users can "ls -l" it and even see the rsync sizes, dates
 and checksums.  Once they ask for any contents,
 the fuse magic sets about recovering the tgz file
 and unpacking it back into its original location.
I don't know what happens to the requesting process
 during the wait.  It hangs?  It returns a warning?  An error?
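
Leaving the fuse part aside, the two operations it would need can be
sketched in Python: listing what is inside a .tgz without unpacking
it (for the "ls -l" view), and unpacking it back under its original
parent directory. The function names are made up for illustration,
and this says nothing about what the requesting process sees while
it waits.

```python
import os
import tarfile


def list_archive(tgz_path):
    """Return (name, size, mtime) for each regular file, without unpacking anything."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        return [(m.name, m.size, m.mtime) for m in tar.getmembers() if m.isfile()]


def restore_branch(tgz_path, original_parent):
    """Unpack the archive under original_parent and return the restored paths."""
    with tarfile.open(tgz_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(original_parent)
    return [os.path.join(original_parent, name) for name in names]
```

Listing a gzipped tar still means reading through the whole file,
so for large archives a small index kept alongside each .tgz would
probably be needed.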

Any comments?

There is no perl one line hack
 that a page of java won't do more elegantly.
