Pushing hard-linked backups

Matt McCutchen matt at mattmccutchen.net
Sun Dec 30 02:35:45 GMT 2007


On Tue, 2007-12-25 at 11:18 -0500, Eric S. Johansson wrote:
> so matt, lets go for the rsnapshot push to a benign host for now.

OK.  I recommend that you use an rsync daemon on the destination host
because that approach keeps all of the snapshot-management logic in one
place and allows you to reconfigure the daemon without touching the
client.  The daemon should be accessed over ssh for security; a
single-use daemon invoked over ssh is the most convenient way to do
this.

You'll need to make a directory on the destination host to hold the
whole setup; I'll call it /backup .  The client (the Windows laptop)
will push its data to a write-only daemon module (which I'll call
"push") that is mapped to a directory under /backup ,
say /backup/incoming .  The "post-xfer exec" command for "push" will
check whether the run was successful and, if so, invoke rsnapshot to
store the contents of /backup/incoming in a snapshot.  I recommend
something like /backup/snapshots for the snapshot root.  You can
retrieve backups by plain rsync over ssh, or if you want the setup
nicely encapsulated, you can configure a separate read-only module
mapped to /backup/snapshots for this purpose.
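
As a rough sketch of that layout, the directories could be created like
this.  The helper name make_backup_layout and the 0700 mode are my own
illustrative choices, not anything rsync or rsnapshot requires:

```shell
#!/bin/sh
# Sketch: create the directory layout described above.
# make_backup_layout is a hypothetical helper; $1 = backup root.
make_backup_layout() {
    root=$1
    # incoming:  target of the write-only "push" module
    # snapshots: rsnapshot's snapshot root
    mkdir -p "$root/incoming" "$root/snapshots" || return 1
    # Assumption: only the backup user needs access to the tree.
    chmod 700 "$root"
}

# On the destination host, as the user you will ssh in as:
#   make_backup_layout /backup
```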

You will need configuration files for the rsync daemon and rsnapshot
in /backup .  See the rsyncd.conf(5) man page for information about
writing the daemon configuration file.  Since the daemon will be invoked
as needed over ssh, you should not start it manually on the destination
host, and the port and authentication settings are irrelevant.  Consider
setting "max connections" to 1 so that two pushes cannot run at the same
time.  The most important bit is the
"post-xfer exec" command, which you should point to a script that I'll
call /backup/kick-rsnapshot .  The script should look like this, where
"interval" is the name of your lowest rsnapshot interval:

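#!/bin/bash
# Run by the rsync daemon as the "post-xfer exec" command for "push".
# The daemon sets RSYNC_EXIT_STATUS; 0 means the transfer succeeded.
if [ "$RSYNC_EXIT_STATUS" = "0" ]; then
        # Store the contents of /backup/incoming in a new snapshot.
        rsnapshot -c /backup/rsnapshot.conf interval
fi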
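
For illustration, here is one way the daemon configuration described
above might be written out.  Treat it as a sketch under assumptions: the
helper name write_rsyncd_conf is hypothetical, the "lock file" and "use
chroot" lines assume you ssh in as a non-root user, and the "snapshots"
module is only needed if you want the read-only retrieval module.

```shell
#!/bin/sh
# Sketch: write a daemon configuration matching the description above.
# write_rsyncd_conf is a hypothetical helper; $1 = backup root.
write_rsyncd_conf() {
    root=$1
    cat > "$root/rsyncd.conf" <<EOF
# Global settings.  The lock file backs "max connections"; the default
# location may not be writable by a non-root ssh user (assumption).
max connections = 1
lock file = $root/rsyncd.lock
# A daemon run by a non-root user cannot chroot.
use chroot = no

[push]
    path = $root/incoming
    read only = no
    write only = yes
    post-xfer exec = $root/kick-rsnapshot

# Optional module for retrieving backups.
[snapshots]
    path = $root/snapshots
    read only = yes
EOF
}

# On the destination host:
#   write_rsyncd_conf /backup
```

Remember that /backup/kick-rsnapshot has to be executable (chmod +x) for
the daemon to run it.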

See the rsnapshot(1) man page for information about writing
rsnapshot.conf.  It should list /backup/incoming as the one and only
backup point and /backup/snapshots as the snapshot root.  Be sure to
enable "link_dest"; that was the whole point!
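
A minimal rsnapshot.conf along those lines could be generated as
follows.  This is only a sketch: write_rsnapshot_conf is a hypothetical
helper, the rsync path, interval name, and retention count are
placeholders to adjust, and newer rsnapshot releases prefer the keyword
"retain" over "interval".

```shell
#!/bin/sh
# Sketch: write a minimal rsnapshot.conf.  rsnapshot requires TABs
# between fields, so printf '\t' is used instead of typing them.
# write_rsnapshot_conf is a hypothetical helper:
#   $1 = backup root, $2 = interval name, $3 = snapshots to keep
write_rsnapshot_conf() {
    root=$1
    name=$2
    keep=$3
    {
        printf 'config_version\t1.2\n'
        printf 'snapshot_root\t%s/snapshots/\n' "$root"
        # Assumption: adjust to wherever rsync lives on your host.
        printf 'cmd_rsync\t/usr/bin/rsync\n'
        printf 'interval\t%s\t%s\n' "$name" "$keep"
        printf 'link_dest\t1\n'
        # The one and only backup point; it is stored under
        # <snapshot_root>/<name>.0/incoming/ .
        printf 'backup\t%s/incoming/\tincoming/\n' "$root"
    } > "$root/rsnapshot.conf"
}

# On the destination host, e.g. with an interval named "daily":
#   write_rsnapshot_conf /backup daily 7
#   rsnapshot -c /backup/rsnapshot.conf configtest
```

Whatever interval name you pick here is the one to put in place of
"interval" in /backup/kick-rsnapshot .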

Getting rsync on the laptop to access the daemon takes some fancy
syntax.  You have to tell it explicitly to use ssh and run the remote
rsync process in /backup so that it will look for rsyncd.conf there:

rsync -e ssh --rsync-path='cd /backup && rsync' \
	<options> src/ host::push
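
If you want to run the push from a scheduled job, the command above can
be wrapped in a small function.  This is a sketch, not a tested recipe:
push_backup and the RSYNC override are hypothetical names, and -a and
--delete merely stand in for whatever <options> you settle on.

```shell
#!/bin/sh
# Sketch: wrap the push command so it can be run from a scheduled job.
# push_backup is a hypothetical helper: $1 = source dir, $2 = ssh host.
# RSYNC may be overridden, e.g. RSYNC=echo to preview the arguments.
push_backup() {
    src=${1%/}      # drop a trailing slash; one is added back below
    host=$2
    # -a and --delete are example options (assumption): archive mode,
    # and remove files from the module that were deleted at the source.
    ${RSYNC:-rsync} -a --delete -e ssh \
        --rsync-path='cd /backup && rsync' \
        "$src/" "${host}::push"
}

# On the laptop:
#   push_backup /path/to/src host
```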

That's the basic idea.  If you run into trouble with this setup, contact
me on- or off-list for additional help.

This simple setup has the disadvantage of wasting disk space by storing
an extra complete copy of the source in /backup/incoming.  Here are two
approaches to reduce the space overhead:

1. In the rsnapshot configuration, add --link-dest=/backup/incoming so
that /backup/snapshots/interval.0 ends up being completely hard-linked
with /backup/incoming .  Then the overhead is the same as that of an
extra snapshot in which no files changed.  However, now that the module
shares files with snapshots, the snapshots could become corrupted if the
shared files' attributes are tweaked via the module.  To avoid this, use
the --no-tweak-hlinked option implemented by my patch available at:

https://bugzilla.samba.org/show_bug.cgi?id=4561#c1

2. Use rsnapshot's sync_first mode, but in place of running "rsnapshot
sync", move /backup/incoming to /backup/snapshots/.sync .  This is fast
and completely eliminates the space overhead, but then the client has to
specify a --link-dest option so that files in /backup/incoming can be
hard-linked from /backup/snapshots/interval.0 .  The daemon won't allow
a basis dir path that looks like it goes outside the module, so you'll
have to use a symlink to /backup/snapshots/interval.0 inside the module.
This is a bit ugly.  Also, if a push fails and has to be retried, you
are at risk of corrupting snapshots as in #1.  To avoid this, use
--no-tweak-hlinked or, if you care less about the timeliness of the
snapshot, --ignore-existing as recommended by the rsync man page.

Matt


