issues with batch mode for incremental backups

Andrew Pimlott andrew at pimlott.net
Tue Apr 13 18:58:02 MDT 2010


Hi!

I use rsync batch mode for incremental backups.  That is, I create an
on-line backup with rsync, and use the --write-batch flag to
additionally generate my delta, which I send off-site.  To restore, I
download a full backup and apply the deltas with --read-batch.  This is
quite a lovely setup, in principle.
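The restore procedure described above is essentially a loop: apply each batch, oldest first, on top of the full backup, and stop at the first failure so that later deltas are never applied to an inconsistent tree. Below is a minimal C sketch of that loop. It assumes the batch list is already sorted and that a bare `rsync --read-batch=FILE -a DEST` is enough; in practice the options must match the ones used with --write-batch. `run_command`, `replay_batches`, and the `run_fn` hook are hypothetical names. The hook exists only so the loop can be exercised without rsync.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical runner type: executes argv, returns its exit status or -1. */
typedef int (*run_fn)(char *const argv[]);

/* Runner that forks, execs argv[0] via PATH, and waits for it. */
static int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Applies batches[0..n-1] to dest, oldest first. Returns n if every run
 * exited 0; otherwise returns the index of the first failing batch and
 * leaves its exit status (or -1) in *status_out. */
static size_t replay_batches(const char *const batches[], size_t n,
                             const char *dest, run_fn run, int *status_out)
{
    char batch_arg[4096];
    *status_out = 0;
    for (size_t i = 0; i < n; i++) {
        int len = snprintf(batch_arg, sizeof batch_arg,
                           "--read-batch=%s", batches[i]);
        if (len < 0 || (size_t)len >= sizeof batch_arg) {
            *status_out = -1; /* batch path too long for the buffer */
            return i;
        }
        /* Equivalent to: rsync --read-batch=BATCH -a DEST */
        char *argv[] = { "rsync", batch_arg, "-a", (char *)dest, NULL };
        *status_out = run(argv);
        if (*status_out != 0)
            return i;
    }
    return n;
}
```

Returning the index of the failing batch makes it easy to report which delta broke the restore, which is the information needed when a restore dies partway through.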

However, restore has given me problems.  Before I detail them, let me
ask:  Are a lot of people using batch mode for incremental backup?  Is
this a recommended use?  I've seen some activity on batch mode in the
changelogs, so I hope it is actively supported.  Can anyone affirm this?

My problems:

To start, I was using the rsync-3.0.3-2 package from Debian stable.
I started the restore process and made it through many batch files before
one finally failed with

    rsync: connection unexpectedly closed (536 bytes received so far) [generator]
    rsync error: error in rsync protocol data stream (code 12) at io.c(635) [generator=3.0.3]

I looked in my backup logs, and this correlated with a backup that
produced the message

    file has vanished: "/tmp/gitcvs.5Rgp5s"
    rsync warning: some files vanished before they could be transferred (code 24) at main.c(1058) [sender=3.0.3]

I upgraded to the rsync-3.0.7-2 package from Debian unstable that I
compiled myself.  This time, the restore got through some batch files
without incident but not as many as before, then one of them spit out a
huge number of "Skipping batched update" and "No batched update"
messages.  And indeed those files were missing.  I debugged into
gen_wants_ndx, which was returning 0 for those files, but I couldn't
understand what was causing the problem.  It was very strange, though:
the set of files skipped or missing varied based on whether or not I
redirected the output!  So maybe somehow file descriptors were getting
mixed up?

Then I upgraded to rsync-HEAD-20100331-2202GMT.  Now, when the restore
gets to the batch file that caused the failure for 3.0.3, it produces
the message

    (No batched update for "tmp/gitcvs.5Rgp5s")
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1619) [generator=3.1.0dev]
    zsh: exit 23    sudo /home/andrew/src/rsync-HEAD-20100331-2202GMT/rsync  -a --hard-links

That's not terrible.  Ideally, the "vanished" file would have been
omitted from the batch file entirely so I would see no message.  Failing
that, I would rest easier seeing the original exit status 24 ("Partial
transfer due to vanished source files") than 23 ("Partial transfer due
to error"), which sounds scary.
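For a backup wrapper, the distinction drawn here between exit status 24 and 23 can be made explicit. Below is a minimal C sketch, assuming the wrapper wants to treat 24 as a warning and any other nonzero status as a failure. The enum and function names are hypothetical, and only the codes quoted in this message are spelled out.

```c
/* Hypothetical severity levels a backup wrapper might act on. */
enum rsync_outcome { RSYNC_OK, RSYNC_WARN_VANISHED, RSYNC_FAILED };

/* Exit codes quoted in the messages above (documented in rsync(1)). */
#define RSYNC_EXIT_PROTOCOL_STREAM 12
#define RSYNC_EXIT_PARTIAL_ERROR   23
#define RSYNC_EXIT_PARTIAL_VANISH  24

static enum rsync_outcome classify_rsync_exit(int status)
{
    switch (status) {
    case 0:
        return RSYNC_OK;
    case RSYNC_EXIT_PARTIAL_VANISH:
        /* Source files disappeared between scan and transfer, e.g. temp
         * files under /tmp; usually harmless for a live-system backup. */
        return RSYNC_WARN_VANISHED;
    default:
        /* 12, 23 and every other nonzero code are treated as failures. */
        return RSYNC_FAILED;
    }
}

static const char *describe_rsync_exit(int status)
{
    switch (status) {
    case 0:
        return "success";
    case RSYNC_EXIT_PROTOCOL_STREAM:
        return "error in rsync protocol data stream";
    case RSYNC_EXIT_PARTIAL_ERROR:
        return "partial transfer due to error";
    case RSYNC_EXIT_PARTIAL_VANISH:
        return "partial transfer due to vanished source files";
    default:
        return "other error (see rsync(1))";
    }
}
```

On the backup side this keeps the vanished /tmp file from being treated as fatal. On the restore side, the 3.1.0dev build reports the same condition as 23, so a wrapper like this would still flag it as a failure.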

I am relieved to find that my backups seem to be usable.  But I had to
go all the way to the bleeding edge of rsync to achieve this.  Do the
nightly builds tend to be reliable?

The next thing I wanted to address is that restoring batch files was
slow.  The main reason is that I used the --link-dest option when
creating the backup.  (--link-dest is pretty integral to my on-line
backup strategy because I give it multiple times, linking snapshots of
multiple hosts to each other as well as to their last snapshot.  So
"don't use --link-dest" isn't a good solution.)  Therefore, I had to do
the same in --read-batch mode.  As a result, the restore time was
dominated by recreating the entire tree with links.  But this shouldn't
be necessary:  In the restore scenario, I am not concerned with
preserving multiple snapshots, so I should be able to operate on one
tree.  Any time the batch file says to link from somewhere else, just
use the existing file.
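The receiver code quoted in this message checks `fnamecmp_type` against `basis_dir_cnt`, which suggests the batch refers to each basis directory by its position in the --link-dest list. If so, a replay presumably needs the same directories in the same order. Below is a minimal C sketch that builds such a --read-batch command line. The function names and the cap of 20 directories are assumptions, and `-a --hard-links` is taken from the command shown earlier.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed cap on --link-dest directories; check rsync(1) for the real limit. */
#define MAX_LINK_DEST 20

/* Returns a freshly allocated "<prefix><value>" string, or NULL. */
static char *dup_prefixed(const char *prefix, const char *value)
{
    size_t len = strlen(prefix) + strlen(value) + 1;
    char *s = malloc(len);
    if (s != NULL)
        snprintf(s, len, "%s%s", prefix, value);
    return s;
}

/* Frees a NULL-terminated argv built by build_read_batch_argv(). */
static void free_argv(char **argv)
{
    if (argv == NULL)
        return;
    for (size_t i = 0; argv[i] != NULL; i++)
        free(argv[i]);
    free(argv);
}

/* Builds: rsync --read-batch=BATCH -a --hard-links
 *         --link-dest=LD0 ... --link-dest=LDn-1 DEST
 * keeping the --link-dest directories in the order given, which is
 * assumed to be the order used by the --write-batch run. */
static char **build_read_batch_argv(const char *batch, const char *dest,
                                    const char *const link_dest[], size_t n_link)
{
    if (n_link > MAX_LINK_DEST)
        return NULL;
    size_t total = 5 + n_link; /* arguments, excluding the final NULL */
    char **argv = calloc(total + 1, sizeof *argv);
    if (argv == NULL)
        return NULL;

    size_t k = 0;
    argv[k++] = dup_prefixed("", "rsync");
    argv[k++] = dup_prefixed("--read-batch=", batch);
    argv[k++] = dup_prefixed("", "-a");
    argv[k++] = dup_prefixed("", "--hard-links");
    for (size_t i = 0; i < n_link; i++)
        argv[k++] = dup_prefixed("--link-dest=", link_dest[i]);
    argv[k++] = dup_prefixed("", dest);
    argv[k] = NULL; /* k == total */

    /* If any allocation failed, release everything and report failure. */
    for (size_t i = 0; i < k; i++) {
        if (argv[i] == NULL) {
            for (size_t j = 0; j < k; j++)
                free(argv[j]);
            free(argv);
            return NULL;
        }
    }
    return argv;
}
```

The returned vector can be handed to execvp() and released with free_argv().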

When I tried simply leaving off the --link-dest option, rsync complained
about "invalid basis_dir index".  So I found the code that reports this
error and, instead of exiting, made it fall back to the common case where
the file to compare against is the destination file itself:

--- receiver.c.orig     2010-04-14 00:30:45.000000000 +0000
+++ receiver.c  2010-04-14 00:31:16.000000000 +0000
@@ -654,10 +654,8 @@
                                break;
                        default:
                                if (fnamecmp_type >= basis_dir_cnt) {
-                                       rprintf(FERROR,
-                                               "invalid basis_dir index: %d.\n",
-                                               fnamecmp_type);
-                                       exit_cleanup(RERR_PROTOCOL);
+                                       fnamecmp = fname;
+                                       break;
                                }
                                pathjoin(fnamecmpbuf, sizeof fnamecmpbuf,
                                         basis_dir[fnamecmp_type], fname);

My restore went 5 times faster, and I verified (with rsync!) that I
ended up with the same tree.  I didn't really know what I was doing, but
am I on the right track?  Can anyone think of where this could go wrong?
If this were controlled by an option, say --ignore-batch-link-dest,
would it be acceptable for mainline rsync?
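Stripped of rsync's surrounding state, the patch's decision, gated behind the proposed --ignore-batch-link-dest option, might look like the sketch below. It is not rsync's code: `resolve_basis` and `ignore_batch_link_dest` are hypothetical, and snprintf() stands in for rsync's pathjoin(). A NULL return corresponds to the "invalid basis_dir index" error path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Returns the path to use as the delta basis for fname, or NULL when the
 * batch names a basis directory that was not supplied and the fallback is
 * disabled (or the joined path does not fit in buf). */
static const char *resolve_basis(int fnamecmp_type, int basis_dir_cnt,
                                 const char *const basis_dir[],
                                 const char *fname,
                                 bool ignore_batch_link_dest,
                                 char *buf, size_t bufsize)
{
    if (fnamecmp_type < 0 || fnamecmp_type >= basis_dir_cnt) {
        /* Patched behavior: use the existing destination file instead. */
        if (ignore_batch_link_dest)
            return fname;
        return NULL; /* unpatched behavior: protocol error */
    }
    /* Stand-in for pathjoin(basis_dir[type], fname). */
    int len = snprintf(buf, bufsize, "%s/%s", basis_dir[fnamecmp_type], fname);
    if (len < 0 || (size_t)len >= bufsize)
        return NULL;
    return buf;
}
```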

Thank you for your thoughts.

Andrew

