Reliability and robustness problems

John Van Essen vanes002 at umn.edu
Tue Jun 8 04:46:17 GMT 2004


(I see there's already been an exchange between you and Wayne, but
I'll still send this reply that I composed to your original email.)

On Tue, 08 Jun 2004, John <rsync at computerdatasafe.com.au> wrote:
>
> We maintain the contents of latest thus:
> + rsync --recursive --links --hard-links --perms --owner --group
> --devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete
> --delete-excluded --delete-after --max-delete=80 --relative --stats
> --numeric-ids --exclude-from=/etc/local/backup/system-backup.excludes
> /boot/ / /home/ /var/ /var/local/backups/office//latest

Why the double slash before latest?

> and create the backup-du-jour:
> + cp -rl /var/local/backups/office//latest
> /var/local/backups/office//20040607-2128-mon
> 
> That part works well, and the rsync part generally takes about seven
> minutes.
> 
> To copy office to home we try this:
> + rsync --recursive --links --hard-links --perms --owner --group
> --devices --times --sparse --one-file-system --rsh=/usr/bin/ssh --delete
> --delete-excluded --delete-after --max-delete=80 --relative --stats
> --numeric-ids /var/local/backups 192.168.0.1:/var/local/backups/

I can see where you will have a dreadful number of files to process
if you are also processing all the previous backups.

> Problems we've had include
> 1. ADSL connexion at one end or the other dropping for a while. rsync
> doesn't notice and mostly hangs. I have seen rsync at home still
> running but with no relevant files open.
> 
> 2. rsync uses an enormous amount of virtual memory with the result the
> Linux kernel lashes out at lots of processes, mostly innocent, until it
> lucks on rsync. This can cause rsync to terminate without a useful message.
> 2a. Sometimes the rsync that does this is at home.
> I've alleviated this at office by allocating an unreasonable amount of
> swap: unreasonable because if it gets used, performance will be truly
> dreadful.

In neither this nor your previous post have you mentioned the
version of rsync or the OSes involved.

Versions of rsync prior to 2.6.2 (skipping 2.6.1) have non-optimized
hard-link processing that uses twice as much memory (!) and sometimes
copies hard-linked files even when there is already a match on the
receiver.

If you are not using 2.6.2, install that on both ends and try it
again.

> 3. rsync does not detect when its partner has vanished. I don't
> understand why this should be so: it seems to me that, at office, it
> should be able to detect by the fact {r,s}sh has terminated or by
> timeout, and at home by timeout.

There are two timeouts - a relatively short internal socket I/O
timeout and a user-controlled client-server communications timeout.
If you are not using --timeout and the link goes down at the wrong
time, rsync can sit there forever waiting for the next item from the
other end.

Use --timeout set to some number of seconds that seems long enough
to get the job done.  If it times out, then either bump it or try
to solve the cause of the timeout.
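
As a rough sketch of that advice, a wrapper can pass --timeout and
check for exit status 30, which the rsync man page lists as "Timeout
in data send/receive".  The names RSYNC, IO_TIMEOUT and
run_with_timeout, and the 600-second default, are assumptions made up
for illustration; none of them come from this thread.

```shell
#!/bin/sh
# Sketch only: run rsync with an I/O timeout and report a timeout exit.
# RSYNC and IO_TIMEOUT are illustrative names; tune the value for your link.
RSYNC=${RSYNC:-rsync}
IO_TIMEOUT=${IO_TIMEOUT:-600}

run_with_timeout() {
    "$RSYNC" --timeout="$IO_TIMEOUT" "$@"
    status=$?
    if [ "$status" -eq 30 ]; then
        # 30 is rsync's documented "timeout in data send/receive" status.
        echo "rsync timed out (--timeout=${IO_TIMEOUT})" >&2
    fi
    return "$status"
}

# Usage (placeholder arguments):
# run_with_timeout --recursive --links --hard-links SRC/ HOST:DEST/
```

Returning rsync's own status lets the calling script decide whether
to raise the value or go looking for the cause.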

> 3a. I'd like to see rsync have the ability to retry in the case it's
> initiated the transfer. It can take some time to collect together the
> information as to what needs to be done: if I try in its wrapper script,
> then this has to be redone whereas, I surmise, rsync doing the retry
> would not need to.

You need to avoid the kinds of rsync runs where the startup scan
becomes a major factor.
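
If a retry in the wrapper script is still wanted, a minimal sketch
could look like the following.  The names retry, MAX_TRIES and
RETRY_DELAY and their defaults are assumptions for illustration.
Each attempt repeats the full scan, which is why keeping each run
small matters more than the retry itself.

```shell
#!/bin/sh
# Sketch only: retry a command a fixed number of times with a pause between.
# MAX_TRIES and RETRY_DELAY are illustrative names and values.
MAX_TRIES=${MAX_TRIES:-3}
RETRY_DELAY=${RETRY_DELAY:-60}

retry() {
    try=1
    while :; do
        "$@" && return 0
        status=$?    # exit status of the failed command
        if [ "$try" -ge "$MAX_TRIES" ]; then
            echo "giving up after $try attempts (last status $status)" >&2
            return "$status"
        fi
        try=$((try + 1))
        sleep "$RETRY_DELAY"
    done
}

# Usage (placeholder arguments):
# retry rsync --timeout=600 --recursive --hard-links SRC/ HOST:DEST/
```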

> 4. I've already mentioned this, but as I've had no feedback I'll try again.
> As you can see from the above, the source directories for the transfer
> from office to home are chock-full of hard links. As best I can tell,
> rsync is transferring each copy fresh instead of recognising the hard
> link before the transfer and getting the destination rsync to make a new
> hard link. It is so that it _can_ do this that I present the backup
> directory as a whole and not the individual day's backup. That, and I
> have hopes that today's unfinished work will be done tomorrow.

rsync 2.6.2 has fixes for these unnecessary transfers of hard-linked
files.

> btw the latest directory contains 1.5 Gbytes of data. The system is
> still calculating that today's backup contains 1.5 Gbytes, so it seems
> the startup costs are considerable.

It's not the size of the data that hurts, it's the number of files
and directories involved.
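
To get a feel for that number, something like the following counts
the entries under a directory.  count_entries is a made-up helper
name, not part of rsync.  Every hard-linked copy in a snapshot counts
as its own entry even though it adds no data.

```shell
#!/bin/sh
# Sketch only: count the entries rsync has to examine under a directory.
# count_entries is an illustrative helper name, not part of rsync.
count_entries() {
    # -xdev mirrors --one-file-system; names containing newlines are overcounted.
    find "$1" -xdev | wc -l | tr -d ' '
}

# Usage (paths as given in the thread):
# count_entries /var/local/backups/office/latest
# count_entries /var/local/backups
```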

Here's what I suggest.

Since you have wisely made a static snapshot of the content that
you wish to back up, do the office -> home rsync in two steps.

First, only rsync the "latest" directory, using your original rsync
arguments with the source and destination as:

  /var/local/backups/latest 192.168.0.1:/var/local/backups/latest/

Unchanged content won't be disturbed.  Changed or new content will
get transferred.

When that completes successfully, then do the second rsync, but
do *not* use --delete-excluded.  The second rsync should include
latest and the new YYYYMMDD-HHMM-ddd directory, and exclude all
others.  That should be nothing but hardlinks and should go very
quickly once the filesystem scan for the two hierarchies is done.
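
The two steps might be sketched like this.  The names RSYNC, SRC,
DEST, COMMON and two_step_sync are assumptions for illustration, and
the option list is trimmed from the original.  In particular this
sketch leaves out --relative so that the anchored include/exclude
patterns are relative to the source directory; adjust them if you
keep --relative.

```shell
#!/bin/sh
# Sketch only: two-pass office -> home transfer.
# RSYNC, SRC, DEST and COMMON are illustrative names; adjust to your setup.
RSYNC=${RSYNC:-rsync}
SRC=${SRC:-/var/local/backups/office}
DEST=${DEST:-192.168.0.1:/var/local/backups/office}
COMMON="--recursive --links --hard-links --perms --owner --group --devices
--times --sparse --numeric-ids --rsh=/usr/bin/ssh --delete"

two_step_sync() {
    snapshot=$1    # e.g. 20040607-2128-mon

    # Pass 1: bring only "latest" up to date; this is where file data moves.
    # $COMMON is left unquoted on purpose so it splits into separate options.
    "$RSYNC" $COMMON "$SRC/latest/" "$DEST/latest/" || return

    # Pass 2: latest plus the new snapshot, all other top-level entries excluded.
    # No --delete-excluded, so older snapshots on the receiver are left alone.
    "$RSYNC" $COMMON \
        --include="/latest/" --include="/$snapshot/" --exclude="/*" \
        "$SRC/" "$DEST/"
}

# Usage:
# two_step_sync 20040607-2128-mon
```

The second pass only runs if the first one succeeded, so a dropped
link during the data transfer does not leave a half-linked snapshot
on the receiver.
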
-- 
        John Van Essen  Univ of MN Alumnus  <vanes002 at umn.edu>