Problems with rsync 2.5.1pre1 and hardlinks

Mon Dec 10 01:51:29 EST 2001

Hi,

I am stuck on a strange problem involving my 2-node Linux cluster and
the synchronisation tool at hand (rsync-2.5.1pre1).

I have to copy a structure of 70 directories in which the files are
hardlinked to the files of the 1st directory.  This "orig data"
directory holds about 30,000 files, so the number of files to sync is
approx. 2,100,000.  The overall size is about 9.2GB.
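
To spell out the count, here is a small sketch of the arithmetic.  It
assumes the 1st directory is one of the 70, so every name outside it is
a hardlink to an inode that already exists there.

```python
# Figures taken from the description above.
directories = 70               # assumption: includes the "orig data" directory
files_per_directory = 30_000

# Number of names rsync has to put into its file list.
total_entries = directories * files_per_directory

# Names that are additional links to files in the 1st directory.
hardlinked_entries = total_entries - files_per_directory

print(total_entries)       # 2100000
print(hardlinked_entries)  # 2070000
```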

The method to synchronize is to have "rsync --daemon" running on the
production server and to pull the data onto the backup server via rsync::.
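
For reference, the pull on the backup server has roughly the shape
sketched below.  The host name, module name and destination path are
placeholders made up for illustration, not my real configuration; only
the -auvH options are the ones actually in use.

```python
# Hypothetical names; replace with the real host, module and path.
PRODUCTION_HOST = "mainserver"
MODULE = "data"
DESTINATION = "/backup/data/"


def build_pull_command(host, module, destination):
    """Build the argument list for pulling a daemon module to a local path."""
    # -a archive mode, -u skip files that are newer on the receiver,
    # -v verbose, -H preserve hardlinks
    return ["rsync", "-auvH", f"{host}::{module}/", destination]


command = build_pull_command(PRODUCTION_HOST, MODULE, DESTINATION)
print(" ".join(command))
```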

I secured this mechanism with a separate 100 Mbit network link that is
provided exclusively for the task.  The systems are:

- linux-2.4.16 
- glibc-2.2
- i686 (Coppermine with 900MHz) with 512MB RAM and 400MB Swap on the
    main server and 128MB Swap on the backup server (I know this is stupid
    but at the moment I can't help it)

What happens?

The synchronization starts and gobbles up approx. 300MB of RAM/swap
while building the file list on the server.  On the client system,
approx. 620MB (i.e. nearly all) of memory is allocated to compare the
file list (the sync is set up with -auvH).  The files are transferred
(on the first run, all of them, of course) and the transfer stops at the
client after an hour with

  rsync.c:sig_int() called.
  rsync error: received SIGUSR1 or SIGINT (code 20) at rsync.c(230)
  rsync error: received SIGUSR1 or SIGINT (code 20) at main.c(741)

and I do not see *anything* that could have interfered (I did not either).
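
For a sense of scale, the memory figures above can be put in per-file
terms.  This is only a rough sketch: it assumes the 300MB and 620MB are
binary megabytes and that all of that memory goes to the 2,100,000 file
list entries, which is probably not exactly true.

```python
total_entries = 70 * 30_000    # 2,100,000 file list entries
MB = 1024 * 1024               # assumption: figures are binary megabytes

server_bytes = 300 * MB        # memory seen on the server (daemon side)
client_bytes = 620 * MB        # memory seen on the client (backup side)

server_per_entry = server_bytes / total_entries
client_per_entry = client_bytes / total_entries

print(round(server_per_entry))  # 150 (bytes per entry)
print(round(client_per_entry))  # 310 (bytes per entry)
```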

OK, I say, better luck next time.  No rsync process remains on the
client (backup server) side.  However, the "rsync --daemon" on the main
server, which had reduced its memory usage during the file transfer,
still has a child hanging around with 200MB of memory in use after the
client broke off communications!

So, when running rsync the next time, I have 500MB of memory eaten up
by both rsyncs on the main server (the new and the old), which is quite
a lot.  Unfortunately, the second (and third) attempts to sync break off
after some time with messages similar to those shown above, and the
hanging processes on the main server end up holding 700-800MB of memory.

The result?  The production server dies a slow and painful
out-of-memory death unless I do a "killall -9 rsync" after some time....

Any comments on how to debug this?
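
One thing I could do in the meantime is log the resident memory of the
leftover rsync processes over time.  Below is a minimal sketch that
reads the VmRSS field from /proc/<pid>/status; the helper names are my
own, and the sample text is made up for illustration, not a real
measurement.

```python
def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    204800 kB".
            return int(line.split()[1])
    return None


def read_vm_rss_kb(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


# Made-up sample text standing in for a real status file.
sample = "Name:\trsync\nVmSize:\t  250000 kB\nVmRSS:\t  204800 kB\n"
print(parse_vm_rss_kb(sample))
```

Calling read_vm_rss_kb() periodically for each rsync PID on the main
server would show whether the hanging child ever releases its memory.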

My only idea is that the kernel on the client side may be silently
sending a signal to the rsync process because of excessive memory usage?
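
A way to narrow this down might be to start rsync from a small wrapper
and record how it ended.  The message quoted earlier is printed by
rsync itself, so that signal was one the process could catch; a SIGKILL
cannot be caught and would show up differently.  The sketch below uses a
stand-in child process instead of rsync, and describe_exit() is a name I
made up.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Tell a normal exit code apart from death by an uncaught signal."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as a return code of -N.
        return f"killed by signal {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


# Stand-in child that exits with code 20, as rsync did in the log above.
child = subprocess.run([sys.executable, "-c", "import sys; sys.exit(20)"])
print(describe_exit(child.returncode))
```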

How can I avoid that behaviour? (The client system was otherwise quite
happy, as it does nothing other than rsyncing...)

Regards,

- Birger