Odd behavior

Erich Weiler weiler at soe.ucsc.edu
Thu Apr 22 13:43:29 MDT 2010


Hi Y'all,

I'm seeing some interesting behavior that I was hoping someone could 
shed some light on.  Basically I'm trying to rsync a lot of files, in a 
series of about 60 rsyncs, from one server to another.  There are about 
160 million files.  I'm running 3 rsyncs concurrently to increase the 
speed, and as each one finishes, another starts, until all 60 are done.
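
The "3 at a time until all 60 are done" scheduling can be sketched roughly 
as follows.  This is only a minimal sketch under stated assumptions, not 
the exact script in use: sync_all, the job-list file and the RSYNC 
override are hypothetical names, and it assumes each job is a directory 
under the "data" module that maps to the same name under /hive/data.

```shell
# Sketch only: sync_all, the job-list file and the RSYNC variable are
# hypothetical names introduced for illustration.
#
# $1: file with one directory name per line (relative to the "data" module)
# $2: maximum number of rsyncs to run at once
sync_all() {
    # xargs -P keeps up to $2 commands running and starts the next one
    # as soon as any of them exits; -I{} substitutes one input line.
    xargs -P "$2" -I{} ${RSYNC:-rsync} -a --delete \
        "rsync://encodek-0-4/data/{}/" "/hive/data/{}/" < "$1"
}

# Example (jobs.txt is a hypothetical file): sync_all jobs.txt 3
```

Setting RSYNC=echo first prints the commands instead of running them, 
which is a cheap way to check the job list before starting a real pass.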

The machine I'm initiating the rsyncs on has 48GB RAM.  This is CentOS 
Linux 5.4, kernel revision 2.6.18-164.15.1.el5.  The rsync version is 
3.0.5 (on both sides).

I was able to rsync all the data over to the new machine.  But, because 
there was so much data, I need to run the rsyncs again to catch data 
that changed during the first run.  This second pass hangs midway through.

What happens is that as the rsyncs run, the memory usage on the machine 
slowly creeps up, using quite a bit of RAM, which is odd because I 
thought the rsyncs were counting files incrementally, to reduce RAM 
impact.  But, looking at top, the rsync processes aren't using much RAM 
at all:

top - 12:22:10 up 1 day, 27 min,  1 user,  load average: 46.85, 46.37, 44.97
Tasks: 309 total,   8 running, 301 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0%us, 13.8%sy,  0.0%ni, 84.9%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:  49435196k total, 34842524k used, 14592672k free,   141748k buffers
Swap: 10241428k total,        0k used, 10241428k free,    49428k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
 7351 root      25   0 19892 9.8m  844 R 100.1  0.0 552:58.55 rsync
 9084 root      16   0 13108 2904  820 R 100.1  0.0 299:24.59 rsync
 4759 root       0 -20 1447m  94m  15m S  29.9  0.2 667:34.21 mmfsd
 9539 root      16   0 30136  19m  820 R   6.3  0.0   6:29.28 rsync
 9540 root      15   0  271m  46m  260 S   0.3  0.1   0:12.13 rsync
10047 root      15   0 10992 1212  768 R   0.3  0.0   0:00.01 top
    1 root      15   0 10348  700  592 S   0.0  0.0   0:02.15 init
...etc...

Nevertheless, 34GB of RAM is in use.  What really kills things, though, 
is that at some point each rsync suddenly ramps up to 100% CPU usage, 
and all activity for that rsync essentially stops.  In the above 
example, 2 of the 3 rsyncs are in that 100% CPU state, while the third 
rsync is only at 6.3%, but that is the one actually doing something.  In 
some cases all 3 rsyncs get to 100% and they all stall: there is no 
network traffic on the NIC at all and they don't progress.

Now mostly what they are doing is counting files, since most of the 
files are the same on both sides, but there are just so many files (160 
million).  I don't seem to be out of memory, but I don't know why rsync 
would go to 100% CPU and just stall.
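
To put a number on the memory question, the Mem: and Swap: lines from 
top can be reduced to one figure: memory in use that is neither buffers 
nor page cache.  A minimal sketch, assuming a Linux /proc/meminfo with 
the usual MemTotal, MemFree, Buffers and Cached fields; 
non_cache_used_kb is a hypothetical helper name.

```shell
# Sketch only: non_cache_used_kb is a hypothetical helper.
# Prints (MemTotal - MemFree - Buffers - Cached) in kB.
# $1: optional meminfo-format file; defaults to /proc/meminfo
non_cache_used_kb() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { mfree   = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END { printf "%d\n", total - mfree - buffers - cached }
    ' "${1:-/proc/meminfo}"
}
```

With the values from the top output above (49435196 total, 14592672 
free, 141748 buffers, 49428 cached) this prints 34651348, so about 33GB 
is held by something other than buffers or page cache.  Since the rsync 
processes themselves account for very little of it, comparing that 
figure against the Slab: line in /proc/meminfo might show whether kernel 
caches are holding it; that is a guess, not something confirmed here.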

I am rsyncing from an rsync server to my local server, with commands 
similar to this:

rsync -a --delete rsync://encodek-0-4/data/genomes/ /hive/data/genomes/

Again, both sides are at version 3.0.5.  Nothing fancy or special.  I 
have confirmed that it does count the files incrementally by running a 
few manually; it does report "getting incremental file list...".

Any ideas why the processes go to 100% CPU and then stall?  I should 
also note that the initial run of rsyncs, where it was actually copying 
a ton of data, did not seem to have this problem, but now that the data 
is there and I'm rsyncing again, it seems to have this problem.  Is it 
somehow related to the fact that it is mostly comparing a ton of files 
very quickly but not actually copying many of them?

Thanks for any ideas!

-erich

