weiler at soe.ucsc.edu
Thu Apr 22 13:43:29 MDT 2010
I'm seeing some interesting behavior that I was hoping someone could
shed some light on. Basically I'm trying to rsync a lot of files, in a
series of about 60 rsyncs, from one server to another. There are about
160 million files. I'm running 3 rsyncs concurrently to increase the
speed, and as each one finishes, another starts, until all 60 are done.
The machine I'm initiating the rsyncs on has 48GB RAM. This is CentOS
Linux 5.4, kernel revision 2.6.18-164.15.1.el5. Rsync version 3.0.5 (on
both sides).
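For reference, the scheduling described above (about 60 rsyncs, 3 at a time, a new one starting as each finishes) can be sketched roughly as below. This is a minimal sketch, not the exact script used: the split into one rsync per subdirectory, the list file, and the names `run_batches`, `SYNC_CMD`, `SRC`, `DEST` and `MAX_JOBS` are assumptions for illustration. Only the rsync options and the two paths come from the command quoted in this post.

```shell
#!/bin/sh
# Run one sync per subdirectory name, at most MAX_JOBS at a time.
# ASSUMPTION: the ~60 rsyncs are split by top-level subdirectory; the post
# does not say how they are actually divided.
SRC=${SRC:-rsync://encodek-0-4/data/genomes}   # source, from the command in this post
DEST=${DEST:-/hive/data/genomes}               # destination, from the command in this post
MAX_JOBS=${MAX_JOBS:-3}                        # 3 concurrent rsyncs, as described

run_batches() {
    # $1: file with one subdirectory name per line (hypothetical list file).
    # xargs -P keeps MAX_JOBS commands running and starts the next one as
    # soon as any of them exits. SYNC_CMD can be set to "echo" for a dry run.
    xargs -P "$MAX_JOBS" -I{} ${SYNC_CMD:-rsync -a --delete} \
        "$SRC/{}/" "$DEST/{}/" < "$1"
}
```

Calling `run_batches dirs.txt` (with a hypothetical `dirs.txt`) would start the syncs; setting `SYNC_CMD=echo` first only prints the source and destination pairs, which is a cheap way to check the list before running anything.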
I was able to rsync all the data over to the new machine. But, because
there was so much data, I need to run the rsyncs again to catch data
that changed during the last rsync run. This second pass sort of hangs midway through.
What happens is that as the rsyncs run, the memory usage on the machine
slowly creeps up, using quite a bit of RAM, which is odd because I
thought the rsyncs were counting files incrementally, to reduce RAM
impact. But, looking at top, the rsync processes aren't using much RAM:
top - 12:22:10 up 1 day, 27 min, 1 user, load average: 46.85, 46.37, 44.97
Tasks: 309 total, 8 running, 301 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.0%us, 13.8%sy, 0.0%ni, 84.9%id, 0.0%wa, 0.0%hi, 0.3%si,
Mem: 49435196k total, 34842524k used, 14592672k free, 141748k buffers
Swap: 10241428k total, 0k used, 10241428k free, 49428k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7351 root 25 0 19892 9.8m 844 R 100.1 0.0 552:58.55 rsync
9084 root 16 0 13108 2904 820 R 100.1 0.0 299:24.59 rsync
4759 root 0 -20 1447m 94m 15m S 29.9 0.2 667:34.21 mmfsd
9539 root 16 0 30136 19m 820 R 6.3 0.0 6:29.28 rsync
9540 root 15 0 271m 46m 260 S 0.3 0.1 0:12.13 rsync
10047 root 15 0 10992 1212 768 R 0.3 0.0 0:00.01 top
1 root 15 0 10348 700 592 S 0.0 0.0 0:02.15 init
But nevertheless, 34GB RAM is in use. But what really kills things is
that at some point, each rsync all of a sudden ramps up to 100% CPU
usage, and all activity for that rsync essentially stops. In the
above example, 2 of the 3 rsyncs are in that 100% CPU state, while the
third rsync is only at 6.3%, but that is the one actually doing
something. In some cases all 3 rsyncs get to 100% and they all stall:
there is no network traffic on the NIC at all and they don't progress.
Now mostly what they are doing is counting files, since most of the
files are the same on both sides, but there are just so many files (160
million). I don't seem to be out of memory, but I don't know why rsync
would go to 100% CPU and just stall.
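Since top shows the rsync processes themselves holding very little memory, one thing worth checking is where the kernel accounts for the rest. Below is a minimal sketch, assuming a Linux /proc/meminfo with the usual MemTotal, MemFree, Buffers, Cached and Slab fields. The `meminfo_summary` helper name is made up for illustration, and whether kernel slab caches explain the usage here is only a guess, not something confirmed above.

```shell
#!/bin/sh
# Print a few fields from a meminfo-style file in GiB.
# ASSUMPTION: values are in kB, as in a standard Linux /proc/meminfo.
meminfo_summary() {
    # $1: path to a meminfo-style file (defaults to /proc/meminfo)
    awk '
        $1 == "MemTotal:" || $1 == "MemFree:" || $1 == "Buffers:" ||
        $1 == "Cached:"   || $1 == "Slab:" {
            # kB -> GiB: divide by 1024 * 1024
            printf "%-10s %8.1f GiB\n", $1, $2 / 1048576
        }
    ' "${1:-/proc/meminfo}"
}
```

Running `meminfo_summary` with no argument reads /proc/meminfo. A large Slab value next to small Buffers and Cached values would suggest the memory is in kernel caches rather than in any process.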
I am rsyncing from an rsync server to my local server, with commands
similar to this:
rsync -a --delete rsync://encodek-0-4/data/genomes/ /hive/data/genomes/
Again, both sides at version 3.0.5. Nothing fancy or special. I have
confirmed that it does count the files incrementally by running a few
manually; it does report "getting incremental file list...".
Any ideas why the processes go to 100% CPU and then stall? I should
also note that the initial run of rsyncs, where it was actually copying
a ton of data, did not seem to have this problem, but now that the data
is there and I'm rsyncing again, it seems to have this problem. Is it
somehow related to the fact that it is mostly comparing a ton of files
very quickly but not actually copying many of them?
Thanks for any ideas!