sync performance falls off a cliff
leen at consolejunky.net
Tue Jun 30 07:05:58 GMT 2009
Mike Connell wrote:
> I've got identical servers. One is primary the other is backup
> receiving rsyncs from the primary. I'm backing up a file system to
> disk and the files are small and there are lots of directories.
> The overall problem seems to be the total number of files.
> When I had ~375,000 files, the total rsync time was under a minute.
> With ~425,000 files, the total rsync time is 10 minutes.
> Last Friday when we were at 425,000 files, the rsync time was 10 minutes.
> Today I was able to delete 50,000 unneeded files and the rsync time went
> back down to under a minute.
> So why the huge change in total rsync time for a somewhat small change
> in total number of files? I'm afraid that as the total number of files keeps
> increasing that the total rsync time is going to go exponential.
> I turned the --progress flag on, and the time is roughly divided evenly
> between building the file list and looking through the file list. The
> files themselves are really small (~16K) and I'm not seeing any problem
> with anything other than how long it takes rsync to make a pass through
> all the files. I do use the --delete option.
> The servers are Dell 2950s, builtin RAID 10 disks and 4Gig of RAM.
> OS is Centos 5.1. I'm running rsync 2.6.8 protocol version 29.
> This smells to me like some sort of caching problem. Is there something
> in the kernel or rsync itself that I can tweak?
I'm no expert, but I suggest using rsync 3.x (3.0.6, for example); it
doesn't keep as much of the file list in memory.
With 2.6.8 it's probably swapping to disk because of the large list, and
that significantly slows down the performance of the whole machine(s).
Have a look at the output of the 'vmstat 2' command on both machines
while rsync is busy. Specifically, look at the heading that says 'swap';
it has an 'si' and an 'so' column below it. 'si' means reading from
swap/disk and 'so' means writing to swap/disk.
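
If you don't want to watch the columns by eye, something like the
following sketch could do the filtering. The helper name swap_activity
is made up for this example, and it assumes the usual procps vmstat
layout where 'si' and 'so' are the 7th and 8th columns. It skips the
first sample, because vmstat's first line reports averages since boot.

```shell
# Hypothetical helper: reads vmstat output on stdin and prints every
# sample that shows swap traffic.
# Assumption: procps layout, si = column 7, so = column 8.
swap_activity() {
    awk '
        $1 ~ /^[0-9]+$/ {            # data lines only, header lines are skipped
            n++
            if (n == 1) next         # first sample is the average since boot
            if ($7 > 0 || $8 > 0) {
                printf "swapping: si=%s so=%s\n", $7, $8
                found = 1
            }
        }
        END { if (!found) print "no swap activity seen" }
    '
}

# Usage, on each machine while rsync is running:
#   vmstat 2 30 | swap_activity
```

If this prints "swapping" lines on either machine during the run, memory
pressure from the file list is a likely suspect.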
You can try it out fairly easily, especially if you don't use rsync
for anything else. If you can't find a package, building it from source
is possibly an option:
tar -zxvf rsync-3.0.6.tar.gz
cd rsync-3.0.6
nice ./configure && nice make
That should work (at least if you have gcc, make and possibly other
things already installed).
And instead of calling rsync, you call /usr/src/rsync-3.0.6/rsync
(assuming you unpacked the tarball in /usr/src) if you just want to
test it first without installing.
You'll have to do it on both machines, of course. If you are not sure
you want to make any changes with an unsupported binary, you can
use -n, which makes rsync do a dry run without writing changes to disk.
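
A test run along those lines might look like the sketch below. The
function name dry_run is made up, and -a and --stats are my guesses at
reasonable options since I don't know your real command line; keep
whatever options you use now and just add -n. It also assumes you push
over ssh (not to an rsync daemon), in which case --rsync-path tells the
remote side to use the new binary as well.

```shell
# Path of the freshly built, not yet installed binary. Assumes the
# tarball was unpacked in /usr/src on BOTH machines.
NEW_RSYNC=${NEW_RSYNC:-/usr/src/rsync-3.0.6/rsync}

# Hypothetical wrapper: $1 = source directory, $2 = destination.
dry_run() {
    # -n            dry run, nothing is written on the receiving side
    # --delete      kept, because the real job uses it
    # --stats       print a summary (file counts, list size) at the end
    # --rsync-path  run the new binary on the remote machine too
    "$NEW_RSYNC" -a -n --delete --stats \
        --rsync-path="$NEW_RSYNC" "$1" "$2"
}

# Usage (host name and paths are placeholders):
#   time dry_run /data/ backuphost:/data/
```

Running "$NEW_RSYNC" --version on both machines first confirms that the
build really is 3.0.6.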
Hope these instructions help.