rsync performance falls off a cliff

Carlos Carvalho carlos at fisica.ufpr.br
Sat Jul 4 19:00:47 GMT 2009


Leen Besselink (leen at consolejunky.net) wrote on 30 June 2009 09:05:
 >Mike Connell wrote:
 >> Hi,
 >>  
 >
 >Hi Mike,
 >
 >> I've got identical servers. One is primary the other is backup
 >> receiving rsyncs from the primary. I'm backing up a file system to
 >> disk and the files are small and there are lots of directories.
 >>  
 >> The overall problem seems to be the total number of files.
 >> When I had ~375,000 files, the total rsync time was under a minute.
 >> With ~425,000 files, the total rsync time is 10 minutes.
 >>  
 >> Last Friday when we were at 425,000 files, the rsync time was 10 minutes.
 >> Today I was able to delete 50,000 unneeded files and the rsync time went
 >> back down to under a minute.
 >>  
 >> So why the huge change in total rsync time for a somewhat small change
 >> in total number of files? I'm afraid that as the total number of files keeps
 >> increasing that the total rsync time is going to go exponential.
 >>  
 >> I turn the --progress flag on, and the time is roughly divided up evenly
 >> between building the file list and looking through the file list. The
 >> files themselves are really small (~16K) and I'm not seeing any problem
 >> with anything other than how long it takes rsync to make a pass through
 >> all the files. I do use the --delete option.
 >>  
 >> The servers are Dell 2950s, built-in RAID 10 disks and 4 GB of RAM.
 >> OS is CentOS 5.1. I'm running rsync 2.6.8 protocol version 29.
 >>  
 >> This smells to me like some sort of caching problem. Is there something
 >> in the kernel or rsync itself that I can tweak?
 >>
 >
 >I'm no expert, but I suggest using rsync 3.x (3.0.6 for example); it
 >doesn't keep as much of the file list in memory.

Yes. Or at least it starts transfers much faster, because it doesn't
wait for the full list to be completed.

 >It's probably swapping to disk, because of the large list and that
 >significantly slows down the performance of the whole machine(s).

He's probably running out of RAM, not only because of rsync but also
because of everything else on the machine. Once the inodes and directory
entries are no longer cached in RAM, they have to be fetched from the
disk, which is *very* slow.
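One rough way to check the suspicion above is to see how much swap is in
use while rsync is running. This is only a sketch: the MEMINFO variable
and the swap_used_kb function are hypothetical names made up for this
example, and swap being in use does not by itself prove that swapping is
the bottleneck (cached inodes can be evicted without any swap activity).

```shell
# Hypothetical helper: the path can be overridden to try this on a copy.
MEMINFO="${MEMINFO:-/proc/meminfo}"

swap_used_kb() {
    # SwapTotal and SwapFree are reported in kB; print the difference.
    awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2} END {print t - f}' "$MEMINFO"
}
```

Calling swap_used_kb before and during the rsync run and comparing the
two numbers gives a first hint; vmstat gives a more direct view of
ongoing swap traffic.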

You can tell the kernel to favour keeping inodes and directory entries
in its cache, which should reduce the time to build the file list a lot.
Just set /proc/sys/vm/vfs_cache_pressure to a low value (the default
is 100).
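A minimal sketch of changing that setting, assuming a shell running as
root. The VFS_KNOB variable and the set_cache_pressure function are
hypothetical names for this example only; the value 50 used below is an
arbitrary illustration, not a tested recommendation.

```shell
# Hypothetical helper: the path can be overridden to try this on a scratch file.
VFS_KNOB="${VFS_KNOB:-/proc/sys/vm/vfs_cache_pressure}"

set_cache_pressure() {
    # $1 = new value; lower values make the kernel hold on to
    # inode and dentry caches longer.
    new="$1"
    case "$new" in
        ''|*[!0-9]*) echo "value must be a non-negative integer" >&2; return 1 ;;
    esac
    old=$(cat "$VFS_KNOB") || return 1
    echo "$new" > "$VFS_KNOB" || return 1
    # Report the change so the old value can be restored later.
    echo "vfs_cache_pressure: $old -> $new"
}
```

For example, set_cache_pressure 50 halves the default pressure. The
change does not survive a reboot unless vm.vfs_cache_pressure is also
set in /etc/sysctl.conf, and a value of 0 is best avoided because the
kernel then never reclaims these caches.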

