Rsync woes with large numbers of files
stuart at reeltwo.com
Mon Jul 1 13:35:01 EST 2002
I recently read a thread about the problems people are having with file
systems with a large number of files on them. We have an 80GB file system
with ~10 million files on it. Rsync runs out of memory on a 512MB RAM
machine while (I assume) reading in the list of files to send.
To avoid this problem we process each of the 40 top level directories
one at a time (in a "for f in *" type loop) and this almost solves our
problems. Some of the top level directories are themselves too large, so
we need another for loop inside those directories to try to stop rsync
from running out of memory.
My question is this: does rsync need to stat every single file in the
filesystem before it tries to sync a collection of files? Could it instead
keep a reasonable number of files in flight on both sides of the
connection (say, 1-10,000 files) and transfer them in batches?
On our server a full sync takes >24 hours. That isn't so bad, but it
would be good to make rsync a little more robust, so that if we create a
whole bunch of files we know the rsync will still work; currently we
have to rely on logs and update the script to keep things backed up.
PS: I'm using Linux; does anyone know which filesystems support change
logging, so that I can simply rsync based on a file that contains a
daily list of changed files?