ziegast at vix.com
Fri May 17 12:15:02 EST 2002
> In my humble opinion, this problem with rsync growing a huge memory
> footprint when large numbers of files are involved should be #1 on
> the list of things to fix.
I think many would agree. If it were trivial, it'd probably be
done by now.
Fix #1 (what most people do):
Split the files/paths to limit the size of each job.
What someone could/should do here is at least edit the
"BUGS" section of the manual to talk about the memory
usage with large file lists, so people know to split up big jobs.
Fix #2 (IMHO, what should be done to rsync):
File caching of results (or using a file-based database of
some sort) is the way to go. Instead of maintaining a
data structure entirely in memory, open a (g)dbm file or add
hooks into the db(3) libraries to store the file metadata and
related bookkeeping on disk.
It'll be slower than an all-memory implementation, but large
jobs will at least finish predictably.
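A minimal sketch of that idea in Perl, using the stock GDBM_File
module (the cache path and the metadata format are assumptions made
for illustration):

    #!/usr/bin/perl
    # Sketch: keep per-file metadata in a GDBM file on disk instead of
    # an in-memory list, so memory stays flat as the file count grows.
    use strict;
    use warnings;
    use File::Find;
    use GDBM_File;

    tie my %meta, 'GDBM_File', '/var/tmp/filelist.gdbm', &GDBM_WRCREAT, 0640
        or die "cannot open /var/tmp/filelist.gdbm: $!";

    # Walk the source tree once, recording size and mtime keyed by path.
    find(sub {
        return unless -f $_;
        my ($size, $mtime) = (stat(_))[7, 9];
        $meta{$File::Find::name} = "$size:$mtime";
    }, '/export/www');                 # hypothetical source tree

    untie %meta;

A later run can consult the same tied hash to decide which files have
changed, without ever holding the full list in RAM.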
Fix #3 (what I did):
If you really really need to efficiently transfer large
numbers of files, come up with your own custom process.
I used to run a large web site with thousands of files and
directories that needed to be distributed to dozens of
servers atomically. Using rsync, I'd run into memory
problems and worked around them with Fix #1. Another
problem was running rsync in parallel. The source directory
was scanned O(N) times when it needed to be scanned only
once. The source content server was pummeled by the
multiple simultaneous instances. I resorted to making my
own single-threaded rsync-like program in Perl that behaves
more like Fix #2 and runs very efficiently.
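A rough sketch of that kind of single-scan, multi-push process (the
host names, paths, and the use of rsync's --files-from option are my
own assumptions, not the actual program):

    #!/usr/bin/perl
    # Sketch: build the file list once, then reuse it for every
    # destination host, so the source tree is scanned a single time.
    use strict;
    use warnings;
    use File::Find;

    my $src   = '/export/www';              # hypothetical source tree
    my @hosts = map { "web$_" } 1 .. 12;    # hypothetical destinations
    my $list  = '/var/tmp/filelist.txt';

    # Single scan of the source tree, saved as paths relative to $src.
    open my $fh, '>', $list or die "open $list: $!";
    find(sub {
        return unless -f $_;
        (my $rel = $File::Find::name) =~ s{^\Q$src\E/}{};
        print {$fh} "$rel\n";
    }, $src);
    close $fh;

    # One transfer per host, all driven from the same precomputed list.
    for my $host (@hosts) {
        system('rsync', '-a', "--files-from=$list",
               "$src/", "$host:/var/www/") == 0
            or warn "rsync to $host failed (exit $?)\n";
    }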
I've spent some time cleaning up this program so that
I can publish it, but priorities (*) are getting in the
way. When I get some time, you'll see it posted here.
(*) Looking for a full-time job is a full-time job. :^(
Will consult for food.