Other possible solutions to: rsync memory usage, paid feature request

Paul Slootman paul at debian.org
Thu Jul 7 07:44:37 GMT 2005


On Wed 06 Jul 2005, David Favro wrote:
> 
> 1) Free: break your rsync's into several executions rather than one huge
> one.  Do several sub-directory trees, each separately.  If your data
[...]
> 2) Cheap: buy more swap space.  These days random-access magnetic
[...]
> 4) Expensive: buy more solid-state memory.  Possibly still cheaper than
[...]

None of these proposals would have helped when I wanted to move two
years' worth of Debian archive images to another system using rsync.
The Debian archive currently contains around 88,000 files (at least
the part of it that we mirror). Every day a snapshot is taken; common
files are hardlinked across days. This adds up to an enormous number
of directory entries and hundreds of thousands of distinct files.
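(For context: hardlinked daily snapshots like this are typically made
with rsync's --link-dest option, along the lines of the invocation
below. The paths are made up, purely for illustration.)

    # Illustrative only: sync today's snapshot from the mirror,
    # hardlinking any file unchanged since yesterday's snapshot
    # instead of storing a second copy of it.
    rsync -a --link-dest=/archive/2005-07-06 \
          rsync://ftp.debian.org/debian/ /archive/2005-07-07/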

Doing 1) was not feasible: rsync can only preserve hardlinks (with
-H / --hard-links) among the files of a single run, so splitting the
transfer would break every link that crosses the split, and the files
would effectively be duplicated, wasting space (see the example below).
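To make that concrete, here is what the split approach would look like
(paths invented):

    # Illustrative only: two separate runs, one per snapshot day.
    rsync -aH /archive/2005-07-05/ remote:/archive/2005-07-05/
    rsync -aH /archive/2005-07-06/ remote:/archive/2005-07-06/
    # Even with -H, the second run has no idea that most of
    # 2005-07-06's files are the same inodes as 2005-07-05's,
    # so they are re-sent and stored as independent copies.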

Doing 2) was tried (actually by creating swap files on disk), but then
we ran into the virtual address space limit of the 32-bit system: 3GB
was nowhere near enough.

Doing 3) would have the same problem as 2).  Going to a 64-bit system
might have helped, but I think that the memory usage would have exceeded
what's reasonable in solid-state memory, and using swap would have
slowed it all down horribly, as the lists in memory are apparently
traversed quite regularly. As it was, it took a couple of days before
the virtual memory limit was reached...


I ended up rsyncing each day separately, and using a perl program to
build a tree of files named by their MD5 checksums, each hardlinked to
the corresponding archive file.  With each newly transferred day the
checksums could be compared against that tree and the cross-day
hardlinks recreated.  However, I would *love* to see rsync be more
memory-efficient...
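For illustration, a minimal sketch of what such a perl program might
look like follows. The script name, arguments and checksum-tree layout
are my reconstruction of the approach, not the actual program: it walks
one day's snapshot, hardlinks each first-seen file into a flat
by-checksum tree, and relinks any file whose checksum is already there
back to the existing inode, so duplicates across days collapse into
hardlinks again.

    #!/usr/bin/perl
    # Hypothetical reconstruction of the relinking approach above.
    # Usage: relink-by-md5.pl <snapshot-dir> <md5-tree-dir>
    # Both directories must be on the same filesystem (link() would
    # fail otherwise), and the md5 tree must lie outside the snapshot.
    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    my ($snapshot, $sumtree) = @ARGV;
    die "usage: $0 snapshot-dir md5-tree-dir\n"
        unless defined $snapshot && defined $sumtree && -d $sumtree;

    find({ no_chdir => 1, wanted => sub {
        my $f = $File::Find::name;
        return if -l $f;            # skip symlinks
        return unless -f $f;        # plain files only

        open my $fh, '<', $f or do { warn "open $f: $!\n"; return };
        binmode $fh;
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;

        my $canon = "$sumtree/$md5";
        if (-e $canon) {
            # Content seen before: relink this copy to the canonical
            # inode, recreating the cross-day hardlink.
            my ($cdev, $cino) = (stat $canon)[0, 1];
            my ($fdev, $fino) = (stat $f)[0, 1];
            return if $cdev == $fdev && $cino == $fino;  # already linked
            # Link alongside, then rename over the duplicate, so the
            # file is never lost if one of the steps fails.
            my $tmp = "$f.relink$$";
            if (link $canon, $tmp) {
                rename $tmp, $f
                    or do { warn "rename $tmp: $!\n"; unlink $tmp };
            } else {
                warn "link $canon -> $tmp: $!\n";
            }
        } else {
            # First occurrence of this content: record it in the tree.
            link $f, $canon or warn "link $f -> $canon: $!\n";
        }
    }}, $snapshot);

A flat checksum tree makes each lookup a single filesystem stat; with
hundreds of thousands of entries one would probably fan it out by
checksum prefix, but that detail doesn't change the idea.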


Paul Slootman

