Other possible solutions to: rsync memory usage, paid feature request

David Favro lists.samba.org at favro.org
Wed Jul 6 19:26:12 GMT 2005

Hi, Matthew --

Regarding your message of 05-Jul-2005 concerning rsync memory usage
(sorry that I am not directly replying to it; I am not as yet subscribed
to the list and my mailer doesn't allow me to hard-code an In-Reply-To
or References header):

While I applaud anyone who wants to encourage open-source development,
it seems to me that, if in fact your problem is that you are running out
of memory due to transferring too many files at a time, there are much
cheaper solutions for your company than paying for the change to rsync
described in the FAQ.

1) Free: break your rsync run into several smaller executions rather
than one huge one.  Do several sub-directory trees, each separately.
If your data files are not organized in such a way that they can
easily be divided into a reasonable number of sub-directory trees,
consider re-organizing them so that they can be: it will pay off in
many other sys-admin benefits as well.

2) Cheap: buy more swap space.  These days random-access magnetic
storage is running close to 0.50 USD per gig (e.g. here:
http://www.buy.com/retail/product.asp?sku=10360313 is 200GB for $105 in
the US, including shipping).  At the stated rate of 100 bytes per file,
this is enough storage to add 2 billion files to each rsync that you
run, for a price that is less than many programmers want for a week of
coding.  If you have much more than 2 billion files in each sub-
directory tree, you are probably doing something very wrong. :-)
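To spell out the arithmetic behind that figure (using the roughly
100-bytes-per-file rate quoted in the FAQ):

```shell
# 200 GB of swap, at ~100 bytes of per-file list memory in rsync:
swap_bytes=$(( 200 * 1000 * 1000 * 1000 ))
echo $(( swap_bytes / 100 ))   # prints 2000000000, i.e. 2 billion files
```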

3) Free: Perhaps your problem is not that you are running *out* of
memory, but that rsync is temporarily 'stealing' core (solid-state)
memory from other, 'more important' processes -- i.e. those requiring
quicker response time -- causing their data to get swapped out, which
might reduce their response time when that data later needs to get
swapped back in.  If so, you might consider using the operating
system either to lock down the memory used by your important server
programs so that it cannot be swapped out, or to give them higher
priority (memory priority, not CPU-scheduling priority, though that
might be a good idea also) so that rsync gets swapped out before they
do, and it will then maintain a small footprint in physical memory.
(I am not sure whether this is possible, or how to do it, under
Linux, but would be interested to know -- a sort of variant of the
nice command, but for core usage, or a per-process maximum-in-core
parameter.)  I would, however, use some caution with either approach:
the general-purpose VM swap-out algorithms used by most modern
operating systems usually do a pretty good job of getting everything
serviced in a reasonable response time, and forcing rsync to thrash
the swap-cache, as it might if its lists are traversed as often as
the FAQ implies, will not necessarily increase overall performance of
the system.  Solution (1) above will also greatly improve this
situation.  Otherwise, the final suggestion:

4) Expensive: buy more solid-state memory.  Possibly still cheaper than
paying for coding, but at any rate, in my experience, more core is
rarely the best solution for lack-of-core problems.

Another thought to consider: the method behind the proposed "week of
coding" solution isn't specified, but it may well involve spooling
the lists to temporary files.  In that case you'll probably need to
buy the storage from solution (2) above anyhow, in addition to paying
for the coding, and you'll get what amounts to nearly the same
solution as (2) and/or (3).  That said, I am always in favor of
frugal use of core -- it all depends on what the proposed solution
is; if it merely substitutes user-space 'swapping' to disk for
kernel-space swapping, it's likely not worth the (apparently large)
effort, which could be better directed at other improvements,
especially considering the likely decrease in the cost of solid-state
memory in the future.

Finally, experimenting first with solutions (1) and (2) above may
help you determine whether the problem really is what you think it
is, before you shell out for a software solution.
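One quick way to run that experiment is simply to watch the rsync
process's resident set size while it works.  A small sketch, assuming
a Linux-style ps; the rsync command in the usage note is hypothetical:

```shell
# watch_rss: print a process's resident set size (in kilobytes) at a
# regular interval until it exits, so you can see how big it gets.
# Usage: watch_rss <pid> [interval-seconds]
watch_rss() {
    pid=$1; interval=${2:-5}
    while kill -0 "$pid" 2>/dev/null; do
        ps -o rss= -p "$pid"   # resident set size, in kB
        sleep "$interval"
    done
}

# Hypothetical usage:
#   rsync -a /data/ backuphost:/backup/data/ &
#   watch_rss $!
```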

Also keep in mind that, in my experience, when most programmers
estimate "1 week of coding" it often ends up taking 2 or 3 weeks, or
sometimes 8.

Just my (rambling) thoughts as a fellow programmer and system
administrator.

Anyhow, I really admire someone who is willing to shell out for
improvements to open-source code!

David Favro
Senior Partner
Meta-Dynamic Solutions
