Large RAM (> 4G) and rsync still dies?
jw at pegasys.ws
Fri Nov 21 00:40:05 EST 2003
On Thu, Nov 20, 2003 at 06:24:27AM -0000, rsync at med-web.com wrote:
> Hello. Hopefully someone can shed some light on this:
> We've got a production server with _LOTS_ of files on it. The system is a dual
> XEON with 4GB of RAM. During the evening, the load is very low. Linux shows
> (via 'top') that approximately 3G of RAM is cached RAM, and 500M is buffered.
> That, if my understanding is correct, should leave us with 3.5G of available
> memory to draw upon.
> Running rsync with --progress (v2.5.6, I believe - just downloaded it a few
> days ago) shows us that we reach a little over 8 million files before the
> server starts telling us it's killing processes because it's out of RAM.
> On the FAQ, it says that rsync should consume approximately 100 bytes of memory
> per file, on average. So, 8 million x 100 = 800M of RAM.
> Why are we running out of RAM? Is there a way to tell the kernel not to use so
> much memory for cache and buffers, and to leave more free? Is the kernel not
> releasing the cache/buffer memory quickly enough for rsync? I don't know,
> otherwise I wouldn't be here asking these questions. =)
This is a kernel question, so it is a little off-topic for rsync.
The issue in this case is address space mapping.
By default a user process has only a limited amount of
address space available, no matter how much RAM is
installed. If you look at /proc/$PID/maps you will see how
it is mapped. In low memory is the executable; above that
are data, bss, anonymous data and brk (see the brk(2)
manpage), growing upward. In middle memory, usually starting
at 0x40000000, are the mmap areas where libraries are
dynamically linked and files are mmapped. Above that,
growing downward from 0xc0000000 (depending on kernel
options), is the user-mode stack.
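
To make the layout above concrete, here is a minimal Python sketch that
parses text in the /proc/$PID/maps format and prints each region with its
size. The function name parse_maps and the SAMPLE text are hypothetical;
the sample addresses are made up to resemble the layout described, not
taken from a real rsync process.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the region
    end: int     # one past the last address of the region
    perms: str   # permission string, e.g. "r-xp"
    path: str    # backing file, or "" for anonymous memory


def parse_maps(text: str) -> List[Mapping]:
    """Parse the contents of a /proc/<pid>/maps file into Mapping tuples."""
    mappings = []
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start_hex, end_hex = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(
            Mapping(int(start_hex, 16), int(end_hex, 16), fields[1], path)
        )
    return mappings


# Hypothetical sample: executable and heap low, a library at 0x40000000,
# and the stack just below 0xc0000000.
SAMPLE = """\
08048000-0808c000 r-xp 00000000 03:01 12345      /usr/bin/rsync
0808c000-08090000 rw-p 00044000 03:01 12345      /usr/bin/rsync
08090000-0a000000 rw-p 00000000 00:00 0
40000000-40015000 r-xp 00000000 03:01 23456      /lib/ld-2.3.2.so
bfffe000-c0000000 rw-p 00000000 00:00 0
"""

if __name__ == "__main__":
    for m in parse_maps(SAMPLE):
        size_kib = (m.end - m.start) // 1024
        name = m.path or "[anonymous]"
        print(f"{m.start:08x}-{m.end:08x} {m.perms} {size_kib:>8} KiB  {name}")
```

On a live Linux system the same function can be fed the contents of
/proc/self/maps or /proc/<pid>/maps instead of SAMPLE.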
Rsync uses malloc and realloc to allocate memory. The file
list requires about 70 bytes per file in a contiguous block
that starts at 1000 files and grows by doubling. Observe the
"growing file list" messages. If a message says it "did
move", realloc had to allocate the new size as a separate
contiguous block and then copy the file list over from the
old block. While both blocks exist, it needs 1.5 times the
new size, or 3 times the previous size, just to hold
previous size + 1 files.
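
The arithmetic above can be sketched in Python. This takes the figures in
the text at face value (70 bytes per file, a starting block of 1000
entries, doubling on growth) and is not rsync's actual code; the names
capacity_for and peak_bytes_on_move are hypothetical.

```python
BYTES_PER_FILE = 70       # per-file figure quoted in the text
INITIAL_CAPACITY = 1000   # starting block size quoted in the text


def capacity_for(n_files: int) -> int:
    """Smallest capacity (initial size doubled repeatedly) that holds n_files."""
    capacity = INITIAL_CAPACITY
    while capacity < n_files:
        capacity *= 2
    return capacity


def peak_bytes_on_move(n_files: int) -> int:
    """Address space needed during the last growth step if the block moves.

    The old and new blocks must both exist while the list is copied,
    so the peak is old + new = 1.5 * new = 3 * old.
    """
    new = capacity_for(n_files)
    if new == INITIAL_CAPACITY:
        return new * BYTES_PER_FILE  # the list never had to grow
    old = new // 2
    return (old + new) * BYTES_PER_FILE


if __name__ == "__main__":
    for n in (8_000_000, 8_192_001):
        print(n, capacity_for(n), peak_bytes_on_move(n))
```

Under these assumptions, 8,000,000 files fit in a block of 8,192,000
entries, and the move that created it needed 860,160,000 bytes of address
space. One file past 8,192,000 forces a block of 16,384,000 entries and a
peak of 1,720,320,000 bytes, well past a limit of around 1GB.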
Malloc will get the memory it needs either by moving brk
(brk(2)) or by doing an anonymous map. Different versions
of malloc choose differently. In either case it needs to
allocate a contiguous address range without running into
any existing memory maps and while still leaving room for
stack growth. In practice this limits the available space
to around 1GB.
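
To illustrate why a contiguous range is the constraint, here is a small
Python sketch that lists the free gaps between existing mappings. The
function name free_gaps and the SAMPLE_LAYOUT addresses are hypothetical;
the 0xc0000000 ceiling is the default stack top mentioned above and
depends on kernel options.

```python
from typing import Iterable, List, Tuple


def free_gaps(
    mappings: Iterable[Tuple[int, int]],
    floor: int = 0,
    ceiling: int = 0xC0000000,
) -> List[Tuple[int, int]]:
    """Return (gap_start, gap_size) for each unmapped range in [floor, ceiling).

    mappings is an iterable of (start, end) address ranges already in use,
    with end being one past the last used address.
    """
    gaps = []
    cursor = floor
    for start, end in sorted(mappings):
        if start > cursor:
            gaps.append((cursor, start - cursor))
        cursor = max(cursor, end)
    if ceiling > cursor:
        gaps.append((cursor, ceiling - cursor))
    return gaps


# Hypothetical layout: executable plus brk heap, libraries, stack.
SAMPLE_LAYOUT = [
    (0x08048000, 0x0A000000),  # executable, data, bss and heap (brk)
    (0x40000000, 0x40200000),  # shared libraries in the mmap area
    (0xBFFFE000, 0xC0000000),  # user-mode stack
]

if __name__ == "__main__":
    for start, size in free_gaps(SAMPLE_LAYOUT):
        print(f"gap at {start:08x}: {size // (1024 * 1024)} MiB")
```

In this made-up layout the gap directly above the heap, which is where a
brk-based malloc has to grow, is 864 MiB. How much of the larger gap in
the mmap area is usable depends on the malloc version and on how much
room is left for the stack.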
There are kernel build options that change the location of
the user stack to adjust the division between kernel and
user space (all within the 4GB range). On smaller systems
the kernel can be coaxed into giving up another 0.5GB range
of addresses, but if you have 4GB or more of RAM it might be
better for system performance to move in the other
direction, shrinking the range available to processes.
For more on this, see any of the good books on kernel
internals and the kernel-newbies site.
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt