Why does one of there work and the other doesn't

David Bolen db3l at fitlinxx.com
Sat Dec 1 09:30:20 EST 2001


From: Randy Kramer [mailto:rhkramer at fast.net]

> I am not sure which end the 100 bytes per file applies to, and I guess
> that is the RAM memory footprint?.  Does rsync need 100 bytes for each
> file that might be transferred during a session (all files in the
> specified directory(ies)), or does it need only 100 bytes as it does one
> file at a time?

Yes, the ~100 bytes is in RAM - I think a key point though is that the
storage to hold the file list grows exponentially (doubling each
time), so if you have a lot of files in the worst case you can use
almost twice as much memory as needed.

Here's an analysis I posted to the list a while back that I think is
still probably valid for the current versions of rsync - a later followup
noted that it didn't include an ~28 byte structure for each entry in
the include/exclude list:

	  - - - - - - - - - - - - - - - - - - - - - - - - -

> (a) How much memory, in bytes/file, does rsync allocate?

This is only based on my informal code peeks in the past, so take it
with a grain of salt - I don't know if anyone has done a more formal
memory analysis.

I believe that the major driving factors in memory usage that I can
see is:

1. The per-file overhead in the filelist for each file in the system.
   The memory is kept for all files for the life of the rsync process.

   I believe this is 56 bytes per file (it's a file_list structure),
   but a critical point is that it is allocated initially for 1000
   files, but then grows exponentially (doubling).  So the space will
   grow as 1000, 2000, 4000, 8000 etc.. until it has enough room for
   the files necessary.  This means you might, worst case, have just
   about twice as much memory as necessary, but it reduces the
   reallocation calls quite a bit.  At ~56K per 1000 files, if you've
   got a file system with 10000 files in it, you'll allocate room for
   16000 and use up 896K.

   This growth pattern seems to occur on both sender and receiver of
   any given file list (e.g., I don't see a transfer of the total
   count over the wire used to optimize the allocation on the receiver).

2. The per-block overhead for the checksums for each file as it is 
   processed.  This memory exists only for the duration of one file.
   
   This is 32 bytes per file (a sum_buf) allocated as on memory chunk.
   This exists on the receiver as it is computed and transmitted, and
   on the sender as it receives it and uses it to match against the
   new file.

3. The match tables built to determine the delta between the original
   file and the new file.
  
   I haven't looked at closely at this section of code, but I believe
   we're basically talking about the hash table, which is going to be
   a one time (during rsync execution) 256K for the tag table and then
   8 (or maybe 6 if your compiler doesn't pad the target struct) bytes
   per block of the file being worked on, which only exists for the
   duration of the file.
   
   This only occurs on the sender.

There is also some fixed space for various things - I think the
largest of which is up to 256K for the buffer used to map files.

> (b) Is this the same for the rsyncs on both ends, or is there
>     some asymmetry there?

There's asymmetry.  Both sides need the memory to handle the lists of
files involved.  But while the receiver just constructs the checksums
and sends them, and then waits for instructions on how to build the
new file (either new data or pulling from the old file), the sender
also constructs the hash of those checksums to use while walking
through the new file.

So in general on any given transfer, I think the sender will end up
using a bit more memory.

> (c) Does it matter whether pushing or pulling?

Yes, inasmuch as the asymmetry is based on who is sending and who is
receiving a given file.  It doesn't matter who initiates the contact,
but the direction that the files are flowing.  This is due to the
algorithm (the sender is the component that has to construct the
mapping from the new file using portions of the old file as
transmitted by the receiver).

	  - - - - - - - - - - - - - - - - - - - - - - - - -


-- David

/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/




More information about the rsync mailing list