[Bug 3186] New: Surprisingly large unshared memory usage

samba-bugs at samba.org
Tue Oct 18 02:57:39 GMT 2005


           Summary: Surprisingly large unshared memory usage
           Product: rsync
           Version: 2.6.7
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: foner-rsync-bugzilla at media.mit.edu
         QAContact: rsync-qa at samba.org
                CC: foner-rsync-bugzilla at media.mit.edu

I'm running a command like "rsync -vrltH --delete -pgo --stats -z -D
--numeric-ids -i --link-dest=foo blah:bar baz" (part of a dirvish run) with an
input fileset of about 2.4 million files, about 280G in total.  About 400K of
those files are hardlinked to each other on the sending machine and remain that
way on the receiving machine.  All but about 30 of the files are unchanged, so
virtually all 2.4M of them also wind up hardlinked to the --link-dest
directory.

It takes about 10 minutes to scan a filesystem of this size, and the rsync
processes on both the sending and receiving machines slowly grow to about 200M
each during this scan; that's understandable.  But then, as soon as the scan is
done, the second
rsync process on the receiving side inflates (over the course of about 5 seconds
or so) to -another- 200M.  I don't think I'm being faked out by shared memory
being reported twice, since the free memory on the machine declines
precipitously at exactly the same time.  This isn't quite screwing me yet (the
machine's got half a gig of RAM and very little else that must stay resident
during the run), but if the filesystem gets much bigger, I fear massive
thrashing due to swapping.  (Really, what I'll have to do is buy more RAM.)

I was under the impression that this wasn't supposed to happen---that rsync
tried hard not to modify lots of pages after the fork, and that Linux (I'm
running Ubuntu Breezy, which has a 2.6 kernel) had copy-on-write fork semantics.
 Is the essentially instantaneous inflation of the second rsync process
happening because of either the -H or the --link-dest, or is it a bug?
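
For reference, here is a minimal sketch of how the shared-versus-private
question above could be checked on Linux.  It assumes a kernel that exposes
/proc/<pid>/smaps (which may be newer than the one on this machine), and the
helper names private_dirty_kb and child_growth_kb are made up for illustration;
nothing here is taken from rsync itself.  The child of a fork rewrites one byte
per page of a buffer inherited from the parent and reports how much its
Private_Dirty total grew.

```python
import os

PAGE = os.sysconf("SC_PAGE_SIZE")


def private_dirty_kb(pid="self"):
    """Sum the Private_Dirty fields (in kB) of every mapping of a process."""
    total = 0
    with open(f"/proc/{pid}/smaps") as smaps:
        for line in smaps:
            if line.startswith("Private_Dirty:"):
                total += int(line.split()[1])
    return total


def child_growth_kb(size_bytes, touch):
    """Fork, optionally write to every inherited page in the child, and
    return the growth of the child's Private_Dirty total in kB."""
    buf = bytearray(size_bytes)
    for i in range(0, size_bytes, PAGE):
        buf[i] = 1                      # parent faults every page in before forking
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                        # child
        try:
            os.close(r)
            before = private_dirty_kb()
            if touch:
                for i in range(0, size_bytes, PAGE):
                    buf[i] = 2          # one write per page forces a private copy
            after = private_dirty_kb()
            os.write(w, str(after - before).encode())
        finally:
            os._exit(0)                 # skip the parent's cleanup handlers
    os.close(w)
    data = os.read(r, 64)
    os.close(r)
    os.waitpid(pid, 0)
    return int(data.decode())


if __name__ == "__main__":
    size = 16 * 1024 * 1024
    print("child writes every page:", child_growth_kb(size, True), "kB")
    print("child only reads:       ", child_growth_kb(size, False), "kB")
```

If copy-on-write behaves as expected, the first figure should be close to the
buffer size and the second close to zero.  The same Private_Dirty sum, read for
the pid of the second rsync process, would show whether its extra 200M is
really unshared.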

[This transfer also accumulates about an hour of CPU time on this Athlon 1200MHz
CPU; I assume this is due to the expense of -H, and works out to about 1.5
milliseconds of processing per file, assuming I haven't goofed on the math; this
is about a million instructions (or 21000 non-cached memory fetches) per file. 
I'd love it if this could be brought down, but I'm probably being unrealistic
about an essentially O(n^2) algorithm...]
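
The arithmetic in the bracketed note can be rechecked with a short script.
This is only a sketch: the inputs are the round figures quoted above (one hour
of CPU, 2.4 million files, a 1200MHz clock, 21000 fetches per file), and the
function name per_file_cost is made up for illustration.

```python
CPU_SECONDS = 3600.0        # "about an hour of CPU time"
FILES = 2_400_000           # "about 2.4 million files"
CLOCK_HZ = 1.2e9            # 1200MHz CPU
FETCHES_PER_FILE = 21_000   # figure quoted in the note


def per_file_cost(cpu_seconds=CPU_SECONDS, files=FILES, clock_hz=CLOCK_HZ):
    """Return (CPU seconds per file, clock cycles per file)."""
    seconds = cpu_seconds / files
    return seconds, seconds * clock_hz


seconds, cycles = per_file_cost()
# Memory latency that would make FETCHES_PER_FILE fetches fill the per-file time.
implied_fetch_ns = seconds / FETCHES_PER_FILE * 1e9

print(f"{seconds * 1e3:.2f} ms per file")
print(f"{cycles:,.0f} clock cycles per file")
print(f"{implied_fetch_ns:.1f} ns per non-cached fetch implied")
```

The cycle count is an upper bound on the instructions executed per file only
under the assumption of at most one instruction per cycle, which is an
assumption of this sketch and not a measured figure.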
