High memory usage - any way around it other than splitting jobs?
Kevin Korb
kmk at sanitarium.net
Thu Jun 25 14:53:23 UTC 2020
Unfortunately the hard links are the problem. In order to keep them
straight, rsync has to remember the details of every file it finds with
a link count >1, so its memory usage grows and grows as it walks the
tree. Of course, without -H rsync will end up duplicating them at the
destination (each link gets copied as a separate file).
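
To give a feel for that bookkeeping, here is a rough Python sketch. It is
not rsync's actual code, only an illustration of the general idea of
tracking multiply-linked files by device and inode; the function name
index_hard_links is made up for this example.

```python
import os

def index_hard_links(root):
    """Group paths under root by (device, inode) for files with link count > 1.

    Illustrative only: this mimics the kind of table a hard-link-aware
    copier has to keep in memory, not rsync's real data structures.
    """
    seen = {}  # (st_dev, st_ino) -> list of paths sharing that inode
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            if st.st_nlink > 1:
                # Every multiply-linked file adds to the table, and the
                # table has to be kept until the whole tree has been walked.
                seen.setdefault((st.st_dev, st.st_ino), []).append(path)
    return seen
```

Something like len(index_hard_links("/source")) would then show how many
multiply-linked inodes are in the tree, which is roughly the number of
entries that have to be remembered.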
On 6/25/20 10:30 AM, Andy Smith via rsync wrote:
> Hi,
>
> I have a virtual machine with 2G of memory. On this VM there is a
> directory tree with 33.3 million files in it. When attempting to
> rsync (rsync -PSHav --delete /source /dest) this tree from one
> directory to another on the same host, rsync uses all the memory and
> is killed by oom-killer.
>
> This host is Debian oldstable so has
>
> $ rsync --version
> rsync version 3.1.2 protocol version 31
>
> The normal operation of this VM does not require more than 2G of
> memory, but I doubled it to 4G anyway. Unfortunately rsync still
> uses all the memory and is killed.
>
> Most advice I can find on decreasing rsync memory usage advises to
> split the job up into batches. By issuing one rsync for each
> directory within /source I was able to make this work.
>
> The interesting thing is though, the split of file numbers between
> sub-directories is very uneven with the majority of them (31.5
> million of the 33.3 million) being in just one of the sub-directory
> trees. I am kind of surprised that rsync has such a problem going
> just that little bit further with the last 2 million. Is there any
> scope for improvement with the incremental recursion code?
>
> If I upgraded the version of rsync could I expect this to work any
> better?
>
> I could also give the host a massive swap file. It currently has
> just 1G of swap, which all gets used in the failure case. I could
> add more but I fear that the job will go so slow it will not
> complete in a reasonable time.
>
> I don't know if the -H option is causing extra memory usage here;
> unfortunately it is necessary as there are hardlinks in there.
>
> Some years old advice says to disable incremental recursion with
> --no-i-r. As incremental recursion was added to reduce memory usage
> this seems counter-intuitive to me, but this advice is all over the
> Internet…
>
> These are all things I will investigate before settling for the
> "split into multiple jobs" approach; just wondered if anyone has any
> shortcuts for me.
>
> Thanks,
> Andy
>
--
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
Kevin Korb Phone: (407) 252-6853
Systems Administrator Internet:
FutureQuest, Inc. Kevin at FutureQuest.net (work)
Orlando, Florida kmk at sanitarium.net (personal)
Web page: https://sanitarium.net/
PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,