High memory usage - any way around it other than splitting jobs?

Andy Smith andy at strugglers.net
Thu Jun 25 14:30:11 UTC 2020


Hi,

I have a virtual machine with 2G of memory. On this VM there is a
directory tree with 33.3 million files in it. When attempting to
rsync (rsync -PSHav --delete /source /dest) this tree from one
directory to another on the same host, rsync uses all the memory and
is killed by oom-killer.

This host is Debian oldstable, so it has:

$ rsync --version
rsync  version 3.1.2  protocol version 31

The normal operation of this VM does not require more than 2G of
memory, but I doubled it to 4G anyway. Unfortunately rsync still
uses all the memory and is killed.

Most advice I can find on decreasing rsync memory usage recommends
splitting the job up into batches. By issuing one rsync for each
directory within /source I was able to make this work.
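
For reference, the per-directory approach can be sketched roughly as
below. This is only an illustration, not the exact commands that were
run: the function name sync_per_subdir and the RSYNC override variable
are made up for the example, and it assumes the destination should end
up as /dest/source to match the single-command form above.

```shell
#!/bin/sh
# Sketch: one rsync run per top-level directory of a source tree.
# sync_per_subdir and RSYNC are hypothetical names, not part of rsync.
# Set RSYNC="echo rsync" to preview the commands without copying anything.
sync_per_subdir() {
    src=$1      # e.g. /source
    dest=$2     # e.g. /dest/source, to match "rsync ... /source /dest"
    mkdir -p "$dest" || return 1
    for entry in "$src"/*/; do
        dir=${entry%/}
        # Skip the unexpanded glob (no sub-directories) and symlinks.
        [ -d "$dir" ] && [ ! -L "$dir" ] || continue
        # No trailing slash on "$dir", so rsync creates "$dest/<name>".
        ${RSYNC:-rsync} -PSHav --delete "$dir" "$dest/" || return 1
    done
}
```

Usage would be something like "sync_per_subdir /source /dest/source".
Caveats of splitting this way: files and dot-directories directly
under /source are not covered by the glob, --delete no longer removes
top-level directories that have disappeared from the source, and -H
can only preserve hard links between files that are part of the same
rsync run.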

The interesting thing, though, is that the split of file numbers between
sub-directories is very uneven, with the majority of them (31.5
million of the 33.3 million) being in just one of the sub-directory
trees. I am kind of surprised that rsync has such a problem going
just that little bit further with the last 2 million. Is there any
scope for improvement with the incremental recursion code?

If I upgraded the version of rsync could I expect this to work any
better?

I could also give the host a massive swap file. It currently has
just 1G of swap, which all gets used in the failure case. I could
add more but I fear that the job will go so slowly it will not
complete in a reasonable time.

I don't know if the -H option is causing extra memory usage here;
unfortunately it is necessary as there are hardlinks in there.
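
One way to gauge how much -H might matter here would be to count the
files that actually have more than one link, on the assumption that
those are the ones rsync needs extra bookkeeping for. A minimal
sketch (count_multilinked is just a name picked for the example):

```shell
#!/bin/sh
# count_multilinked is a hypothetical helper name.
# Prints how many regular files under "$1" have a link count above one.
# File names containing newlines would be counted more than once.
count_multilinked() {
    find "$1" -type f -links +1 | wc -l
}
```

Running "count_multilinked /source" walks the whole tree, so on 33.3
million files it will take a while, but it only reads metadata.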

Some years-old advice says to disable incremental recursion with
--no-i-r. As incremental recursion was added to reduce memory usage
this seems counter-intuitive to me, but this advice is all over the
Internet…
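
For clarity, the variant that advice refers to would look something
like the sketch below. --no-i-r is the abbreviated spelling of
--no-inc-recursive, which makes rsync scan the full hierarchy before
starting the transfer. The wrapper name full_scan_sync and the RSYNC
override are made up for the example.

```shell
#!/bin/sh
# Sketch: the same transfer with incremental recursion disabled.
# full_scan_sync and RSYNC are hypothetical names, not part of rsync.
# Set RSYNC="echo rsync" to print the command instead of running it.
full_scan_sync() {
    ${RSYNC:-rsync} -PSHav --no-i-r --delete "$1" "$2"
}
```

It would be called as "full_scan_sync /source /dest". Whether this
uses less memory than the default on a tree this size is exactly the
open question; the sketch only shows the invocation.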

These are all things I will investigate before settling for the
"split into multiple jobs" approach; just wondered if anyone has any
shortcuts for me.

Thanks,
Andy


