rsync slowness
Gregory Heytings
gregory at heytings.org
Mon Feb 6 17:14:27 UTC 2023
I got hit by that bug again, and this time I took the time to create a
minimal recipe. It turns out that the culprit is the -y (--fuzzy) flag,
which I had been using to save some bandwidth, but which does not work
well when there are "too many" files in a directory.
A simple reproducer (which does not exactly reproduce the issue described
below, but seems close enough):
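#!/bin/bash
# Create 100000 small files in src/ with long, similar-looking names.
mkdir -p src
declare -i i=0 j=0
while :
do
    echo $RANDOM$RANDOM$RANDOM > src/$(printf %09d $j).$RANDOM$RANDOM$RANDOM.foobarbaz.$RANDOM.$RANDOM.$RANDOM
    let i++
    let j+=100
    (($i == 100000)) && break
done
# Copy the tree, then delete 30000 of the files from the copy.
cp -a src dst
rm -f dst/00[7-9]*
# Re-synchronize with -y (--fuzzy) enabled.
rsync -avy src/ dst/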
After copying about 900 files, rsync hangs for a couple of minutes
before copying another batch of about 900 files. In later iterations
the batches become smaller, about 200 files each, and after each batch
rsync again waits for a couple of minutes. I stopped the process after
two hours, when only 17500 files had been copied. By comparison,
rsync -av src/ dst/
takes only 2 seconds to copy the 30000 files.
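
For quicker experiments, the same setup can be scaled down and both
variants timed side by side. This is only a sketch under assumptions:
make_tree, drop_tail and timed_sync are helper names made up for
illustration, the 30% deletion ratio simply mirrors the recipe above,
and timeout(1) from GNU coreutils is assumed to be installed.

```bash
#!/bin/bash
# Scaled-down variant of the reproducer (a sketch, not a benchmark).
# make_tree, drop_tail and timed_sync are hypothetical helper names.

# Create N small files in DIR, named like the files in the reproducer.
make_tree() {
    local dir=$1 n=$2 i
    mkdir -p "$dir"
    for ((i = 0; i < n; i++)); do
        echo "$RANDOM$RANDOM$RANDOM" \
            > "$dir/$(printf %09d $((i * 100))).$RANDOM$RANDOM$RANDOM.foobarbaz.$RANDOM.$RANDOM.$RANDOM"
    done
}

# Delete the last PERCENT percent of the files (in name order) from DIR.
drop_tail() {
    local dir=$1 percent=$2
    local -a files=("$dir"/*)
    local total=${#files[@]}
    local keep=$(( total * (100 - percent) / 100 ))
    # printf is a shell builtin, so long file lists do not hit ARG_MAX.
    (( keep < total )) && printf '%s\0' "${files[@]:keep}" | xargs -0 rm -f --
    return 0
}

# Run rsync with the given flags from SRC/ to DST/, give up after LIMIT
# seconds, and print the elapsed wall-clock time and the exit status
# (timeout reports 124 when the limit was reached).
timed_sync() {
    local limit=$1 src=$2 dst=$3; shift 3
    local start=$SECONDS
    timeout "$limit" rsync "$@" "$src/" "$dst/" > /dev/null
    local status=$?
    echo "rsync $* : $((SECONDS - start))s (exit status $status)"
}

# Example run (uncomment to use; the sizes and the limit are arbitrary):
# make_tree src 10000
# cp -a src dst-plain && drop_tail dst-plain 30
# cp -a src dst-fuzzy && drop_tail dst-fuzzy 30
# timed_sync 600 src dst-plain -a
# timed_sync 600 src dst-fuzzy -ay
```

Using two separate destination copies keeps the comparison fair, since
each run starts from the same partial state.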
I removed the -y option from my scripts, but perhaps how -y works in
directories with many files could be improved in one way or another?
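
A back-of-the-envelope estimate may explain the numbers. The rsync man
page describes --fuzzy as looking in the destination directory for a
file with identical size and modification time, or for a similarly
named file. Assuming (this is a guess, not verified against the rsync
source) that each missing file is compared against every file already
present in that directory, the work grows with the product of the two
counts. fuzzy_comparisons is a hypothetical helper name.

```bash
#!/bin/bash
# Rough count of name comparisons, ASSUMING every missing file is
# compared against every file present in the destination directory.
fuzzy_comparisons() {
    local present=$1 missing=$2
    echo $(( present * missing ))
}

# The reproducer leaves 70000 files in dst/ and 30000 files missing.
fuzzy_comparisons 70000 30000   # prints 2100000000
```

If that assumption holds, each comparison of two long, similar names
would only need to be moderately expensive for the total to reach hours.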
>
> I've definitely not seen that. If you can produce a working example and
> tar it up for us to look at, that might be interesting/useful.
>
> Just to check, though: you do not have --checksum/-c on, right?
>
>> I am finally taking the time to report an rsync slowness pattern that
>> I have been seeing for years.
>>
>> Assuming:
>>
>> a directory with many (> 20K) files, for example a maildir, on the
>> sending side, and
>>
>> a partial copy of that directory on the receiving side, with "enough"
>> missing files (say 5K),
>>
>> then the receiving side will do the following: it will take about one
>> minute to start transferring the missing files, and will apparently
>> hang about one minute every 200 files or so. After a few (about 10 or
>> 20) iterations, the receiving side apparently hangs.
>>
>> On the receiving side rsync is using 100% CPU, on the sending side not
>> more than a few percent. In case it matters, both sides are using
>> ext4 filesystems, and Debian GNU/Linux.
>>
>> I tried --msgs2stderr -M--msgs2stderr, but that does not print any
>> error message. I also tried disabling compression, using
>> --whole-file, using --no-inc-recursive, ..., all to no avail.
>>
>> Any hints?