[Bug 5124] Parallelize the rsync run using multiple threads and/or connections

Matthias Schniedermeyer ms at citd.de
Tue Jan 28 06:25:51 MST 2014


On 28.01.2014 04:26, L.A. Walsh wrote:
> 
> 
> Matthias Schniedermeyer wrote:
> >On 25.01.2014 21:03, L.A. Walsh wrote:
> >>If rsync were already hitting over 200MB/s (or even over 100MB/s),
> >>I'd agree that using multiple TCP connections would help, as they
> >>could be processed by different CPUs on both ends.  But since it
> >>doesn't even hit 100MB/s locally, the limit *isn't* TCP connections.
> >
> >Just FYI.
> >Rsync 3.1 got much better in that regard.
> >When I rsync a file locally from tmpfs to tmpfs, rsync does that
> >at about 560MB/s. Rsync 3.0.X managed less than half of
> >that.
> ---
> 
> >rsync --version
> rsync  version 3.1.0  protocol version 31
> 
> Running rsync to compare SRC against a DST given via --compare-dest,
> writing to an empty partition (i.e. copying just the differences
> between SRC and the compare-dest onto the empty partition), takes
> 45-90 minutes for a 1TB partition.
> Usually that's:
>   Home-2014.01.12-03.07.03 HnS     -wi-ao---   1.04g
>   Home-2014.01.14-03.07.07 HnS     -wi-ao---   2.05g
>   Home-2014.01.16-03.07.02 HnS     -wi-ao---   1.42g
>   Home-2014.01.18-03.07.03 HnS     -wi-ao---   1.26g
>   Home-2014.01.20-03.07.03 HnS     -wi-ao---   2.30g
>   Home-2014.01.21-03.07.03 HnS     -wi-ao---   2.96g
>   Home-2014.01.22-03.07.03 HnS     -wi-ao---   1.57g
>   Home-2014.01.23-03.07.03 HnS     -wi-ao---   1.80g
> 
> 
> i.e. 1-3g of differences per run.
> 
> 3g/45 minutes = 3072MB/2700s => ~1.14MB/s -- not even close to 100MB/s.
> 
> It has to read time & date stamps of quite a few files, but my
> best local speed has been under 100MB/s.
> 
> What size files are you transferring, and how many (on average)?
> 
> My times are for my home partition (in case that wasn't
> obvious from the partition names above... ;-))...
> 
> It has 4,986,955 files in 716,132,824K (~683G).
> Lots of seeks, very few full-speed reads (on a RAID capable of up to ~1GB/s)
> 
> 
> So... what size files and how much data are you transferring, and
> to/from what type of disks?
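
For reference, a run like the one described above might look roughly
like this (the paths are made up; --compare-dest makes rsync skip
files that exist identically under the given directory, so only the
differences land on the empty partition):

  rsync -a --compare-dest=/snapshots/Home-2014.01.22/ \
      /home/ /mnt/empty-partition/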

For the number I quoted in my email, I transferred a 10GB file from a 
tmpfs to the same tmpfs:

> cd /tmpfs
> dd if=/dev/zero of=zero bs=1M count=10k
> rsync -avP zero zero.2
sending incremental file list
zero
 10,737,418,240 100%  562.67MB/s    0:00:18 (xfr#1, to-chk=0/1)

sent 10,740,039,772 bytes  received 35 bytes  580,542,692.27 bytes/sec
total size is 10,737,418,240  speedup is 1.00


As I personally don't have any hardware that can sustain such 
performance, I can really only compare "dry" performance in a tmpfs 
against the older rsync.

On my real hardware I only have SSDs, non-RAID HDDs and 
Gigabit Ethernet; none of those can sustain the bandwidth rsync 
delivers, so my copy operations either use all available bandwidth, 
or are seek/latency-limited when I transfer many small files. Even the 
SSDs I own aren't that great in that regard (good at the read part, not 
so good at the random-write part).

That you don't get good performance copying/synchronising nearly 5 
million files doesn't surprise me at all; HDDs are really bad at that. 
Even high-performance RAIDs with many spindles only reduce that 
problem, they can't eliminate it.

I remember a whitepaper Intel released a few years ago in which they 
compared SSD performance against a high-performance RAID (16 or 24 
15k-RPM HDDs, IIRC) in an Exchange e-mail scenario, so about as random 
as it gets. AFAIR a single SSD had 80% of the seek performance of the 
entire high-performance RAID system they benchmarked against, and the 
benchmark used 3 SSDs, so the three SSDs together delivered something 
like 200% better performance than the high-performance RAID.
Copying ginormous amounts of small files is practically not much 
different from random seeking (which is what the Exchange benchmark 
was about).

What works against you is that rsync processes the file list in 
alphabetical order. If the files were processed in "creation 
order"(*), performance MIGHT be better, as that would reduce seeks, 
but AFAIK there is really no way to do that with rsync. (Keeping 
track, via inotify, of which files were created and in what order, 
doing a separate copy of the new files in "creation order", and then 
an additional rsync run for the rest might be a way of speeding up 
that part.)
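
A rough sketch of that workaround, assuming the inotify-tools package 
(for inotifywait) and made-up /src and /dst paths; note that watching 
millions of directories can exceed the kernel's default inotify watch 
limit:

  # Record newly created files, in order, while the tree is in use.
  inotifywait -m -r -e create --format '%w%f' /src >> /tmp/created.log &

  # Later: replay the log, copying files in recorded (creation) order.
  cd /src
  while IFS= read -r f; do
      [ -f "$f" ] && cp --parents "${f#/src/}" /dst
  done < /tmp/created.log

  # Finally, a normal rsync run picks up everything else.
  rsync -a /src/ /dst/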


In conclusion: Sorry, I can't really help you with your problem.




*:
But that heavily depends on the workload.
And you would need a way to determine the creation order, either via
CRTIME or via the inode number.
CRTIME isn't supported by all filesystems, and inode order is usually 
not GUARANTEED to match creation order, although it might.

With that said, this could work in a scenario where files are created 
with their full contents and not changed afterwards. If files are 
changed over time, it would only help for newly created files, not 
for changed ones.
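
As a quick check of which of the two a given filesystem offers, GNU 
stat can print both (the path is just a placeholder):

  # '%w' is the file's birth time (CRTIME); stat prints '-' when the
  # filesystem doesn't record one.  '%i' is the inode number.
  stat --format='%w %i' /path/to/somefile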



-- 

Matthias

