[Bug 5124] Parallelize the rsync run using multiple threads and/or connections
Matthias Schniedermeyer
ms at citd.de
Tue Jan 28 06:25:51 MST 2014
On 28.01.2014 04:26, L.A. Walsh wrote:
>
>
> Matthias Schniedermeyer wrote:
> >On 25.01.2014 21:03, L.A. Walsh wrote:
> >>If rsync were already hitting over 200MB/s (or even over 100MB/s), I'd agree that
> >>using multiple TCP connections would help, as they could be processed by different
> >>CPUs on both ends. But since it doesn't even hit 100MB/s locally, the
> >>limit *isn't* TCP connections.
> >
> >Just FYI: rsync 3.1 got much better in that regard.
> >When I rsync a file locally from tmpfs to tmpfs, rsync does that
> >at about 560MB/s. Rsync 3.0.X managed less than half of that.
> ---
>
> >rsync --version
> rsync version 3.1.0 protocol version 31
>
> Running rsync to compare SRC against a DST specified by --compare-dest,
> writing to an empty partition (i.e. copying just the differences between
> SRC and the compare-dest to the empty partition), takes 45-90 minutes
> on a 1TB partition. Usually that's:
> Home-2014.01.12-03.07.03 HnS -wi-ao--- 1.04g
> Home-2014.01.14-03.07.07 HnS -wi-ao--- 2.05g
> Home-2014.01.16-03.07.02 HnS -wi-ao--- 1.42g
> Home-2014.01.18-03.07.03 HnS -wi-ao--- 1.26g
> Home-2014.01.20-03.07.03 HnS -wi-ao--- 2.30g
> Home-2014.01.21-03.07.03 HnS -wi-ao--- 2.96g
> Home-2014.01.22-03.07.03 HnS -wi-ao--- 1.57g
> Home-2014.01.23-03.07.03 HnS -wi-ao--- 1.80g
>
>
> 1-3g in length.
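(For illustration, a minimal sketch of such a --compare-dest run; the
paths and the snapshot name are hypothetical:

    rsync -a --compare-dest=/snapshots/Home-2014.01.23-03.07.03/ \
        /home/ /mnt/empty-partition/

rsync then writes into /mnt/empty-partition/ only those files that
differ from the snapshot named via --compare-dest.)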
>
> 3g/45minutes = 3g/2700s => ~1.1MB/s -- not even close to 100MB/s.
>
> It has to read time & date stamps of quite a few files, but my
> best local speed has been under 100MB/s.
>
> What size files are you transferring, and how many (on average)?
>
> My times are for my home partition (in case that wasn't
> obvious from the partition names above... ;-)).
>
> It has 4,986,955 files in 716132824K (~683G).
> Lots of seeks, very few full-speed reads (the RAID peaks at ~1GB/s).
>
>
> So.. what size files and how much info are you transferring and
> to/from what type of disks?
For the number I used in the email, I transferred a 10GB file from a
tmpfs to the same tmpfs:
> cd /tmpfs
> dd if=/dev/zero of=zero bs=1M count=10k
> rsync -avP zero zero.2
sending incremental file list
zero
10,737,418,240 100% 562.67MB/s 0:00:18 (xfr#1, to-chk=0/1)
sent 10,740,039,772 bytes received 35 bytes 580,542,692.27 bytes/sec
total size is 10,737,418,240 speedup is 1.00
As I personally don't have any hardware that can sustain such
performance, I can really only compare "dry" performance in a tmpfs
against the older rsync.
On my real hardware I only have SSDs, non-RAID HDDs and
Gigabit Ethernet; none of those can sustain the bandwidth rsync
delivers, so my copy operations mostly use all available
bandwidth.
Or they are seek/latency limited when I transfer many small files. Even
the SSDs I own aren't that great in that regard (good at reads, not
so good at random writes).
That you don't get good performance copying/synchronising nearly 5
million files doesn't surprise me at all; HDDs are really bad at that.
Even high-performance RAIDs with many spindles only reduce that problem,
they can't eliminate it.
I remember a whitepaper Intel released a few years ago in which they
compared SSD performance against a high-performance RAID (16 or 24 15k
RPM HDDs, IIRC) in an Exchange e-mail scenario, so about as random as it
gets. AFAIR just 1 SSD had 80% of the seek performance of the entire
high-performance RAID system it was benchmarked against. And the
benchmark used 3 SSDs, so the 3 SSDs delivered roughly 200% better
performance than the high-performance RAID.
Copying ginormous amounts of small files is practically not much
different from random seeking (which is what the Exchange benchmark was
about).
What works against you is that rsync processes the file list in
alphabetical order. If the files were processed in "creation order"(*),
performance MIGHT be better, as that would reduce seeks, but AFAIK there
is really no way to do that with rsync. (Keeping track of which files
were created, and in what order, via inotify, and doing a separate copy
of the new files in "creation order" might be a way of speeding up that
part, followed by an additional rsync run for the rest; a sketch of that
idea follows.)
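A minimal sketch of that idea, assuming inotify-tools is installed and
using hypothetical /src and /dst paths (untested, just to show the
shape of it):

    # 1. Record file creations as they happen (left running in the
    #    background between backup runs):
    inotifywait -m -r -e close_write --format '%w%f' /src >> /tmp/created.log

    # 2. At backup time, copy the logged files in creation order:
    while IFS= read -r f; do
        rel=${f#/src/}                      # path relative to /src
        mkdir -p "/dst/$(dirname "$rel")"
        cp -p "$f" "/dst/$rel"              # -p preserves mode/times/owner
    done < /tmp/created.log

    # 3. A normal rsync run then picks up everything the log missed:
    rsync -a /src/ /dst/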
In conclusion: sorry, I can't really help you with your problem.
*:
But that heavily depends on the workload.
And you would need a way to determine creation order, either via CRTIME
or the inode number. CRTIME isn't supported by all filesystems, and
inode order is usually not GUARANTEED to be deterministic, although it
might be (see the one-liner after this note).
With that said, this could work in a scenario where files are created
with their full contents and not changed after that. If files are
changed over time, it would only help for the newly created files, but
not for changed files.
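For example, with GNU find, something like this would list files ordered
by inode number, as a rough proxy for creation order (only a sketch;
whether inode order means anything depends on the filesystem):

    find /src -type f -printf '%i\t%p\n' | sort -n | cut -f2-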
--
Matthias