Parallelizing rsync through multiple ssh connections

Robin H. Johnson robbat2 at gentoo.org
Thu Dec 16 21:32:24 UTC 2021


On Thu, Nov 04, 2021 at 04:58:03PM +0100, SERVANT Cyril via rsync wrote:
> Hi, I want to increase the speed of rsync transfers over ssh.
Thanks for your great email here.

Having had similar issues in the past in trying to rsync single large
files, I wanted to share some of the ideas I'd found to work:

HPN-SSH patches. The website is out of date, but don't let that put
you off. HPN-SSH can saturate 40Gbit links with tuning (though doing
that tuning is real work). The main things there are the buffer
patches and the multithreaded AES, but you can also use the NONE
cipher for benchmarking.
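
To make the buffer point concrete, here is a rough sketch of the
bandwidth-delay product arithmetic in Python. The 10 ms round-trip time
and the 2 MiB window are illustrative assumptions, not measurements or
HPN-SSH defaults, and the function names are made up for this example.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that must be in flight to keep a link of this speed full."""
    # bits/s * ms -> divide by 1000 for seconds and by 8 for bytes
    return bandwidth_bps * rtt_ms // 8000


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 * 1000 / rtt_ms


if __name__ == "__main__":
    # Assumed example values: 40 Gbit/s link, 10 ms RTT, 2 MiB window.
    print(bdp_bytes(40_000_000_000, 10))
    print(max_throughput_bps(2 * 1024 * 1024, 10))
```

Under those assumed numbers the link needs about 50 MB in flight, while
a 2 MiB window caps a single flow at roughly 1.7 Gbit/s regardless of
how fast the cipher is.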

Intel had a paper from 2010 showing the HPN boost (and also other work
on multi streams):
https://www.intel.com/content/dam/support/us/en/documents/network/sb/fedexcasestudyfinal.pdf

Facebook's WARP/WDT tooling:
https://github.com/facebookarchive/wdt
https://opensourcelibs.com/lib/warp-cli

Lastly, I was trying multipath TCP:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/getting-started-with-multipath-tcp_configuring-and-managing-networking
I didn't get very far on the MPTCP research angle.

I think all of these are likely to be complementary to your work on
partitioning the large file.

If you have a sample large file and permission to test without
encryption, temporarily replacing ssh with either the NONE cipher or
a buffer-tuned netcat would let you identify what the bottleneck of
rsync is in your situation. I found previously that rsync didn't do a
good job with the rsync:// wire protocol over high-latency links: it
had too many round trips and didn't do much work between them.

From looking at the rsync code in the past, I think the checksum system
in general is going to be your largest problem:
- it assumes that it's checking a single stream for each file
- a meaningful replacement would be either independent per-segment
  checksums or something like a merkle tree
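
For illustration only, a minimal Python sketch of those two
alternatives: independent per-segment digests, and a merkle root built
over them. This is not how rsync works today. The segment size, the use
of SHA-256, the duplicate-last-node rule for odd levels, and all
function names are assumptions picked for the example.

```python
import hashlib


def segment_digests(data: bytes, segment_size: int) -> list[bytes]:
    """One independent SHA-256 digest per fixed-size segment."""
    return [
        hashlib.sha256(data[i:i + segment_size]).digest()
        for i in range(0, len(data), segment_size)
    ]


def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold leaf digests pairwise until a single root digest remains."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # assumed rule: duplicate the odd node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


def changed_segments(old: list[bytes], new: list[bytes]) -> list[int]:
    """Indexes of segments whose digests differ (equal-length lists)."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
```

Because each segment digest depends only on its own bytes, segments can
be hashed and verified by separate workers or connections, and comparing
roots gives a cheap whole-file check before looking at individual
segments.
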
> 1. The need
> 
> TL;DR: we need to transfer one huge file quickly (100Gb/s) through ssh.
...
> In order to maximize transfers speed, I first tried different Ciphers / MACs.
> The list of authorized Ciphers / MACs is provided to me by our security team.
> With these constraints, I can reach 1Gb/s to 3Gb/s. I'm still far from the
> expected result. This is due to the way encryption/decryption work on modern
> CPUs: they are really efficient thanks to AES-NI, but are single-threaded. The
> bandwidth limiter is the speed of a single CPU core.
This is where HPN-SSH's multithreaded AES (MT-AES) helps: it lets a
single SSH connection spread the cipher work across many cores.
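
As a back-of-the-envelope check on the numbers quoted above, here is a
small Python sketch that estimates how many single-core-limited streams
the 100Gb/s target would take, and splits a file into that many byte
ranges. It assumes throughput scales linearly with the number of
streams, which is optimistic; the function names and the 1 TiB file size
are hypothetical.

```python
import math


def streams_needed(target_gbps: float, per_stream_gbps: float) -> int:
    """Streams required if each one is capped at per_stream_gbps."""
    return math.ceil(target_gbps / per_stream_gbps)


def split_ranges(total_bytes: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, total_bytes) into `parts` contiguous (offset, length) ranges."""
    base, extra = divmod(total_bytes, parts)
    ranges = []
    offset = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    n = streams_needed(100, 3)          # best case quoted: 3 Gb/s per core
    print(n, streams_needed(100, 1))    # worst case quoted: 1 Gb/s per core
    print(split_ranges(1 << 40, n)[:2])  # hypothetical 1 TiB file
```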

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2 at gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136