[Bug 5124] Parallelize the rsync run using multiple threads and/or connections
samba-bugs at samba.org
Tue Jan 14 10:40:24 MST 2014
https://bugzilla.samba.org/show_bug.cgi?id=5124
--- Comment #4 from Haravikk <me at haravikk.com> 2014-01-14 17:40:22 UTC ---
I see this is quite old, and to be honest I'm not completely familiar with
rsync's implementation, but more rsync performance is of benefit to everyone so
I thought I'd chip in my thoughts.
While UDP would be a good option, it's a fairly complex one to implement, as
you'd essentially be reinventing the wheel when it came to re-requesting lost
packets etc., though I believe there may be new libraries out there that could
help with this. Many BitTorrent clients, for example, now use µTP (Micro
Transport Protocol), which is basically UDP with reliability and congestion
control layered on top, though this would still require some form of SSL-style
encryption support for widespread adoption.
Personally I don't think the number of TCP connections is the problem though, as
a single connection should be capable of utilising all available bandwidth.
That said, one of the problems with TCP is the self-adjusting window size: to
get the most out of a connection you really need to utilise it at a constant
rate, otherwise the window will shrink. This means any long pauses waiting
for the next chunk of the file-list can result in performance dropping
until the next file starts being sent.
An alternative fix for this problem is to do something similar to Google's SPDY
protocol for HTTP, which is to multiplex several streams over a single TCP
connection.
Basically, rsync would add its own information to packets, allowing them to be
quickly routed to/from multiple threads at each end, while sending all packets
over a single connection. This means you can have file-list packets mixed in
with packets from various different files being transferred etc.; TCP will
continue to ensure they arrive in the correct order, and all rsync has to do
is set up an appropriate number of threads for generating chunks of the
file-list, performing delta comparisons, and transferring files. You end up
with one thread acting as a message dispatch service for this single
connection, taking all messages received and sending them to the appropriate
worker threads, and packing outgoing messages ready to send down the TCP
connection; the worker threads then perform file/folder comparison for
different parts of the sync operation.
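To make the idea a bit more concrete, here is a rough Python sketch of the
framing and dispatch side only. It is not rsync's wire format or anything from
its code: the header layout (2-byte stream id, 4-byte length), the stream ids
and the names pack_frame and Demultiplexer are all made up for illustration,
and the "TCP connection" is just a byte string fed in small pieces.

```python
import struct
from collections import defaultdict

# Hypothetical frame layout (an assumption, not rsync's protocol):
# 2-byte stream id, 4-byte payload length, then the payload itself.
HEADER = struct.Struct("!HI")


def pack_frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its stream id and length."""
    return HEADER.pack(stream_id, len(payload)) + payload


class Demultiplexer:
    """Reassembles frames from one byte stream and routes them by stream id."""

    def __init__(self):
        self._buffer = bytearray()
        # stream id -> payloads in arrival order
        self.streams = defaultdict(list)

    def feed(self, data: bytes) -> None:
        """Accept bytes as they arrive; TCP may split or merge frames."""
        self._buffer.extend(data)
        while len(self._buffer) >= HEADER.size:
            stream_id, length = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # incomplete frame, wait for more bytes
            self.streams[stream_id].append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]


# Hypothetical stream ids: one for the file-list, one per file in flight.
FILE_LIST, FILE_A, FILE_B = 0, 1, 2

# Sender side: frames from different streams interleaved on one connection.
wire = b"".join([
    pack_frame(FILE_A, b"a-chunk-1"),
    pack_frame(FILE_LIST, b"dir1/"),
    pack_frame(FILE_B, b"b-chunk-1"),
    pack_frame(FILE_A, b"a-chunk-2"),
])

# Receiver side: deliver in awkward 5-byte pieces to mimic TCP segmentation.
demux = Demultiplexer()
for i in range(0, len(wire), 5):
    demux.feed(wire[i:i + 5])
```

In a real implementation each entry in demux.streams would more likely be a
queue.Queue drained by its own worker thread, with the dispatch thread calling
feed() on whatever the socket returns; the lists here just keep the sketch
short.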
Not that this latter option isn't still complex and a lot of work, but IMO it's
the best way to do things (if rsync isn't doing so already), and it allows
rsync to run multiple file/folder comparisons simultaneously depending upon the
hardware at each end and the current speed of the sync operation.
--
Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.