[Bug 5124] Parallelize the rsync run using multiple threads and/or connections

Marc Roos M.Roos at f1-outsourcing.eu
Thu Feb 7 10:47:25 UTC 2019


I made a bash script that does this in parallel: it checks how many rsyncs
are running and then starts another concurrent one. My parallel sessions
are against different servers. I doubt it would make any sense to run
multiple sessions between the same two hosts. My single rsync session
was already limited by the host's IOPS, so two threads would each run at
half speed.

IMO rsync does what it needs to do; if you want it to run in parallel,
execute it in parallel.
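A minimal Python sketch of the approach described above, assuming a fixed list of rsync commands (one per destination server) and a cap on how many may run at once. This is not my actual bash script; `run_with_limit` is a hypothetical helper. It works the same way, though: reap the processes that have finished, then start new ones until the cap is reached again.

```python
import subprocess
import time
from typing import List, Optional, Sequence


def run_with_limit(commands: Sequence[Sequence[str]], max_running: int = 3,
                   poll_interval: float = 0.5) -> List[Optional[int]]:
    """Run each command, keeping at most `max_running` processes alive at once.

    Returns the exit codes in the same order as `commands`.
    """
    if max_running < 1:
        raise ValueError("max_running must be at least 1")
    pending = list(enumerate(commands))  # (original index, argv) not yet started
    running = {}                         # original index -> Popen
    codes: List[Optional[int]] = [None] * len(commands)
    while pending or running:
        # Reap processes that have exited since the last check.
        for idx, proc in list(running.items()):
            rc = proc.poll()
            if rc is not None:
                codes[idx] = rc
                del running[idx]
        # Start new sessions until the concurrency cap is reached.
        while pending and len(running) < max_running:
            idx, cmd = pending.pop(0)
            running[idx] = subprocess.Popen(cmd)
        if running:
            time.sleep(poll_interval)
    return codes
```

For example, `run_with_limit([["rsync", "-a", "/data/", "backup1.example:/data/"], ["rsync", "-a", "/data/", "backup2.example:/data/"]], max_running=2)` would keep both servers busy at the same time; the host names and paths are placeholders.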


 >
 >--- Comment #8 from Michael <michael.williams at infatech.co.nz> ---
 >+1 from me on this.
 >
 >We have several situations where we need to copy a large number of very
 >small files, and I expect that having multiple file transfer threads,
 >allowing say ~5 transfers concurrently, would speed up the process
 >considerably. I expect that this would also make better use of the
 >available network bandwidth as each transfer appears to have an overhead
 >for starting and completing the transfer

So test it with two or three concurrent sessions.
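To run that test, something like the following times the same batch of commands at 1, 2 and 3 concurrent sessions. It is only a sketch: `time_batch`, `compare` and the `reset` hook are hypothetical helpers. For rsync the destination has to be put back into the same state before each run (that is what `reset` is for), otherwise every run after the first has little or nothing left to copy and the timings say nothing.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional, Sequence


def time_batch(commands: Sequence[Sequence[str]], workers: int) -> float:
    """Run every command with at most `workers` in flight; return wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread blocks in subprocess.run until its command exits.
        results = list(pool.map(subprocess.run, commands))
    elapsed = time.perf_counter() - start
    failed = [r.args for r in results if r.returncode != 0]
    if failed:
        raise RuntimeError(f"commands failed: {failed}")
    return elapsed


def compare(commands: Sequence[Sequence[str]],
            levels: Iterable[int] = (1, 2, 3),
            reset: Optional[Callable[[], None]] = None) -> Dict[int, float]:
    """Time the same batch at each concurrency level."""
    times: Dict[int, float] = {}
    for n in levels:
        if reset is not None:
            reset()  # e.g. restore the destination so every run copies the same data
        times[n] = time_batch(commands, n)
    return times
```

If the two- and three-session timings are not clearly below the single-session one, the bottleneck is somewhere a second thread cannot help (disk IOPS, in my case).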

 >which makes the effective transfer rate far less than the available
 >network bandwidth. This is the method one of our pieces of backup
 >software uses to speed up backups and is also implemented in FileZilla
 >for file transfers.
 >Consider a very large file that needs to be transferred, along with a
 >number of small files. In a single transfer mode, all other files would
 >need to wait while the large file is transferred. If there are multiple
 >transfers happening concurrently, the smaller files will continue
 >transferring while the large file transfers. I have seen the benefits of
 >this sort of implementation in other software.
 >
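The large-file argument can be put into numbers with a small model. This sketch (hypothetical `finish_times` helper, made-up durations) hands each queued file to whichever lane frees up first and assumes every lane keeps full speed no matter how many are busy. That assumption is exactly what does not hold when the hosts' IOPS or bandwidth is already the bottleneck.

```python
import heapq
from typing import List, Sequence


def finish_times(durations: Sequence[float], lanes: int) -> List[float]:
    """Finish time of each transfer when queued files go to the first free lane.

    Assumes each lane runs at full speed regardless of how many lanes are
    active, i.e. no shared bandwidth or IOPS limit.
    """
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    free_at = [0.0] * lanes  # min-heap: time at which each lane becomes free
    heapq.heapify(free_at)
    done = []
    for d in durations:  # files are taken in queue order
        start = heapq.heappop(free_at)
        end = start + d
        done.append(end)
        heapq.heappush(free_at, end)
    return done
```

With one 100 s file followed by ten 1 s files, a single lane finishes the small files at 101 to 110 s, while two lanes finish them at 1 to 10 s and the whole batch at 100 s. If the two lanes instead share one saturated disk, each runs at about half speed, so the batch as a whole still takes about 110 s; only the order in which files complete changes.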
 >I can also see benefits in having file transfers begin whilst rsync is
 >comparing files. This could logically work if you consider rsync makes a
 >'list' of files to be transferred and that it begins transferring files
 >as soon as this list begins to be populated. In situations where there
 >are a large number of files and few of these files changed, the sync
 >could effectively be completed by the time rsync is finished comparing
 >files (given the few changed files may have already been transferred
 >during the file comparison). This also is effectively implemented in
 >FileZilla (consider copying a directory in which FileZilla has to
 >recurse into each directory and add each file to copy into the queue).
 >
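The "start transferring while still listing" idea is a producer/consumer queue. In the sketch below one thread walks the tree and queues each file as soon as it is found, while worker threads drain the queue. `pipelined_copy` and the `transfer` callback are hypothetical and stand in for whatever actually moves a file; this is not how rsync is implemented. That said, the rsync man page (under -r/--recursive) says that since 3.0.0 the recursive scan is incremental and the transfer begins after the first few directories have been scanned, when both ends are 3.0.0 or newer, so part of this already happens within a single session.

```python
import os
import queue
import threading
from typing import Callable, List, Tuple

_DONE = object()  # sentinel: tells a worker the listing is finished


def pipelined_copy(root: str, transfer: Callable[[str], None],
                   workers: int = 3) -> Tuple[List[str], List[Tuple[str, Exception]]]:
    """Transfer files while the directory tree is still being listed.

    Returns (paths transferred, (path, error) pairs for failed transfers).
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    q: queue.Queue = queue.Queue(maxsize=1000)  # bounded, so the walk cannot run far ahead
    handled: List[str] = []
    failed: List[Tuple[str, Exception]] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = q.get()
            if item is _DONE:
                return
            try:
                transfer(item)
            except Exception as exc:  # record the failure and keep draining the queue
                with lock:
                    failed.append((item, exc))
            else:
                with lock:
                    handled.append(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    # Producer: enqueue each file as soon as the walk finds it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            q.put(os.path.join(dirpath, name))
    for _ in threads:
        q.put(_DONE)  # one sentinel per worker
    for t in threads:
        t.join()
    return handled, failed
```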
 >Interestingly, I assumed this was already an option for rsync, so I went
 >looking to find the necessary option. However, all I found were the
 >previously mentioned hacks, which weren't what I was going for.
 >
 >
 >


