Native Parallelization in rsync

Stier, Matthew Matthew.Stier at us.fujitsu.com
Thu Sep 5 19:54:08 CEST 2013


There’s more to the issue than simply splitting the files among N workers.

All the directories would need to be transferred first. You wouldn’t want worker “D” trying to copy a file before worker “C” was able to create the directory to hold it.
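Here is a minimal sketch of that ordering. It assumes the whole file list is known up front as hypothetical `(relative_path, is_dir)` pairs. Every directory, including ones only implied by a file's path, goes into a serial first phase sorted parents-first. Only after that are the files dealt out to N workers. `plan_transfer` is an illustrative name, not anything in rsync.

```python
from pathlib import PurePosixPath


def plan_transfer(entries, n_workers):
    """Split a file list into a serial directory phase and N parallel file lists.

    entries: iterable of (relative_path, is_dir) tuples (hypothetical format).
    Returns (dirs, buckets): dirs is ordered so every parent precedes its
    children; buckets is a list of n_workers lists of file paths.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    dir_set = set()
    files = []
    for path, is_dir in entries:
        p = PurePosixPath(path)
        if is_dir:
            dir_set.add(p)
        else:
            files.append(path)
        # Every ancestor of an entry must exist before the entry can be written.
        dir_set.update(a for a in p.parents if a != PurePosixPath("."))
    # Phase 1 (serial): shallower paths first, so "a" is created before "a/b".
    dirs = sorted((str(d) for d in dir_set), key=lambda s: (s.count("/"), s))
    # Phase 2 (parallel): deal files out round-robin. Every directory already
    # exists before any worker starts, so no worker races another's mkdir.
    buckets = [[] for _ in range(n_workers)]
    for i, path in enumerate(files):
        buckets[i % n_workers].append(path)
    return dirs, buckets
```

The cost of this design is a barrier between the phases: no file moves until the directory tree is complete, which is one of the coordination steps a real implementation would have to pay for.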

There is also the issue of hard links. Where would the inode information be kept so that hard links can be detected and transferred as links, rather than replicating the files needlessly at the destination?
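One way to keep that inode information is sketched below. It assumes all source paths are local and visible to a single coordinating process. That process builds one table keyed on `(st_dev, st_ino)` before the list is split, and each group of linked names goes to a worker as a single unit. rsync's own `-H`/`--hard-links` option has to solve the same detection problem. This sketch does not reproduce its implementation, and `hard_link_units` is a hypothetical helper.

```python
import os


def hard_link_units(paths):
    """Group paths that share an inode so each group is handled as one unit.

    Keyed on (st_dev, st_ino), which identifies a file on POSIX systems.
    The first path seen in a group is the one whose data would be sent;
    the others would be recreated as hard links at the destination.
    """
    groups = {}  # (st_dev, st_ino) -> list of paths, in first-seen order
    units = []   # output: one list per unit of work
    for path in paths:
        st = os.lstat(path)
        if st.st_nlink > 1:
            key = (st.st_dev, st.st_ino)
            if key in groups:
                # Same list object already in `units`, so this extends that unit.
                groups[key].append(path)
                continue
            groups[key] = [path]
            units.append(groups[key])
        else:
            units.append([path])
    return units
```

Because a whole group lands on one worker, no cross-worker lookup is needed at transfer time. The price is that the table must be complete before splitting starts.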

I’m sure there have been many threads covering these issues in the past. It all comes down to this: the development and testing costs far outweigh the potential gains.

If there is really any area that could regularly gain from multithreading, it would be on the sender side, with checksums and compression.

The only time I’ve seen where some kind of threading would have helped was when I was trying to transfer hundreds of gigabytes of data across a gigabit link. My SPARC servers couldn’t keep the pipe filled until I stepped up a directory level and copied each of those directories as a separate rsync session.
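That per-directory workaround can be scripted. This sketch assumes a plain `rsync -a` per top-level subdirectory is acceptable and that the script runs on the sending host. `per_directory_commands` and `run_parallel` are hypothetical helpers, not rsync features.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def per_directory_commands(src, dest, subdirs):
    """Build one rsync command per top-level subdirectory."""
    cmds = []
    for name in subdirs:
        # Trailing slashes: copy the contents of src/name into dest/name.
        cmds.append(["rsync", "-a",
                     os.path.join(src, name) + "/",
                     os.path.join(dest, name) + "/"])
    return cmds


def run_parallel(cmds, max_jobs=4):
    """Run the commands with at most max_jobs processes at once.

    Threads are enough here: each one just waits on its child process.
    Returns the exit codes in the same order as cmds.
    """
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(lambda c: subprocess.run(c).returncode, cmds))
```

For example, `[e.name for e in os.scandir(src) if e.is_dir(follow_symlinks=False)]` lists the top-level directories to pass as `subdirs`. Files sitting directly in `src` are not covered. Hard links that span two subdirectories will also be copied as separate files, which is the hard-link problem described above.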


From: rsync-bounces at lists.samba.org [mailto:rsync-bounces at lists.samba.org] On Behalf Of ameirh at gmail.com
Sent: Wednesday, September 04, 2013 6:29 PM
To: rsync at lists.samba.org
Subject: Native Parallelization in rsync

I'm sure this is a topic that's come up plenty of times before, but is it possible to implement native parallelization in rsync? There are several blog posts listing workarounds, but they are far from ideal.

My initial thought would be that rsync could split up the file list across N workers; the incremental file list would still be generated as it is today, but could then be divided evenly across a user-specified number of threads/processes that do the actual transfer of the files. I can think of several tweaks to this process, but I think this is a good starting point.
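"Divided evenly" could mean by file count or by bytes. Since transfer time tends to track bytes, a size-balanced split is sketched here. It assumes the sizes of the files in a chunk are known before they are assigned. It uses the greedy largest-first heuristic, not any scheme rsync itself uses, and `split_by_size` is hypothetical.

```python
import heapq


def split_by_size(files, n_workers):
    """Partition (path, size) pairs into n_workers lists of similar total size.

    Greedy largest-first: sort files by descending size, then give each one
    to the worker with the smallest running total so far.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    heap = [(0, i) for i in range(n_workers)]  # (bytes assigned, worker index)
    buckets = [[] for _ in range(n_workers)]
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        total, i = heapq.heappop(heap)  # least-loaded worker
        buckets[i].append(path)
        heapq.heappush(heap, (total + size, i))
    return buckets
```

With an incremental file list, this would run once per chunk, so the balance is only as good as each chunk allows.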

I'm sure there are challenges with this approach that I'm not aware of, but it would be a great way to take advantage of free system resources and bandwidth, especially when transferring thousands of files.

Is this something that's in the works or that can be in the works?

Thanks for the awesome project!

-Ameir