Parallel rsync's for better Performance.

Matthias Schniedermeyer ms at citd.de
Wed Oct 28 16:46:09 MDT 2009


On 28.10.2009 18:27, Matt McCutchen wrote:
> On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> > On 28.10.2009 10:35, Matt McCutchen wrote:
> > > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > > Otherwise parallel rsyncs completely kill any performance you had because 
> > > > normal HDDs will fall into a seek-storm, when more than 1 rsync works on 
> > > > them.
> > > 
> > > Asynchronous I/O may solve that, on OSes that support it.
> > 
> > No. That's a fundamental problem with ANY rotating media device.
> 
> "Solve" may be an overstatement, but asynchronous I/O would at least
> help significantly because one process could issue many I/O requests to
> the same area of disk at once, and the disk scheduler could fulfill all
> of those requests before seeking elsewhere.  Without asynchronous I/O,
> after the scheduler fulfills one request, it is left to either seek or
> wait for the process to issue another request.

And "same area of disk" is itself a problem. In most modern filesystems 
inode numbers can be scattered more or less randomly, so you can't rely 
on sorting the files by inode (or anything like that) to get them into 
on-disk order.
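
For reference, this is roughly what "sort the files by inode" would look
like. It is only a sketch: the function name is made up for illustration,
and as said above, inode order is a heuristic that may or may not match
where the data actually sits on the platter.

```python
import os
import stat


def files_in_inode_order(root):
    """Return regular-file paths under root, sorted by (device, inode).

    Hypothetical helper: inode order is only a guess at on-disk order.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                entries.append((st.st_dev, st.st_ino, path))
    entries.sort()
    return [path for _dev, _ino, path in entries]
```

Whether walking the tree in this order actually reduces seeking depends
entirely on how the filesystem allocates inodes and data blocks.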

But the bigger problem may be the "99% unchanged, but millions of files" 
case: where on the platter is the metadata, and how could you optimise 
disc access for that?


The only thing that comes to my mind is something for when you
repeatedly rsync the same thing.

You could record the access pattern and the timing, repeat that several 
times with randomization, and use a genetic algorithm to determine the 
best(tm) access strategy. After a few generations you should at least be 
better than before. :-)
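
A toy sketch of that idea, under loud assumptions: all names here are
invented, the "access strategy" is reduced to an ordering of items, and
the cost function is a synthetic stand-in (total distance between
consecutive positions). In a real setup the cost would be the measured
wall-clock time of an rsync run, which is noisy and expensive, so you
would cache measurements rather than re-evaluate them every generation.

```python
import random


def order_crossover(a, b, rng):
    """Keep a random slice of parent a in place, fill the rest in b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    taken = set(middle)
    rest = [x for x in b if x not in taken]
    return rest[:i] + middle + rest[i:]


def mutate(order, rng):
    """Return a copy of order with two random positions swapped."""
    order = list(order)
    i, j = rng.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]
    return order


def evolve_order(items, cost, generations=50, population=20, seed=0):
    """Search for a low-cost ordering of items (unique, hashable values)."""
    rng = random.Random(seed)
    pop = [rng.sample(items, len(items)) for _ in range(population)]
    pop[0] = list(items)  # the current order competes too
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[: population // 2]  # the best half is kept unchanged
        children = []
        while len(survivors) + len(children) < population:
            a, b = rng.sample(survivors, 2)
            child = order_crossover(a, b, rng)
            if rng.random() < 0.3:
                child = mutate(child, rng)
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)


def seek_distance(order):
    """Synthetic cost: sum of jumps between consecutive 'positions'."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


positions = [40, 3, 77, 12, 55, 8, 91, 26]  # made-up stand-ins for disk locations
best = evolve_order(positions, seek_distance)
```

Because the best half survives each generation and the starting order is
in the first population, the result is never worse than the starting
order under a deterministic cost. With real, noisy timings that guarantee
no longer holds.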




See you

-- 
Real Programmers consider "what you see is what you get" to be just as 
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated, 
cryptic, powerful, unforgiving, dangerous.


