Parallel rsync's for better Performance.
ms at citd.de
Wed Oct 28 16:46:09 MDT 2009
On 28.10.2009 18:27, Matt McCutchen wrote:
> On Wed, 2009-10-28 at 17:24 +0100, Matthias Schniedermeyer wrote:
> > On 28.10.2009 10:35, Matt McCutchen wrote:
> > > On Wed, 2009-10-28 at 10:01 +0100, Matthias Schniedermeyer wrote:
> > > > Otherwise parallel rsyncs completely kill any performance you had because
> > > > normal HDDs will fall into a seek-storm, when more than 1 rsync works on
> > > > them.
> > >
> > > Asynchronous I/O may solve that, on OSes that support it.
> > No. That's a fundamental problem with ANY rotating media device.
> "Solve" may be an overstatement, but asynchronous I/O would at least
> help significantly because one process could issue many I/O requests to
> the same area of disk at once, and the disk scheduler could fulfill all
> of those requests before seeking elsewhere. Without asynchronous I/O,
> after the scheduler fulfills one request, it is left to either seek or
> wait for the process to issue another request.
And "same disc region" is kind of a problem. In most modern filesystems
inode placement can be pretty random, so you can't reliably sort the
files by inode, or something like that.
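For illustration, a minimal sketch of what "sort the files by inode" would
look like in Python. The helper name sort_by_inode is made up for this
example, and as said above, inode order is only a heuristic for on-disk
locality; on many filesystems it may not help at all.

```python
import os

def sort_by_inode(paths):
    """Return paths ordered by (device, inode number).

    Assumption: inode order roughly follows on-disk order. That is a
    heuristic, not something the filesystem guarantees.
    """
    def key(path):
        st = os.lstat(path)  # lstat: do not follow symlinks
        return (st.st_dev, st.st_ino)
    return sorted(paths, key=key)
```

A caller would walk the tree once, pass the collected paths through
sort_by_inode(), and then read the files in the returned order.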
But the bigger problem may be the "99% unchanged, but millions of files"
case: where on the platter is the metadata, and how could you optimise
disc access for that?
The only thing that comes to my mind is something for when you
repeatedly rsync something.
You could store the access-pattern and the timing, do that several times
with randomization and use a genetic algorithm that determines the
best(tm) access strategy. After a few generations you should be at least
better than before. :-)
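
A rough sketch of that idea in Python, under heavy assumptions: it is a
mutation-only evolutionary search (no crossover, so simpler than a full
genetic algorithm), and the cost function is a toy stand-in that sums the
distance between consecutive positions. In practice the cost would be the
measured run time of a real rsync pass, which is noisy. The names
evolve_order and toy_seek_cost are invented for this example.

```python
import random

def evolve_order(items, cost, generations=50, population=20, seed=0):
    """Search for a low-cost ordering of items by random swaps.

    The starting order is kept in the population and the best half
    always survives, so the result is never worse than the input order.
    """
    rng = random.Random(seed)
    pop = [list(items)]
    pop += [rng.sample(items, len(items)) for _ in range(population - 1)]
    for _ in range(generations):
        pop.sort(key=cost)                 # cheapest orderings first
        survivors = pop[: population // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            if len(child) >= 2:            # mutate: swap two positions
                i, j = rng.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return min(pop, key=cost)

def toy_seek_cost(order):
    """Hypothetical cost: total distance between consecutive positions."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))
```

Usage would be along the lines of evolve_order(positions, toy_seek_cost),
where positions is a list of numbers standing in for file locations.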
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.