[Bug 3099] Please parallelize filesystem scan

Chris Shoemaker c.shoemaker at cox.net
Fri Sep 16 13:19:59 GMT 2005


On Thu, Sep 15, 2005 at 09:32:44PM -0400, Chris Shoemaker wrote:
> On Thu, Sep 15, 2005 at 04:23:24PM -0700, samba-bugs at samba.org wrote:
> > https://bugzilla.samba.org/show_bug.cgi?id=3099
> > 
> > ------- Additional Comments From wayned at samba.org  2005-09-15 16:23 -------
> > Created an attachment (id=1448)
> >  --> (https://bugzilla.samba.org/attachment.cgi?id=1448&action=view)
> > One possible way to reorder the checksum computation.
> > 
> > > how could it possibly require a change to the rsync protocol for the
> > > second host in the sequence to pre-scan its filesystem, so that that
> > > data is available when needed?
> > 
> > The only way to know what to scan is to look at the file list from the sender
> > (since the receiver usually doesn't know anything other than the destination
> > directory, and options such as -R, --exclude, and --files-from can radically
> > limit what files need to be scanned).
> > 
> > I suppose it would be possible for the receiver to compute the full-file
> > checksums as the file list is arriving from the sender (yes, the sender sends
> > the list incrementally as it is created), but the code currently doesn't know
> > if the destination spec is a file or a directory until after it receives the
> > file list, so the code would need to be made to attempt a chdir to the
> > destination arg and to skip the pre-caching if that doesn't work.
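For concreteness, here is a rough sketch of the pre-caching idea described above. It is written in Python only for brevity and is not the attached patch. The function name prescan, the use of MD5 as the checksum, and the dictionary cache are all assumptions made for illustration; rsync's real code and checksum handling differ.

```python
import hashlib
import os


def prescan(dest_arg, names):
    """Return {relative name: hex digest} for files already under dest_arg.

    Hypothetical helper: returns an empty dict when we cannot chdir to
    dest_arg, i.e. the pre-caching is simply skipped in that case.
    """
    start_dir = os.getcwd()
    try:
        os.chdir(dest_arg)
    except OSError:
        return {}  # destination is a file, missing, or inaccessible

    cache = {}
    try:
        # names may be a generator fed by the incoming file list,
        # so checksumming can overlap with the list's arrival.
        for name in names:
            if not os.path.isfile(name):
                continue  # not present on the receiver, or not a regular file
            digest = hashlib.md5()  # stand-in checksum (an assumption)
            try:
                with open(name, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                continue  # unreadable file: leave it out of the cache
            cache[name] = digest.hexdigest()
    finally:
        os.chdir(start_dir)  # restore the caller's working directory
    return cache
```

Calling prescan("/some/dest", incoming_names) with a destination that is a plain file returns {} and does no scanning, which is the "skip the pre-caching if that doesn't work" behaviour.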
> > 
> > One bad thing about this solution is that we really should be making the
> > sending side not pre-compute the checksums before the start of the transfer
> > phase (to be like the generator, which computes the checksums while looking for
> > files to transfer). Computing them during the transfer makes it more likely
> > that the file's data in the disk cache will be able to be re-used when a file
> > needs to be updated. Thus, changing the receiving side to pre-compute the
> > checksums before starting the transfer seems to be going in the wrong direction
> > (though it might speed up a large transfer where few files were different, it
> > might also slow down a large transfer where many files were changed).
> 
> IMHO, in general, optimizing for the "few-changes" (small delta) case
> is the right thing to do.  Rsync's utility diminishes anyway as delta
> increases, so there's no reason not to make efficiency increase with
> increasing delta.

Err... I meant: make efficiency increase as delta *decreases*,
i.e. optimize for the small-changes case.

> 
> -chris
> 
> > 
> > The attached patch implements a simple pre-scan that works with basic options.
> > It could be improved to handle things like --compare-dest better, but I think
> > it basically works.  If you'd care to run some speed tests, maybe you could
> > persuade me that this kluge would be worth looking at further (I'm not
> > considering it at the moment).
> > 
> > -- 
> > Configure bugmail: https://bugzilla.samba.org/userprefs.cgi?tab=email
> > ------- You are receiving this mail because: -------
> > You are the QA contact for the bug, or are watching the QA contact.
> > -- 
> > To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
> > Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

