[Bug 3099] Please parallelize filesystem scan

Chris Shoemaker c.shoemaker at cox.net
Fri Sep 16 01:32:44 GMT 2005


On Thu, Sep 15, 2005 at 04:23:24PM -0700, samba-bugs at samba.org wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=3099
> 
> ------- Additional Comments From wayned at samba.org  2005-09-15 16:23 -------
> Created an attachment (id=1448)
>  --> (https://bugzilla.samba.org/attachment.cgi?id=1448&action=view)
> One possible way to reorder the checksum computation.
> 
> > how could it possibly require a change to the rsync protocol for the
> > second host in the sequence to pre-scan its filesystem, so that that
> > data is available when needed?
> 
> The only way to know what to scan is to look at the file list from the sender
> (since the receiver usually doesn't know anything other than the destination
> directory, and options such as -R, --exclude, and --files-from can radically
> limit what files need to be scanned).
> 
> I suppose it would be possible for the receiver to compute the full-file
> checksums as the file list is arriving from the sender (yes, the sender sends
> the list incrementally as it is created), but the code currently doesn't know
> if the destination spec is a file or a directory until after it receives the
> file list, so the code would need to be made to attempt a chdir to the
> destination arg and to skip the pre-caching if that doesn't work.
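
A rough sketch of the receiver-side idea described above, for illustration only. It is not taken from the attached patch or from rsync's C source: the function name `precache_checksums`, its parameters, and the use of MD5 as a stand-in full-file checksum are all assumptions. It tries to chdir to the destination arg, skips the pre-caching if that fails, and otherwise checksums each name as it arrives from the (incrementally delivered) file list.

```python
import hashlib
import os


def precache_checksums(dest_arg, incoming_names):
    """Hypothetical helper: map each arriving name to a full-file checksum.

    Returns an empty dict if dest_arg cannot be entered as a directory,
    i.e. the pre-caching is skipped when the chdir attempt does not work.
    """
    start_dir = os.getcwd()
    try:
        os.chdir(dest_arg)  # fails if dest_arg is a file or does not exist
    except OSError:
        return {}

    cache = {}
    try:
        # incoming_names may be a generator fed as the sender's list arrives.
        for name in incoming_names:
            if not os.path.isfile(name):
                continue  # nothing on the receiving side to checksum yet
            digest = hashlib.md5()  # stand-in algorithm, an assumption
            try:
                with open(name, "rb") as handle:
                    for chunk in iter(lambda: handle.read(65536), b""):
                        digest.update(chunk)
            except OSError:
                continue  # unreadable file: leave it for the normal pass
            cache[name] = digest.hexdigest()
    finally:
        os.chdir(start_dir)  # restore the caller's working directory
    return cache
```

Calling `precache_checksums("/some/dest", names)` with a destination that turns out to be a plain file simply yields an empty cache, so the normal code path would run unchanged.
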
> 
> One bad thing about this solution is that we really should be making the
> sending side not pre-compute the checksums before the start of the transfer
> phase (to be like the generator, which computes the checksums while looking for
> files to transfer). Computing them during the transfer makes it more likely
> that the file's data in the disk cache will be able to be re-used when a file
> needs to be updated. Thus, changing the receiving side to pre-compute the
> checksums before starting the transfer seems to be going in the wrong direction
> (though it might speed up a large transfer where few files were different, it
> might also slow down a large transfer where many files were changed).

IMHO, in general, optimizing for the "few-changes" (small delta) case
is the right thing to do.  Rsync's utility diminishes anyway as the
delta increases, so there's little harm in letting efficiency fall off
as the delta grows.

-chris

> 
> The attached patch implements a simple pre-scan that works with basic options.
> It could be improved to handle things like --compare-dest better, but I think
> it basically works.  If you'd care to run some speed tests, maybe you could
> persuade me that this kluge would be worth looking at further (I'm not
> considering it at the moment).
> 

