[Bug 3099] Please parallelize filesystem scan

samba-bugs at samba.org
Thu Sep 15 23:23:24 GMT 2005


------- Additional Comments From wayned at samba.org  2005-09-15 16:23 -------
Created an attachment (id=1448)
 --> (https://bugzilla.samba.org/attachment.cgi?id=1448&action=view)
One possible way to reorder the checksum computation.

> how could it possibly require a change to the rsync protocol for the
> second host in the sequence to pre-scan its filesystem, so that that
> data is available when needed?

The only way to know what to scan is to look at the file list from the sender
(since the receiver usually doesn't know anything other than the destination
directory, and options such as -R, --exclude, and --files-from can radically
limit what files need to be scanned).

I suppose it would be possible for the receiver to compute the full-file
checksums as the file list is arriving from the sender (yes, the sender sends
the list incrementally as it is created). However, the code currently doesn't
know whether the destination spec is a file or a directory until after it
receives the file list, so it would have to attempt a chdir to the destination
arg and skip the pre-caching if that fails.
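
To make the idea concrete, here is a rough sketch in Python rather than in
rsync's C. It is not the attached patch. The names prescan and file_checksum
are made up for illustration, and MD5 is only a stand-in digest, not a claim
about what rsync uses. It tries the chdir, gives up on pre-caching if that
fails, and otherwise checksums each name as it arrives from the list.

```python
import hashlib
import os


def file_checksum(path, chunk_size=64 * 1024):
    """Full-file digest. MD5 is a stand-in here, not rsync's actual choice."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prescan(dest_arg, incoming_names):
    """Cache checksums for names as they arrive, or return None to skip."""
    start_dir = os.getcwd()
    try:
        os.chdir(dest_arg)
    except OSError:
        # dest_arg is a file, is missing, or is inaccessible: skip pre-caching.
        return None
    try:
        cache = {}
        for name in incoming_names:  # may be a generator fed incrementally
            if os.path.isfile(name):
                cache[name] = file_checksum(name)
        return cache
    finally:
        # Restore the original working directory for the caller.
        os.chdir(start_dir)
```

Names in the list that don't exist under the destination are simply left out
of the cache, so a later lookup miss means "compute it the normal way".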

One drawback of this solution is that the sending side really should stop
pre-computing its checksums before the start of the transfer phase, to match
the generator, which computes the checksums while looking for files to
transfer. Computing them during the transfer makes it more likely that the
file's data will still be in the disk cache and can be re-used when the file
needs to be updated. Thus, changing the receiving side to pre-compute the
checksums before starting the transfer seems to be going in the wrong direction
(though it might speed up a large transfer where few files were different, it
might also slow down a large transfer where many files were changed).
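
The disk-cache argument comes down to the order of the reads. The toy Python
model below is only an illustration, not rsync code: all function names are
invented, and it assumes the worst case where every file ends up being
transferred. It counts how many reads of other files fall between the
checksum read and the transfer read of the same file.

```python
def precompute_order(files):
    """All checksum reads happen first, then all transfer reads."""
    events = [("checksum", name) for name in files]
    events += [("transfer", name) for name in files]
    return events


def interleaved_order(files):
    """Each file is checksummed and then transferred before the next one."""
    events = []
    for name in files:
        events.append(("checksum", name))
        events.append(("transfer", name))
    return events


def reads_between(events):
    """Largest count of other reads between a file's first and later read."""
    first_seen = {}
    worst = 0
    for position, (_, name) in enumerate(events):
        if name in first_seen:
            worst = max(worst, position - first_seen[name] - 1)
        else:
            first_seen[name] = position
    return worst
```

With n files, the pre-computed ordering puts n - 1 other reads between the
two reads of each file, while the interleaved ordering puts none, which is
why the second read is more likely to be served from the cache.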

The attached patch implements a simple pre-scan that works with basic options.
It could be improved to handle things like --compare-dest better, but I think
it basically works.  If you'd care to run some speed tests, maybe you could
persuade me that this kluge would be worth looking at further (I'm not
considering it at the moment).

