[Bug 9812] New: Lookahead file-list loading and comparison

samba-bugs at samba.org
Thu Apr 18 11:25:37 MDT 2013


           Summary: Lookahead file-list loading and comparison
           Product: rsync
           Version: 3.1.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
        AssignedTo: wayned at samba.org
        ReportedBy: me at haravikk.com
         QAContact: rsync-qa at samba.org

I've been using rsync for various things for some time now, but only recently
have I properly begun using it with a remote server. In my case, I use it to
create redundant copies of very large backup structures (almost a million
files, ~3 TB in total), which is a demanding workload for most software.

However, the main problem I've noticed with rsync is that it takes a *very*
long time to detect changes that can start being synced to the server, even
with incremental file lists. Presumably this is because it has to build a list
of the current files, send it to the other server, and then wait for a response.

I think the best way to resolve this is to provide more lookahead in the
file-list exchange. Once the client has sent the parameters to the receiver,
both ends would start loading all matching files so that their
timestamps/checksums are ready for comparison. As soon as the first file-list
segment is ready, the client sends it. Ideally, by the time it does, the server
already has a full set of file data in the same basic order to compare against,
allowing it to rapidly detect changed, deleted or new files.
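A minimal Python sketch of the receiver side of this idea, for illustration only: it is not rsync's code or wire protocol. It assumes each file is described by a hypothetical (mtime, size) pair, the same two values rsync's default quick check compares. `LookaheadReceiver`, `compare_segment` and `deleted` are made-up names. The local scan starts in a background thread as soon as the object is created, so it overlaps with the sender building and transmitting its first segment.

```python
import threading
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical per-file metadata: (mtime, size). Real rsync file-list
# entries carry more fields; this pair is only an illustration.
Meta = Tuple[int, int]


class LookaheadReceiver:
    """Receiver that indexes its own files while sender segments are in flight."""

    def __init__(self, scan_local: Callable[[], Dict[str, Meta]]) -> None:
        self._index: Dict[str, Meta] = {}
        self._seen: Set[str] = set()
        self._ready = threading.Event()
        # Start the local scan immediately, before any segment has arrived.
        self._thread = threading.Thread(target=self._scan, args=(scan_local,), daemon=True)
        self._thread.start()

    def _scan(self, scan_local: Callable[[], Dict[str, Meta]]) -> None:
        try:
            self._index = scan_local()
        finally:
            self._ready.set()  # never leave compare_segment() waiting forever

    def compare_segment(self, segment: Dict[str, Meta]) -> Tuple[List[str], List[str]]:
        """Classify one sender segment into (changed, new) paths."""
        self._ready.wait()  # ideally already set when the first segment lands
        changed: List[str] = []
        new: List[str] = []
        for path, meta in segment.items():
            self._seen.add(path)
            local = self._index.get(path)
            if local is None:
                new.append(path)
            elif local != meta:
                changed.append(path)
        return changed, new

    def deleted(self) -> List[str]:
        """After the last segment: local paths the sender never listed."""
        self._ready.wait()
        return sorted(p for p in self._index if p not in self._seen)
```

For example, with a local tree of `{"a.txt": (100, 5), "b.txt": (200, 7), "old.txt": (50, 1)}`, the segment `{"a.txt": (100, 5), "b.txt": (201, 7), "c.txt": (300, 2)}` yields `(["b.txt"], ["c.txt"])`, and `deleted()` then returns `["old.txt"]`. Deletions can only be known once the final segment has been seen. The sketch looks files up in a hash map rather than relying on both ends producing the "same basic order". It also simply waits for the whole local scan to finish, whereas an order-based merge could begin comparing before the scan completes.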

This process can also be optimised: if the file data for an entire directory is
loaded before the next segment/comparison is required, it is condensed into a
single timestamp/checksum for that directory. In this way the client can send
any available directory timestamps/checksums to the receiver for rapid
comparison. If the receiver's copy of a directory doesn't match, the receiver
requests the file data from the client, which should still have it cached.
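The directory-condensing step could look roughly like the Python sketch below. It rests on assumptions rather than anything rsync does today. Each directory's files are hashed as (name, mtime, size) entries with SHA-256, sorted by name so both ends hash in the same order. A directory that is missing on the receiver, or whose digest differs, is flagged for a full file-data request. `directory_digest` and `directories_to_expand` are hypothetical names, and subdirectories are treated as separate flat entries.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical per-file metadata: (mtime, size).
Meta = Tuple[int, int]


def directory_digest(entries: Dict[str, Meta]) -> str:
    """Condense a fully loaded directory's file data into a single digest."""
    h = hashlib.sha256()
    for name in sorted(entries):  # sort so both ends hash in the same order
        mtime, size = entries[name]
        # POSIX file names cannot contain NUL, so "\0" separates fields unambiguously.
        h.update(f"{name}\0{mtime}\0{size}\n".encode())
    return h.hexdigest()


def directories_to_expand(sender_digests: Dict[str, str],
                          receiver_dirs: Dict[str, Dict[str, Meta]]) -> List[str]:
    """Directories whose digests differ; the receiver would request their file data."""
    mismatched: List[str] = []
    for d, digest in sorted(sender_digests.items()):
        local = receiver_dirs.get(d)
        if local is None or directory_digest(local) != digest:
            mismatched.append(d)
    return mismatched
```

With the sender holding `photos/1.jpg` and `docs/a.txt`, and the receiver holding the same `photos` but an older `docs/a.txt`, only `"docs"` is returned. Only that directory's per-file data would then cross the connection, while unchanged directories cost one digest each.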

The whole mechanism would operate within a reasonably sized buffer, to conserve
memory while still holding enough file data at each end for quick
sending/comparison as required.
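One way to get "scan ahead, but within a bounded buffer" is a producer/consumer queue with a maximum size. The Python sketch below assumes that approach, which is a common technique and not something rsync is known to use here. A background thread walks the hypothetical `(path, mtime, size)` entries and blocks whenever `max_buffered` entries are waiting. The consumer groups entries into segments of `segment_size`. `lookahead_segments` and both parameters are illustrative names.

```python
import queue
import threading
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical file-list entry: (path, mtime, size).
Entry = Tuple[str, int, int]


def lookahead_segments(scan: Iterable[Entry], segment_size: int,
                       max_buffered: int) -> Iterator[List[Entry]]:
    """Yield file-list segments while a background thread scans ahead.

    The queue's maxsize caps how many scanned entries sit in memory: once it
    is full, the scanning thread blocks until the consumer catches up.
    """
    if segment_size < 1 or max_buffered < 1:
        # queue.Queue treats maxsize <= 0 as unbounded, which would defeat the cap.
        raise ValueError("segment_size and max_buffered must be >= 1")
    buf: "queue.Queue[Optional[Entry]]" = queue.Queue(maxsize=max_buffered)

    def producer() -> None:
        for entry in scan:
            buf.put(entry)  # blocks while the buffer is full
        buf.put(None)  # sentinel: scan finished

    threading.Thread(target=producer, daemon=True).start()
    segment: List[Entry] = []
    while True:
        item = buf.get()
        if item is None:
            break
        segment.append(item)
        if len(segment) == segment_size:
            yield segment
            segment = []
    if segment:
        yield segment  # final, possibly short, segment
```

For seven entries with `segment_size=3` and `max_buffered=2`, this yields segments of lengths 3, 3 and 1 in the original scan order. At no point are more than two scanned-but-unconsumed entries queued. Tuning `max_buffered` trades memory for how far each end can run ahead of the other.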

Basically, the idea is to have each end of the connection do as much work as it
can without having to communicate with the other, so that when communication
does occur it is as efficient as possible.
