Incremental file-list recursion has landed in CVS; Re: RSYNC + iNotify

Matt McCutchen hashproduct+rsync at gmail.com
Mon Jul 30 18:27:14 GMT 2007


On 7/29/07, Buck Huppmann <buckh at pobox.com> wrote:
> i may be reading the code incorrectly, but it seems that, if the
> --files-from option processing can be altered (or perhaps yet another
> option could be created [shudder]) to opt out of the de-duplicate pass
> and somebody hooked inotifywait
>
>         http://inotify-tools.sourceforge.net/#info
>
> to the standard input of
>
>         rsync -r --incremental-dir --files-from=- ...
>
> (and inotifywait can be convinced to fflush() after printing each event
> and somebody also took appropriate precautions to de-duplicate entries
> within a reasonable time frame etc. etc.) then you'd have [a way to
> continuously replicate changes to one directory in another directory.]

True, the addition of incremental recursion brings rsync closer to
being suitable for continuous replication, but I think making it work
would require much bigger changes than you realize.

The current design of incremental recursion is that all of the source
arguments or --files-from entries are loaded into the file list at the
beginning but traversal of directories happens little by little.  The
key point is that the sender traverses the space of file-list paths
once in ascending order.  When the sender visits a path provided by
multiple source arguments, it chooses one; "unduplication" refers to
this.  Making the sender capable of visiting individual paths on
demand (in the order given and possibly multiple times each) would
require a lot more than just removing this "unduplication".  Wayne
probably knows more about what regions of the code currently count on
paths being visited once in ascending order and how they would have to
be changed.
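
To make the "once, in ascending order" point concrete, here is a toy
model in Python. It is not rsync's code: the function name, the plain
string sort, and the rule of keeping the entry from the earliest source
argument are all assumptions made for illustration only.

```python
def single_pass(path_lists):
    """Toy model: visit each path once, in ascending order.

    path_lists holds one list of paths per source argument. This is an
    illustrative assumption, not how rsync represents its file list.
    """
    # Tag each path with the index of the source argument that supplied it,
    # then sort so that equal paths end up next to each other.
    tagged = sorted(
        (path, src) for src, paths in enumerate(path_lists) for path in paths
    )
    visited = []
    for path, src in tagged:
        if visited and visited[-1][0] == path:
            # "Unduplication": this path was already chosen from another
            # source argument, so it is skipped rather than visited again.
            continue
        visited.append((path, src))
    return visited
```

In this model, dropping the `continue` would only produce repeated
entries within the same sorted pass. It would not let a caller ask for
"b" again after the pass has already moved on to "c", which is what
on-demand visiting would need.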

Currently, I think the best option is to do a separate rsync run for
each notification (or group of notifications arriving close in time).
Write a little script that reads the inotifywait output and, whenever
there is a lull, invokes rsync with the changed paths it has
accumulated using --files-from.  If you don't want the overhead of
establishing a separate network connection for each run, you can open
a single connection in advance and do some trickery with --rsh or
--rsync-path so that it can be reused by multiple rsync instances, or
you can use an ssh port forward (which really comes to the same
thing).
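
A rough sketch of such a script, in Python. Several things here are
assumptions rather than anything tested against a real setup: the
two-second lull, the helper names, the choice of -a, and that
inotifywait is run from inside the source directory with something
like `inotifywait -m -r --format '%w%f' -e close_write,create,move .`
so that each output line is a path relative to the source.

```python
import queue
import subprocess
import sys
import threading

LULL_SECONDS = 2.0  # assumed quiet period that ends a group of notifications


def read_lines(stream, q):
    """Push each line from stream onto q; None marks end of input."""
    for line in stream:
        q.put(line.rstrip("\n"))
    q.put(None)


def batches(q, lull):
    """Yield one sorted list of distinct paths per burst of notifications."""
    pending = set()
    while True:
        try:
            # Block indefinitely while idle; once something is pending,
            # wait only `lull` seconds for further notifications.
            item = q.get(timeout=lull) if pending else q.get()
        except queue.Empty:
            yield sorted(pending)  # lull reached: hand over the batch
            pending = set()
            continue
        if item is None:  # input closed
            if pending:
                yield sorted(pending)
            return
        if item:
            pending.add(item)  # a set drops repeated paths within a batch


def sync(paths, src, dest):
    """Run one rsync over the accumulated paths, fed via --files-from=-."""
    file_list = "".join(p + "\n" for p in paths)
    subprocess.run(
        ["rsync", "-a", "--files-from=-", src, dest],
        input=file_list, text=True, check=False,
    )


def main(src, dest):
    q = queue.Queue()
    threading.Thread(target=read_lines, args=(sys.stdin, q), daemon=True).start()
    for batch in batches(q, LULL_SECONDS):
        sync(batch, src, dest)


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

If this were saved as sync_on_lull.py (a made-up name), the inotifywait
output would be piped into `python3 sync_on_lull.py . host:/dest/`.
Deleted files are not handled: a path that no longer exists on the
sender just makes that rsync run report an error, so propagating
deletions would need something more.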

Matt

