Incremental file-list recursion has landed in CVS; Re: RSYNC + iNotify

Buck Huppmann buckh at pobox.com
Sun Jul 29 17:47:54 GMT 2007


Way back,
On Fri, Jan 12, 2007 at 08:12:30AM +0000, Wayne Davison wrote:
> On Thu, Jan 11, 2007 at 05:22:55PM -0500, Matt McCutchen wrote:
> > Specifically, I'm curious about what areas under the source
> > argument(s) are scanned at what time.
> 
> All the args that the user supplies are scanned at once, allowing them
> to be unduplicated as they would be in a normal transfer.  The only
> difference is that no recursing happens during the initial sending of
> the first file-list.  Then, one directory at a time is scanned and sent
> over until we have a decent number of upcoming files in the pipeline for
> the generator.  The result is that typical transfers with a small number
> of source args can start transferring files almost immediately, and the
> depth-first scan of directories continues intermixed in with the file
> transfers (or at least intermixed with the generator's scanning for
> changed files when no transfers are needed).
> 
> If the --relative option is used, implied directories are treated as
> "args" and sent in the first wave.  The --files-from option has always
> treated the items read from the file as args, so a transfer with a huge
> number of files-from items and no real recursion doesn't get any benefit
> from the incremental recursion.
> 
> > Also, does the incremental scan rule out "file has vanished" warnings?
> 
> It lessens their chance of occurring because the time that elapses
> between a directory scan and the time the generator starts to work on
> those files is much shorter than waiting for the full scan to complete.
> However, there can still be vanished files as there is still some
> reading ahead of directories (rsync tries to keep a good amount of work
> in the pipeline for the generator to blaze through).  I haven't decided
> exactly what I want the read-ahead limit to be, but the current code
> wants 1000 files to be available beyond the currently-active directory.
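
(to make sure i have the scan order described above straight, here is a toy model of it in python. it is only a sketch, not rsync's actual code: the class name, the in-memory "tree" dict standing in for the filesystem, and the read_ahead parameter are all made up for illustration.)

```python
from collections import deque


class IncrementalScanner:
    """Toy model: send the args first, then scan one directory at a time."""

    def __init__(self, tree, args, read_ahead=1000):
        # tree is a hypothetical stand-in for the filesystem:
        # it maps a directory path to the list of names inside it.
        self.tree = tree
        self.read_ahead = read_ahead
        self.queue = deque()   # entries already "sent", waiting for the generator
        self.dirs = []         # stack of directories not yet scanned
        self.scans = 0         # number of directory scans performed so far

        # First wave: every arg, de-duplicated, with no recursion yet.
        first_wave = list(dict.fromkeys(args))
        self.queue.extend(first_wave)
        # Reverse so that the first arg is scanned first (top of the stack).
        self.dirs.extend(reversed([a for a in first_wave if a in tree]))

    def _scan_one(self):
        """Scan a single directory and queue its entries."""
        d = self.dirs.pop()
        self.scans += 1
        children = [f"{d}/{name}" for name in self.tree[d]]
        self.queue.extend(children)
        # Depth-first: subdirectories of d go on top of the stack.
        self.dirs.extend(reversed([c for c in children if c in self.tree]))

    def next_file(self):
        """Return the next entry, topping up the pipeline before doing so."""
        while self.dirs and len(self.queue) < self.read_ahead:
            self._scan_one()
        return self.queue.popleft() if self.queue else None
```

(with a small read_ahead the first entries come back after a single directory scan, which is the "start transferring almost immediately" effect; with a huge list of args and no directories, nothing is left to scan incrementally.)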

i may be reading the code incorrectly, but it seems that, if the
--files-from option processing could be altered (or perhaps yet another
option could be created [shudder]) to opt out of the de-duplicate pass,
and somebody hooked inotifywait

	http://inotify-tools.sourceforge.net/#info

to the standard input of

	rsync -r --incremental-dir --files-from=- ...

(and inotifywait could be convinced to fflush() after printing each event,
and somebody also took appropriate precautions to de-duplicate entries
within a reasonable time frame etc. etc.), then you'd have something sorta
like what was asked for even further back:

On Wed, Feb  8, 2006 at 06:12:57PM +0000, Dag Wieers wrote:

> On Tue, 31 Jan 2006, Ryan Kather wrote:
> 
> > I'm looking for a way to continually monitor at least one but possibly
> > multiple directories (and/or individual files).  I would like RSYNC to
> > immediately synchronize the changes to said directory(ies) after they
> > occur.  I believe the best approach for this would be to utilize
> > iNotify enabled kernels and create a plugin for the RSYNC daemon.
> 
> > However, before I begin the task of actually writing some code (with
> > my poor abilities), I thought I would inquire if anyone else has already
> > created this or something similar?  Am I over thinking this, or is
> > there a better approach?  Is there a reason not to do this?
> 
> I'm very interested in functionality like this. I remember it being 
> brought up on this list before so I would look for similar mails in the 
> archive for clues.
> 
> How to do it efficiently (eg. for files in transit/still open), I don't 
> know. Also it seems to me that you may want a separate daemon that 
> implements the rsync protocol itself (instead of relying on an external 
> tool) as that allows you to optimize certain things and have less 
> overhead.
> 
> I'm most interested in writing this in python, using a python-rsync 
> implementation and python-inotify.
> 
> Kind regards,
> --   dag wieers,
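
(for the "de-duplicate entries within a reasonable time frame" precaution i mentioned before the quoted message, a minimal python sketch of a filter that could sit between the event source and rsync. the names Debouncer and relay and the 2-second window are my own inventions for illustration; the only assumption about the input is one path per line.)

```python
import time


class Debouncer:
    """Suppress a path that was already passed along within `window` seconds."""

    def __init__(self, window=2.0, clock=time.monotonic):
        self.window = window
        self.clock = clock        # injectable so the logic can be tested
        self.last_sent = {}       # path -> time it was last passed along

    def accept(self, path):
        """Return True if the path should be forwarded now."""
        now = self.clock()
        last = self.last_sent.get(path)
        if last is not None and now - last < self.window:
            return False          # seen too recently: treat as a duplicate
        self.last_sent[path] = now
        return True


def relay(lines, out, debouncer):
    """Copy paths from `lines` to `out`, dropping recent duplicates."""
    for line in lines:
        path = line.rstrip("\n")
        if path and debouncer.accept(path):
            out.write(path + "\n")
            out.flush()           # flush per path so the reader sees it at once
```

(calling relay(sys.stdin, sys.stdout, Debouncer()) would turn this into a pipe filter, fed by something like inotifywait -m -r --format '%w%f' and feeding the --files-from=- end, which is still hypothetical as described above. note that a change arriving inside the window is simply dropped here, so a real tool would probably want to delay and re-send rather than discard; the table of seen paths also grows without bound.)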

sorry if this is wacko or retraces ground already covered on the list
(haven't been paying attention for a while, since rsync does everything
i could possibly need--except, of course, just this 1 more thing, which
i disclaim any liability for proposing, if the camel's back should break,
security- or otherwise) or otherwise amounts to a waste of time, but the
possibility of continuous rsync-ing, without you having to make but a few
concessions in your code, seems like it might be worth making an idiot out
of myself (a NOP) to suggest. (on Linux, i mean, though BSD must have
similar kqueue/kevent tools available, i'd suppose)

but if i just missed the news that somebody's found a good way to do it
in the meantime, i'd be happy to hear about it. (google is keeping mum,
if so)

thanks again for all your work on this, and a last question: does any
command have as many command-line options as rsync? are there as many
atoms in the universe as combinations of rsync options? would adding
a --sequential-files-from get you there?

