"intelligent" rsync scripts?

Chris Shoemaker c.shoemaker at cox.net
Thu Nov 10 15:32:50 GMT 2005


On Wed, Nov 09, 2005 at 11:52:40PM -0800, Wayne Davison wrote:
> > Are you saying only unchanged files are available as alternate basis
> > files?  If we can, I think it's worth avoiding this restriction.
> 
> If we were to use the files directly, then it would be complicated to
> try to order the updates to avoid changing a file before another file
> could use it as a basis file.  However, I've come up with an algorithm
> I like better that avoids this restriction completely:
> 
> Rsync already supports the idea of a "partial dir" that can be scanned
> for partially-transferred files and delayed updates.  I'm thinking that
> hard-linking files into this directory makes this new feature much
> easier and more memory efficient (the dir is named ".~tmp~" by default,
> relative to the containing directory of the to-be-updated files).

Hmm. I see the complexity of using a potentially changing file as an
alternate basis, but I don't see how hardlinking makes this simpler.
If the original file changes, then so will the hard link.  What am I
missing?
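
The answer may hinge on how the original gets updated.  Below is a
minimal sketch of that distinction, assuming a POSIX filesystem that
supports hard links.  The file names and the helper
hardlink_after_update() are made up for illustration and are not part
of rsync.  An in-place overwrite shows through the link, while writing
a temporary file and renaming it over the original leaves the link
pointing at the old data.

```python
import os
import tempfile


def hardlink_after_update(replace_by_rename):
    """Return what a hard link to 'foo' contains after 'foo' is updated."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "foo")
        link = os.path.join(d, "foo.link")  # hypothetical link name
        with open(orig, "w") as f:
            f.write("old")
        os.link(orig, link)  # both names now refer to the same inode

        if replace_by_rename:
            # Write the new data to a temp file, then rename it over the
            # original name.  The old inode survives under the link.
            tmp = os.path.join(d, "foo.tmp")
            with open(tmp, "w") as f:
                f.write("new")
            os.replace(tmp, orig)
        else:
            # Truncate and rewrite the existing inode in place.
            with open(orig, "w") as f:
                f.write("new")

        with open(link) as f:
            return f.read()
```

If updates are done by rename, the link would keep the old contents;
only an in-place update would change both names at once.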

> I also thought through where I'd like the rename scan to go.  I finally
> decided that I liked the idea of piggy-backing the scan on the existing
> delete-before or delete-during scans that already occur, since this
> makes the logic much simpler (the code already exists to handle all the
> proper include/exclude logic, including local .cvsignore/.rsync-filter
> files) and it should also make the scan quick because it will take
> advantage of disk I/O that is either already occurring, or is at least
> in close proximity to identical stat() calls that the generator's update
> code is going to make.  (If either --delete-after was selected or no
> deletions are occurring, rsync does the rename scan during the transfer
> using a non-deleting version of the delete-during code).  The only
> potential problem with this scan position is that the receiving side may
> not have fully finished its scan when we encounter a missing file that
> doesn't have a size+mtime match yet, so I allow missing files to be
> delayed until the receiving-side scan is complete (at which point we
> check to see if a match has shown up yet or not).
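
To make the size+mtime matching concrete, here is a rough sketch of the
idea, not the code from the patch.  The function names and the shape of
the "missing" mapping are assumptions for illustration, and the delayed
handling of files whose match has not shown up yet is left out.

```python
import os
from collections import defaultdict


def index_by_size_mtime(paths):
    """Map (size, mtime) to the existing receiver-side paths with that key."""
    index = defaultdict(list)
    for p in paths:
        st = os.stat(p)
        index[(st.st_size, int(st.st_mtime))].append(p)
    return index


def find_rename_candidates(missing, index):
    """For each missing name, list existing files with the same size+mtime.

    'missing' maps a file name to the (size, mtime) the sender reported.
    An empty list means no candidate has been seen (yet).
    """
    return {name: list(index.get(key, [])) for name, key in missing.items()}
```

A name that maps to an empty list is the case described above, where
the lookup would have to be retried once the scan has finished.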

Reusing the delete-scanning sounds good, but I don't think you have to
use both the --delete-before scan and the --delete-during scan.  I
think the don't-really-delete mode for delete-during is sufficient.  I
really think --detect-renames is incompatible with --delete-before,
even though you can make it look like they're not.  The problem is
that I think one main use of --delete-before is to avoid running out
of hard drive space.  If --detect-renames hardlinks the deleted files,
it doesn't matter that the originals are deleted before the transfer;
no hard drive space is freed while the links exist.  Thus, I think you
can avoid the --delete-before scan.
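
A small sketch of the disk-space point, assuming a POSIX filesystem
with hard links.  The file names and link_count_after_delete() are
hypothetical.  Unlinking the original only drops the link count; the
data stays on disk as long as another link exists.

```python
import os
import tempfile


def link_count_after_delete():
    """Return (links before delete, links after delete, data still intact)."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "big.log")
        kept = os.path.join(d, "kept-link")  # stands in for the saved hard link
        with open(orig, "wb") as f:
            f.write(b"x" * 4096)
        os.link(orig, kept)

        before = os.stat(kept).st_nlink
        os.unlink(orig)  # "delete" the original name
        after = os.stat(kept).st_nlink

        # The blocks are still allocated: the full contents remain readable.
        return before, after, os.path.getsize(kept) == 4096
```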

> My code also attempts to match up files even when they're not missing.

Nice!  Very handy!

> This works to the fullest extent when a delete-before scan is in effect,

Oh, because the match-search for non-missing files is not delayed in
the --delete-during scan, right?  Even so, that gives (part of) a
significant benefit that I didn't expect, so it's a good thing.  I'll
have to think more about the full rename-and-replace problem across
multiple directories.

> but it still handles the case of the rotating log files quite nicely
> (associating all the moved files together as you would expect).
> 
> A patch for the CVS version is here:
> 
>     http://opencoder.net/detect-renames.diff

Not a big diff considering the impact on functionality.  You make it
look easy! I like it. :)

> The code is still a little ugly, but it does appear to work well in my
> limited testing.  If I like the idea, I'll look into how to share the
> code for the delete scan in a way that is not as ugly as the current
> logic.
> 
> > $ cp foo foo.orig; edit foo
> > 
> > Not using the old foo as the basis for foo.orig just because foo
> > changed really hurts.
> 
> If the user uses "cp -p foo foo.orig" we will find it.  The patch could
> be extended to switch from size+mtime to use size+checksum, but I
> haven't done that yet (and checksumming is so slow that most folks tend
> to avoid it).
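
The cp versus cp -p difference can be sketched as follows.  This is an
illustration only: shutil.copy() and shutil.copy2() stand in for "cp"
and "cp -p", and the helper names are made up.  A size+mtime comparison
only finds the copy when the mtime was preserved.

```python
import os
import shutil
import tempfile


def same_size_mtime(a, b):
    """Compare two files the way a size+mtime match would."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_size, int(sa.st_mtime)) == (sb.st_size, int(sb.st_mtime))


def copy_and_compare(preserve):
    """Copy foo to foo.orig and report whether size+mtime still match."""
    with tempfile.TemporaryDirectory() as d:
        foo = os.path.join(d, "foo")
        orig = os.path.join(d, "foo.orig")
        with open(foo, "w") as f:
            f.write("data")
        # Give foo an old mtime so a fresh copy is clearly newer.
        os.utime(foo, (1_000_000_000, 1_000_000_000))

        if preserve:
            shutil.copy2(foo, orig)  # like "cp -p": mtime is carried over
        else:
            shutil.copy(foo, orig)   # like "cp": mtime is the copy time
        return same_size_mtime(foo, orig)
```

A size+checksum comparison would match in both cases, at the cost of
reading every candidate file.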

At least it catches move-and-replace.  That's a real bonus.  So, will
this be in 2.6.7?

-chris

