"intelligent" rsync scripts?
Chris Shoemaker
c.shoemaker at cox.net
Thu Nov 10 15:32:50 GMT 2005
On Wed, Nov 09, 2005 at 11:52:40PM -0800, Wayne Davison wrote:
> > Are you saying only unchanged files are available as alternate basis
> > files? If we can, I think it's worth avoiding this restriction.
>
> If we were to use the files directly, then it would be complicated to
> try to order the updates to avoid changing a file before another file
> could use it as a basis file. However, I've come up with an algorithm
> I like better that avoids this restriction completely:
>
> Rsync already supports the idea of a "partial dir" that can be scanned
> for partially-transferred files and delayed updates. I'm thinking that
> hard-linking files into this directory makes this new feature much
> easier and more memory efficient (the dir is named ".~tmp~" by default,
> relative to the containing directory of the to-be-updated files).
Hmm. I see the complexity of using a potentially changing file as an
alternate basis, but I don't see how hardlinking makes this simpler.
If the original file changes, then so will the hard link. What am I
missing?
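
To make the question concrete, here is a minimal Python sketch of the two cases. The file names are made up for illustration. Treating the second case (write a temporary copy, then rename it over the original) as the one that matters for the patch is an assumption on my part; nothing in this thread confirms it.

```python
import os
import tempfile


def hardlink_demo():
    """Return what the hard link and the original contain after two kinds of update."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "foo")
        link = os.path.join(d, "foo.link")  # hypothetical name for the saved link
        with open(orig, "w") as f:
            f.write("v1")
        os.link(orig, link)  # both names now refer to the same inode

        # Case 1: the original is rewritten in place (same inode),
        # so the change is visible through the hard link as well.
        with open(orig, "w") as f:
            f.write("v2")
        with open(link) as f:
            after_inplace = f.read()

        # Case 2: the new version is written to a temporary file and renamed
        # over the original. The name "foo" now points at a new inode, while
        # the hard link still refers to the old one.
        tmp = os.path.join(d, "foo.tmp")
        with open(tmp, "w") as f:
            f.write("v3")
        os.replace(tmp, orig)
        with open(link) as f:
            after_rename = f.read()
        with open(orig) as f:
            orig_now = f.read()

        return after_inplace, after_rename, orig_now


if __name__ == "__main__":
    print(hardlink_demo())
```

So the hard link follows in-place writes, but it keeps the old data when the original name is replaced by a rename.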
> I also thought through where I'd like the rename scan to go. I finally
> decided that I liked the idea of piggy-backing the scan on the existing
> delete-before or delete-during scans that already occur, since this
> makes the logic much simpler (the code already exists to handle all the
> proper include/exclude logic, including local .cvsignore/.rsync-filter
> files) and it should also make the scan quick because it will take
> advantage of disk I/O that is either already occurring, or is at least
> in close proximity to identical stat() calls that the generator's update
> code is going to make. (If either --delete-after was selected or no
> deletions are occurring, rsync does the rename scan during the transfer
> using a non-deleting version of the delete-during code). The only
> potential problem with this scan position is that the receiving side may
> not have fully finished its scan when we encounter a missing file that
> doesn't have a size+mtime match yet, so I allow missing files to be
> delayed until the receiving-side scan is complete (at which point we
> check to see if a match has shown up yet or not).
Reusing the delete-scanning sounds good, but I don't think you have to
use both the --delete-before scan and the --delete-during scan. I
think the don't-really-delete mode for delete-during is sufficient. I
really think --detect-renames is incompatible with --delete-before,
even though you can make them look compatible. The problem is
that I think one main use of --delete-before is to avoid running out
of hard drive space. If --detect-renames hardlinks the deleted files,
it doesn't matter that the originals are deleted before the transfer; no
hard drive space is freed. Thus, I think you can avoid the
--delete-before scan.
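
The disk-space point can be checked with a small Python sketch. The file name "old.log" and the 4096-byte size are made up; ".~tmp~" is the default directory name mentioned above. This only illustrates ordinary hard-link semantics, not what the patch itself does.

```python
import os
import tempfile


def space_after_delete():
    """Hard-link a file, delete the original name, and report what is left."""
    with tempfile.TemporaryDirectory() as d:
        orig = os.path.join(d, "old.log")  # hypothetical file about to be deleted
        save_dir = os.path.join(d, ".~tmp~")
        os.mkdir(save_dir)
        saved = os.path.join(save_dir, "old.log")

        with open(orig, "wb") as f:
            f.write(b"x" * 4096)

        os.link(orig, saved)  # keep the data reachable as a possible basis file
        links_before = os.stat(saved).st_nlink

        os.unlink(orig)  # the "delete before transfer" step
        st = os.stat(saved)

        # The data blocks stay allocated as long as one link remains.
        return links_before, st.st_nlink, st.st_size


if __name__ == "__main__":
    print(space_after_delete())
```

The link count drops from 2 to 1, but the 4096 bytes are still allocated, so deleting the original name frees nothing until the saved link is removed too.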
> My code also attempts to match up files even when they're not missing.
Nice! Very handy!
> This works to the fullest extent when a delete-before scan is in effect,
Oh, because the match-search for non-missing files is not delayed in
the --delete-during scan, right? Even so, that gives (part of) a
significant benefit that I didn't expect, so it's a good thing. I'll
have to think more about the full rename-and-replace problem across
multiple directories.
> but it still handles the case of the rotating log files quite nicely
> (associating all the moved files together as you would expect).
>
> A patch for the CVS version is here:
>
> http://opencoder.net/detect-renames.diff
Not a big diff considering the impact on functionality. You make it
look easy! I like it. :)
> The code is still a little ugly, but it does appear to work well in my
> limited testing. If I like the idea, I'll look into how to share the
> code for the delete scan in a way that is not as ugly as the current
> logic.
>
> > $ cp foo foo.orig; edit foo
> >
> > Not using the old foo as the basis for foo.orig just because foo
> > changed really hurts.
>
> If the user uses "cp -p foo foo.orig" we will find it. The patch could
> be extended to switch from size+mtime to use size+checksum, but I
> haven't done that yet (and checksumming is so slow that most folks tend
> to avoid it).
At least it catches move-and-replace. That's a real bonus. So, will
this be in 2.6.7?
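
For the archive, a rough Python sketch of size+mtime matching as I understand it from the description above. The function names, the whole-second mtime comparison, and the file names are my own assumptions, not taken from the patch. shutil.copy2 stands in for "cp -p" and shutil.copy for plain "cp".

```python
import os
import shutil
import tempfile
from collections import defaultdict


def index_by_size_mtime(paths):
    """Map (size, whole-second mtime) to the receiver-side files that have it."""
    index = defaultdict(list)
    for p in paths:
        st = os.stat(p)
        index[(st.st_size, int(st.st_mtime))].append(p)
    return index


def find_basis(index, size, mtime):
    """Return one candidate basis file for a missing file, or None."""
    candidates = index.get((size, int(mtime)), [])
    return candidates[0] if candidates else None


def demo():
    with tempfile.TemporaryDirectory() as d:
        # The receiver's existing copy of foo, with a fixed old mtime.
        recv_foo = os.path.join(d, "foo")
        with open(recv_foo, "w") as f:
            f.write("original contents\n")
        os.utime(recv_foo, (1_000_000_000, 1_000_000_000))

        # Two hypothetical sender-side copies of foo.orig.
        preserved = os.path.join(d, "sender-foo.orig.preserved")
        plain = os.path.join(d, "sender-foo.orig.plain")
        shutil.copy2(recv_foo, preserved)  # keeps the mtime, like "cp -p"
        shutil.copy(recv_foo, plain)       # gets a fresh mtime, like "cp"

        index = index_by_size_mtime([recv_foo])
        results = []
        for wanted in (preserved, plain):
            st = os.stat(wanted)
            hit = find_basis(index, st.st_size, st.st_mtime)
            results.append(os.path.basename(hit) if hit else None)
        return tuple(results)


if __name__ == "__main__":
    print(demo())
```

The preserved copy matches the receiver's foo; the plain copy has the same size but a new mtime, so it finds nothing. Keying the index on a checksum instead of the mtime would catch it, at the cost of reading every candidate file.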
-chris