"intelligent" rsync scripts?

Mon Nov 7 22:37:48 GMT 2005

On Mon, Nov 07, 2005 at 05:03:30PM -0500, Chris Shoemaker wrote:
> Yeah, I think I'm saying just treat (1) and (2) the same way.  OTOH,
> if the behavior is optional and documented, I could definitely see
> treating (1) as an exact match.

Yes, perhaps it would be better to let the user decide how strict to be.

> But you can't do the lookups until you've received the entire
> file-list, right?

We can hash the files that are present on the receiving side ahead of time.
The purpose is to create a database of files that will be used later
when the generator is trying to find a match for a file that is missing
(which we will discover later during the normal generator pass).
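A minimal two-phase sketch of that idea follows. It assumes that candidates are matched on the bare file name and that the first same-named local file wins. Both are illustrative assumptions, and the function name `receive_and_generate` and its input formats are hypothetical. Phase 1 runs while the file list is still arriving, and phase 2 stands in for the later generator pass.

```python
import posixpath

def receive_and_generate(file_list, local_tree):
    """Hypothetical sketch of early hashing plus a later lookup.

    file_list:  (path, is_dir) tuples in the order the sender sends them.
    local_tree: dict mapping a directory path to the set of file names
                present in that directory on the receiving side.
    Returns a dict mapping each sent file to the basis file the generator
    would use, or None when no basis is available (whole-file transfer).
    """
    # Phase 1: as each directory name arrives, hash its local files by name.
    name_hash = {}
    for path, is_dir in file_list:
        if is_dir:
            for name in sorted(local_tree.get(path, ())):
                name_hash.setdefault(name, posixpath.join(path, name))

    # Phase 2 (generator pass): a file missing locally gets an alternate basis.
    plan = {}
    for path, is_dir in file_list:
        if is_dir:
            continue
        d, n = posixpath.split(path)
        if n in local_tree.get(d, set()):
            plan[path] = path              # normal case: update in place
        else:
            plan[path] = name_hash.get(n)  # same-named file elsewhere, or None
    return plan
```

Here a file that the sender now has in `./a` but the receiver still has in `./b` would be found by name. A real implementation would also have to decide how strict the match should be.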

> You mean [the dir] gets removed when it's received?  Why even add it then?

Because we're building a list of extra directories (ones that aren't on the
sending side), and we scan a local directory as soon as we see its name in
the received file list.  That early scan hashes names that may later turn
out to be in the list that the sender sends to us.
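The add-then-remove behaviour can be sketched as follows. The class name, the `local_subdirs` mapping that stands in for a real disk scan, and the path format are all hypothetical. When a directory arrives, its local subdirectories are provisionally marked as extra. Any of them that the sender lists later are dropped again.

```python
class ExtraDirTracker:
    """Hypothetical tracker for receiver-side dirs the sender has not listed.

    local_subdirs maps a directory path to the paths of its subdirectories
    on the receiving side (a stand-in for scanning the local disk).
    """

    def __init__(self, local_subdirs):
        self.local_subdirs = local_subdirs
        self.extra = set()

    def on_received(self, path, is_dir):
        # Anything the sender lists is, by definition, not extra.
        self.extra.discard(path)
        if is_dir:
            # Scan the local copy right away, before the sender has finished
            # listing this directory, so its subdirs are only provisionally extra.
            self.extra.update(self.local_subdirs.get(path, ()))
```

In this sketch, after the sender has sent `.`, `./a` and `./a/x`, the set still contains `./a/y` and `./b`. It cannot become final until the whole list has arrived.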

> # of insertions = # of receiver files not in transfer

In my described algorithm it was "# of insertions = all files on the
receiving side" because we don't know what will be in a particular
directory until after the sender recurses clear down to the bottom of
all child directories and comes back up and sends the last filename at
that directory's level.  If we change the sender to send all the files
(including all directory names) at a single level before going down into
a subdir, we could code up the local scan to occur at the point where
either the level changes or the dir changes at the current level.  Such
a change would not be compatible with older rsync receivers, though (due
to how the current receiver expects to be able to mark nested files in
its received file-list).
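The following sketch illustrates the proposed ordering change, under assumptions. The sender emits every entry of a directory before it descends into any subdirectory. The receiver can then treat a change of parent directory as "this directory's listing is complete" and compare it against the local contents at that point, so it only needs to insert names that are not in the transfer. The function names and the `tree`/`local` input formats are hypothetical.

```python
import posixpath
from itertools import groupby

def level_first_order(tree, d="."):
    """Yield every entry of directory d before descending into any subdir.

    tree maps a directory path to its sorted (name, is_dir) children; it is
    a hypothetical stand-in for the sender's directory scan.
    """
    children = tree.get(d, [])
    for name, _is_dir in children:
        yield posixpath.join(d, name)
    for name, is_dir in children:
        if is_dir:
            yield from level_first_order(tree, posixpath.join(d, name))

def extras_per_dir(sent_paths, local):
    """Return, per directory, the receiver-side names the sender did not list.

    local maps a directory path to the set of names present on the receiver.
    groupby() only merges consecutive items, so this is correct only when
    each directory's entries arrive contiguously, as level_first_order()
    guarantees.  The end of a group is the point where the dir changes and
    the local scan can run.
    """
    extras = {}
    for parent, group in groupby(sent_paths, key=lambda p: posixpath.split(p)[0]):
        listed = {posixpath.split(p)[1] for p in group}
        extras[parent] = local.get(parent, set()) - listed
    return extras
```

With the existing depth-first order (`./a`, `./a/x`, `./a/y`, `./f1`), the entries for `.` would be split into two groups. That is why today's receiver cannot know a directory is complete until the sender has come back up from its subdirectories.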

Your comment does remind me that we don't want to pick an alternate
basis file that is currently in the transfer, since that file may
possibly be updated (which can cause problems if it happens at the wrong
time).  Thus, there would need to be a lot of hash-table deletions going
on in my imagined algorithm in the file-name hash as well as the
dir-name hash.
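The bookkeeping might look like the sketch below. Receiver-side files go into a file-name hash and a dir-name hash, and every path that shows up in the transfer is deleted from both so that it is never chosen as an alternate basis. The class and method names, the name-only match key, and the tie-break (`min()` for a deterministic pick) are hypothetical choices made for illustration.

```python
import posixpath

class BasisCandidates:
    """Hypothetical name and dir hashes of files usable as alternate bases."""

    def __init__(self, local_files):
        self.by_name = {}  # basename -> set of full paths on the receiver
        self.by_dir = {}   # directory -> set of basenames on the receiver
        for path in local_files:
            d, n = posixpath.split(path)
            self.by_name.setdefault(n, set()).add(path)
            self.by_dir.setdefault(d, set()).add(n)

    def remove_in_transfer(self, path):
        """Drop a file that the sender is transferring from both hashes."""
        d, n = posixpath.split(path)
        paths = self.by_name.get(n)
        if paths is not None:
            paths.discard(path)
            if not paths:
                del self.by_name[n]
        names = self.by_dir.get(d)
        if names is not None:
            names.discard(n)
            if not names:
                del self.by_dir[d]

    def candidate_for(self, missing_path):
        """Pick a remaining same-named file, or None if there is none."""
        paths = self.by_name.get(posixpath.basename(missing_path))
        return min(paths) if paths else None
```

Deleting emptied buckets keeps both hashes from filling up with dead keys. Those deletions are the hash-table churn that this scheme would incur.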

..wayne..