[PATCH] Batch-mode rewrite

Wayne Davison wayned at samba.org
Mon Jul 19 00:25:18 GMT 2004


On Fri, Jul 16, 2004 at 04:40:41PM -0400, Chris Shoemaker wrote:
> If I understand your changes, the files-from stuff you're skipping is
> only the flagging of the fd, no actual communication.

Yes, you're right there -- my comment wasn't accurate.  I was worried
about the files-from data getting into the batch file, but since it only
ever travels from the receiver to the sender (when the client isn't the
sender), it doesn't seem possible for it to get into the stream that is
being recorded (which comes from the sender).

> if the user uses --exclude and --delete and then applies the batch
> with the --delete but not the exclude, what should happen?

Seems like the two choices we have are:

(1)  Force the excludes into the batch file and read them in the local-
to-local batch-reading transfer.

(2)  Require the user to re-specify the excludes if they want the same
update (allowing them to omit any as they see fit).  This route would
cause the deletes to remove more files than the original transfer if the
user failed to re-specify the excludes and they used --delete.  Also, if
there were excluded symlinks, directories, and devices, the generator
would not know to skip them in the batch-reading run unless the same
excludes were specified.
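
To make the risk in option (2) concrete, here is a small Python model of
the delete decision.  It is only a sketch: files_to_delete() is a
hypothetical helper, and fnmatch stands in for rsync's real exclude
matching, which has richer rules.

```python
from fnmatch import fnmatch


def files_to_delete(dest_files, sender_files, excludes=()):
    """Model of a --delete pass: remove destination files that the sender
    did not list, unless an exclude pattern protects them.

    Hypothetical helper; fnmatch is a simplification of rsync's matching.
    """
    def excluded(name):
        return any(fnmatch(name, pattern) for pattern in excludes)

    keep = set(sender_files)
    return sorted(
        name for name in dest_files
        if name not in keep and not excluded(name)
    )


dest = ["a.txt", "old.txt", "notes.bak"]
sent = ["a.txt"]  # the sender never lists excluded files

# Original run: --exclude='*.bak' --delete
original_run = files_to_delete(dest, sent, excludes=["*.bak"])

# Batch-reading run where the user forgot the exclude
batch_run = files_to_delete(dest, sent)
```

In this model the original run removes only old.txt, while the
batch-reading run without the exclude also removes notes.bak.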

So, perhaps we should go ahead and save off the exclude list in the
batch file and force read_batch mode to read them?  It should be as
simple as an extra call to "send_exclude_list(batch_fd);" and the
addition of a special recv_exclude_list() call for read_batch mode (and
the removal of the --include/--exclude options from the argv file).
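
As a rough illustration of saving the exclude list in the batch data and
reading it back, the Python sketch below uses an assumed framing: each
pattern is written as a 4-byte little-endian length followed by its
bytes, and a zero length ends the list.  The function names
write_exclude_list() and read_exclude_list() are hypothetical and are
not rsync's own routines.

```python
import io
import struct


def write_exclude_list(stream, patterns):
    """Write patterns as <length><bytes> records ending with a 0 length.

    The framing is an assumption made for this sketch.
    """
    for pattern in patterns:
        data = pattern.encode("utf-8")
        if not data:
            # An empty pattern would be mistaken for the terminator.
            raise ValueError("empty exclude pattern")
        stream.write(struct.pack("<i", len(data)))
        stream.write(data)
    stream.write(struct.pack("<i", 0))


def read_exclude_list(stream):
    """Read records written by write_exclude_list() and return the patterns."""
    patterns = []
    while True:
        header = stream.read(4)
        if len(header) != 4:
            raise EOFError("truncated exclude list")
        (length,) = struct.unpack("<i", header)
        if length == 0:
            return patterns
        if length < 0:
            raise ValueError("negative pattern length")
        data = stream.read(length)
        if len(data) != length:
            raise EOFError("truncated exclude pattern")
        patterns.append(data.decode("utf-8"))


# Stand-in for the batch file: excludes first, then the recorded stream.
batch = io.BytesIO()
write_exclude_list(batch, ["*.o", "/tmp/"])
batch.write(b"recorded-stream")
batch.seek(0)
restored = read_exclude_list(batch)
remainder = batch.read()
```

The reader stops at the terminator, so whatever follows the exclude list
in the batch data is left untouched for the rest of the run.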

> However, the exclude list is another one of those protocol
> dependencies on server-ness, isn't it?

Yes.  It is also not sent if the client side is the sender and the
receiver doesn't need it (which it doesn't unless the user specified
--delete without --delete-excluded).

One other thing that I noticed is that the synchronization between the
generator and the receiver is no longer present, so a batch-reading run
can possibly do some things in the receiver too soon (for instance, if
the generator hasn't gotten around to creating the required parent dirs
for the receiver).  There are two solutions to this:

(1)  Don't /dev/null the data from the generator, but instead monitor it
and only let the receiver process a file when its number has been
requested from the generator.

(2)  Use another way to convey the same information, like the idea below.
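
Option (1) could be modelled roughly as follows.  This is a Python
sketch with two threads standing in for the generator and receiver
processes; run_gated() and its event log are hypothetical, and skipping
a batch entry that the generator never requests is an assumption of the
sketch, not something decided above.

```python
import threading


def run_gated(batch_indices, generator_requests):
    """Let the receiver handle a file only after the generator has
    requested its index.  Returns the ordered list of events."""
    cond = threading.Condition()
    requested = set()
    finished = False
    events = []

    def generator():
        nonlocal finished
        for idx in generator_requests:
            with cond:
                # Real work (e.g. creating parent dirs) would happen here.
                events.append(("generator", idx))
                requested.add(idx)
                cond.notify_all()
        with cond:
            finished = True
            cond.notify_all()

    def receiver():
        for idx in batch_indices:
            with cond:
                # Block until this index is requested or the generator ends.
                cond.wait_for(lambda: idx in requested or finished)
                if idx in requested:
                    events.append(("receiver", idx))
                else:
                    events.append(("skipped", idx))

    threads = [
        threading.Thread(target=receiver),
        threading.Thread(target=generator),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events
```

For every index that the generator requests, its ("generator", idx)
event is recorded before the matching ("receiver", idx) event, whichever
thread happens to start first.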

There is a diff in the patches dir called g2r-basis-filename.diff, so
named because it sends the name that the generator found for the basis
file to the receiver via an extra pipe that gets created before the
generator and receiver fork (the idea is to avoid duplicating the same
basis-file search in the receiver and risking having it find a different
file than the one used to generate the checksums -- something that is
particularly useful for both the multiple-compare-dest diff and the
fuzzy-name matching diff).  I modified this g2r patch to also convey to
the receiver what file-list index the name refers to (but only in batch
mode) so that the receiver can notice when things aren't quite in sync
(i.e. if the batch data doesn't have exactly the same items that the
generator wants to update).
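
The index check can be pictured with a short Python sketch.  It assumes
the generator's pipe yields (index, basis_name) pairs and the batch data
yields (index, payload) pairs; apply_batch() and OutOfSync are
hypothetical names used only for illustration.

```python
class OutOfSync(Exception):
    """The batch data and the generator's requests disagree."""


def apply_batch(generator_msgs, batch_items):
    """Pair each batch item with the generator's message for the same
    file-list index, raising OutOfSync on any mismatch."""
    gen = iter(generator_msgs)
    applied = []
    for batch_idx, payload in batch_items:
        try:
            gen_idx, basis_name = next(gen)
        except StopIteration:
            raise OutOfSync(f"batch has item {batch_idx} the generator never requested")
        if gen_idx != batch_idx:
            raise OutOfSync(f"generator wants {gen_idx}, batch has {batch_idx}")
        applied.append((batch_idx, basis_name, payload))
    leftover = next(gen, None)
    if leftover is not None:
        raise OutOfSync(f"generator wants {leftover[0]}, batch data is exhausted")
    return applied
```

When both sides agree, each update is applied against the basis file the
generator chose; any disagreement stops the run instead of letting the
receiver patch the wrong file.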

One question was prompted by my work on this patch:

The sending of the extra file-list index value is only enabled in
batch-reading mode (and indeed, the extra basis-name pipe is not
normally turned on unless an option such as --compare-dest or
--read-batch is specified).  It might be advantageous to always convey
this extra index-number information from the generator to the receiver
since it would guard against a receiver that is sending an update that
the generator didn't request, but I can't think of any other reason to
do this.

Any thoughts on any of this?

..wayne..

