[PATCH] Batch-mode rewrite

Chris Shoemaker c.shoemaker at cox.net
Mon Jul 12 18:47:30 GMT 2004


On Mon, Jul 12, 2004 at 12:34:38PM -0700, Wayne Davison wrote:
> First, a summary of my thoughts:
> 
> This looks to be a much simpler way to integrate batch support into
> rsync than what we currently have.  I'm quite interested to see this
> refined further.  Nice work!
> 
	Thanks for your thorough review and quick feedback!


> Some other comments:
> 
> On Sun, Jul 11, 2004 at 06:08:04PM -0400, Chris Shoemaker wrote:
> > 1) I suspect one area in client_run() is non-portable.
> 
> I assume you mean the use of /dev/null.  That idiom is used elsewhere in
> rsync, so it should be safe to use it in one more place.

Yes, that's what I meant.  That's good to know.  I didn't think to grep
for other usage.  I'll remove the comment.

> 
> > 	If you are open to some protocol changes with the motivation of
> > unifying the client/server protocol with the sender/receiver protocol,
> > then I can write something up.
> 
> I think it might be best to weigh the benefits of what is gained by the
> unification compared with the added complexity of maintaining both the
> old and new protocol methods in the code for years to come.  My initial
> reaction is to leave it alone, but feel free to argue for changes that
> you believe in.

I agree about weighing cost and benefit.  I'll continue to consider
this.

> 
> > ! /* start_inband_exchange() contains an unfortunate write_batch
> > !  * hack/workaround.  The issue here is that the protocol for version
> > !  * exchange differs when an rsyncd server is involved.  However, the
> > !  * batch file written must be the same whether a server is involved or
> > !  * not.  If only version exchange always used the same protocol...
> > !  */ 
> 
> One thought here:  would it make things simpler to separate the option-
> parsing variables (read_batch & write_batch) from a set of variables
> that would indicate that the mode is currently active (for instance,
> "read_batch_enabled" and "write_batch_enabled")?  This might allow you
> to remove the hack from start_inband_exchange() IFF the recording of the
> protocol can be turned on at a single common point (perhaps with an
> exception to write out the version number).  I didn't look to see how
> much starting and stopped is needed, though, so this idea may not be of
> any use.

You're probably right, although I think one "batch_enabled" would
suffice.  I'll look for the common point.

> Seems to me this would be simpler to disable write_batch prior to this
> block of code, and then add a single "if" after this block that always
> calls "write_int(batch_fd, checksum_seed)" if we're writing a batch:
> 
> 	int write_batch_save = write_batch;
> 	write_batch = 0;
>   	if (am_server) {
>   		if (!checksum_seed)
>   		        checksum_seed = time(NULL);
>   		write_int(f_out, checksum_seed);
>   	} else {
>   		checksum_seed = read_int(f_in);
>   	}
> 	if (write_batch_save) {
> 		write_int(f_out, checksum_seed);
> 		write_batch = 1;
> 	}

Yes, you're right.

> 
> Of course, if my delayed-start idea works out, you might be able to
> leave that code completely unchanged and just do the write_int() call
> later on in the code when batch writing actually gets turned on for the
> first time.

Hmm, good idea, I'll see.

> 
> In readfd():
> > + 	if (write_batch && !am_sender) {
> 
> In writefd():
> > + 	if (write_batch && am_sender) {
> 
> I'm thinking these checks might be safer if some init code did this:
> 
> int write_batch_monitor_in = -1;
> int write_batch_monitor_out = -1;
> 
> 	if (write_batch) {
> 		if (am_sender)
> 			write_batch_monitor_out = f_out;
> 		else
> 			write_batch_monitor_in = f_in;
> 	}
> 
> Then the code in readfd() could be "if (fd == write_batch_monitor_in)"
> and the code in writefd() could be "if (fd == write_batch_monitor_out)".
> Thus, the code could never record any I/O to the wrong fd.  For
> instance, a diff in the patches dir makes the receiver read data on
> a pipe from the generator, and we wouldn't want that going into the
> batch file (when the receiver was the one writing the batch file).
> 

Ah, yes.  That is a safer way.

> > + 		if (read_batch) exit_cleanup(0);  /* no reason to continue */
> 
> I'm curious how easy it would be to avoid forking in the first place
> when reading a batch, but I didn't look to see how much more intrusive
> that change would be in the code.

I was also curious, but not enough to actually check.  :)  Do you suppose a
fork is expensive enough to want to avoid?

> 
> The code needs to disallow a remote destination when reading a batch:
> 
>     rsync -av --read-batch=foo localhost:/path/bar

Hmm, that case wasn't in my testing.  Did I introduce this limitation or
has it always been so?

> 
> One other thing I noticed was, if I bumped up the protocol version in
> the saved batch file to 32, your enhanced rsync would still read the
> file without complaint.  This is because rsync believes that the sender
> is going to see its maximum protocol number of 28 and limit itself
> (which is not going to happen when the data is coming from a batch).
> So, it would be good to have rsync complain and die if the batch file
> is beyond what the reader supports.

Oh, good catch.  I'll fix that.

> 
> Another thing I noticed was that a local --write-batch copy behaved as
> if --whole-file had been specified.  The easiest fix is to move the
> "write_batch = 0;" line in local_child() out of the pid == 0 path so
> that the receiver is the one that is writing the batch file, and thus
> the whole-file option will default correctly in the generator.

Hmm, I forgot about that.  Q: Shouldn't this be set in parse_arguments?
A:  Oh yeah, that's right, we don't know that we're local until we parse
the src and dest args.  :-(  I guess it DOES matter which patch the
write-batch=0 is on.

> 
> I assume you used -b on your diff because you thought that it reduced
> the changes for human reading, but I'd prefer to see the diff without -b
> in the future (so that it is easy to apply) and using -u (so that it is
> easy to read).

Ok.   "diff -cu" it is.  I used -b because the auto-tab feature
in emacs sometimes causes noisy whitespace changes in the diff.

I'll incorporate your comments and rediff.

-chris

> 
> ..wayne..
> -- 
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


More information about the rsync mailing list