[PATCH] Batch-mode rewrite

Wayne Davison wayned at samba.org
Mon Jul 12 19:34:38 GMT 2004


First, a summary of my thoughts:

This looks to be a much simpler way to integrate batch support into
rsync than what we currently have.  I'm quite interested to see this
refined further.  Nice work!

Some other comments:

On Sun, Jul 11, 2004 at 06:08:04PM -0400, Chris Shoemaker wrote:
> 1) I suspect one area in client_run() is non-portable.

I assume you mean the use of /dev/null.  That idiom is used elsewhere in
rsync, so it should be safe to use it in one more place.

> 	If you are open to some protocol changes with the motivation of
> unifying the client/server protocol with the sender/receiver protocol,
> then I can write something up.

I think it might be best to weigh the benefits of what is gained by the
unification compared with the added complexity of maintaining both the
old and new protocol methods in the code for years to come.  My initial
reaction is to leave it alone, but feel free to argue for changes that
you believe in.

> ! /* start_inband_exchange() contains an unfortunate write_batch
> !  * hack/workaround.  The issue here is that the protocol for version
> !  * exchange differs when an rsyncd server is involved.  However, the
> !  * batch file written must be the same whether a server is involved or
> !  * not.  If only version exchange always used the same protocol...
> !  */ 

One thought here:  would it make things simpler to separate the option-
parsing variables (read_batch & write_batch) from a set of variables
that would indicate that the mode is currently active (for instance,
"read_batch_enabled" and "write_batch_enabled")?  This might allow you
to remove the hack from start_inband_exchange() IFF the recording of the
protocol can be turned on at a single common point (perhaps with an
exception to write out the version number).  I didn't look to see how
much starting and stopped is needed, though, so this idea may not be of
any use.

>   	if (am_server) {
>   		if (!checksum_seed)
>   		        checksum_seed = time(NULL);
>   		write_int(f_out, checksum_seed);
> + 		if (!am_sender && write_batch) {
> + 		        write_batch = 0;
> + 			write_int(batch_fd, checksum_seed);
> + 			write_batch = 1;
> + 		}
>   	} else {
>   		checksum_seed = read_int(f_in);
> + 		if (am_sender && write_batch) {
> + 		        write_batch = 0;
> + 		        write_int(batch_fd, checksum_seed);
> + 			write_batch = 1;
> + 		}
>   	}

Seems to me this would be simpler to disable write_batch prior to this
block of code, and then add a single "if" after this block that always
calls "write_int(batch_fd, checksum_seed)" if we're writing a batch:

	int write_batch_save = write_batch;
	write_batch = 0;
  	if (am_server) {
  		if (!checksum_seed)
  		        checksum_seed = time(NULL);
  		write_int(f_out, checksum_seed);
  	} else {
  		checksum_seed = read_int(f_in);
  	}
	if (write_batch_save) {
		write_int(f_out, checksum_seed);
		write_batch = 1;
	}

Of course, if my delayed-start idea works out, you might be able to
leave that code completely unchanged and just do the write_int() call
later on in the code when batch writing actually gets turned on for the
first time.

In readfd():
> + 	if (write_batch && !am_sender) {

In writefd():
> + 	if (write_batch && am_sender) {

I'm thinking these checks might be safer if some init code did this:

int write_batch_monitor_in = -1;
int write_batch_monitor_out = -1;

	if (write_batch) {
		if (am_sender)
			write_batch_monitor_out = f_out;
		else
			write_batch_monitor_in = f_in;
	}

Then the code in readfd() could be "if (fd == write_batch_monitor_in)"
and the code in writefd() could be "if (fd == write_batch_monitor_out)".
Thus, the code could never record any I/O to the wrong fd.  For
instance, a diff in the patches dir makes the receiver read data on
a pipe from the generator, and we wouldn't want that going into the
batch file (when the receiver was the one writing the batch file).

> + 		if (read_batch) exit_cleanup(0);  /* no reason to continue */

I'm curious how easy it would be to avoid forking in the first place
when reading a batch, but I didn't look to see how much more intrusive
that change would be in the code.

The code needs to disallow a remote destination when reading a batch:

    rsync -av --read-batch=foo localhost:/path/bar

One other thing I noticed was, if I bumped up the protocol version in
the saved batch file to 32, your enhanced rsync would still read the
file without complaint.  This is because rsync believes that the sender
is going to see its maximum protocol number of 28 and limit itself
(which is not going to happen when the data is coming from a batch).
So, it would be good to have rsync complain and die if the batch file
is beyond what the reader supports.

Another thing I noticed was that a local --write-batch copy behaved as
if --whole-file had been specified.  The easiest fix is to move the
"write_batch = 0;" line in local_child() out of the pid == 0 path so
that the receiver is the one that is writing the batch file, and thus
the whole-file option will default correctly in the generator.

I assume you used -b on your diff because you thought that it reduced
the changes for human reading, but I'd prefer to see the diff without -b
in the future (so that it is easy to apply) and using -u (so that it is
easy to read).

..wayne..


More information about the rsync mailing list