Patch: Offline transfer mode

Steve Traugott stevegt at TerraLuna.Org
Wed Mar 23 23:16:17 GMT 2005


Hi Wayne, Jos, Chris, All,

Before diving into answers to Wayne, I have to ask (out of due diligence
if nothing else):  Are we past the point of being able to change the
behavior of --write-batch itself, such that it *only* writes the
batch file, and doesn't change the destination?  That would make my
entire "offline mode" patch redundant, as well as bug 2104, and also
make for simpler write_batch code...  What are people's thoughts on
this?

If you still wanted the destination changed immediately, over the wire,
you'd have to follow the rsync --write-batch run with the normal:

    ssh remote rsync --read-batch=- ... < batch.file
    
...I'm aware this will cause disruption in existing uses, and I only ask
because batch mode is still labeled "experimental" in the most recent
stable release (2.6.3), though I do see that the label has been removed
already in CVS HEAD.

As near as I can tell, this would actually make for cleaner use of the
original intent of batch mode as well -- you could regenerate the same
batch file against the same destination repeatedly; no longer would you
need to remember which target machine you ran --write-batch against,
just apply the batch file to all of them, and so on.  For instance, if
you're using this to manage the root filesystem of production UNIX
machines, then this lets you generate the batch file at your leisure
during the day, then apply it to all target hosts after hours or during
maintenance windows.  It seems to me like this would make for much more
manageable infrastructure.
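
To make that workflow concrete, here is a minimal Python sketch (not part of rsync) that only builds the per-host replay commands, following the "ssh remote rsync --read-batch=- ... < batch.file" idiom above.  The function name, the host names, and the destination path /srv/root are assumptions made up for illustration; a real run would also need whatever other options the transfer requires.

```python
import shlex


def read_batch_commands(batch_file, hosts, dest):
    """Build one shell command per host that replays batch_file there.

    Nothing is executed; the strings are meant to be reviewed or run later.
    `dest` is a hypothetical destination path that exists on every host.
    """
    commands = []
    for host in hosts:
        # The remote rsync reads the batch from stdin ("-").
        remote = "rsync --read-batch=- " + shlex.quote(dest)
        commands.append(
            "ssh {} {} < {}".format(
                shlex.quote(host), shlex.quote(remote), shlex.quote(batch_file)
            )
        )
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and path, for illustration only.
    for cmd in read_batch_commands("batch.file", ["web1", "web2"], "/srv/root"):
        print(cmd)
```

Because the batch file is generated once and never tied to a particular target, the same list of commands can be held until the maintenance window and then run against every host.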

On the downside, much of the work described below would *still* need to
be done in order to change the behavior of --write-batch, even though
the code would be simpler, and we wouldn't need yet another flag.  And
altering --write-batch again would delay the "stable" release of batch
mode that much longer.  These practical considerations mean that I could
argue this either way, so I'm curious what other people think.

On Wed, Mar 23, 2005 at 09:19:16AM -0800, Wayne Davison wrote:
> On Sun, Mar 20, 2005 at 08:46:20PM -0800, Steve Traugott wrote:
> > Here's an rsync patch which adds an --offline flag, letting you
> > transfer changed blocks via removable media, while still comparing
> > checksums via the net.
> 
> I'd prefer a different option name for this.  Some folks have suggested
> combining --write-batch with --dry-run for this functionality (there's
> an enhancement request for this in bugzilla), which is a pretty decent
> choice since --dry-run doesn't currently work with --write-batch.  

Combining --write-batch with --dry-run was my first plan -- but the code
didn't work out that way.  I noticed that the assumptions --dry-run
makes are slightly different -- it cuts off too early in the
send_files() while loop, for instance.  I thought if we were to re-use
the --dry-run code for this, then --dry-run code would have to be pushed
deeper into the call tree, would get more complicated, harder to
maintain, less safe, etc.  On the other hand, I had been looking at the
rsync code for a total of around 30 minutes when I came to that
conclusion, so I reserve the right to be wrong.  ;-)

> But it might be better to keep the --dry-run idiom unchanged 

I think so.

> and use a different name for the --offline option, such as
> --batch-only.  I think I'd prefer the latter, since it would allow
> the current --dry-run behavior to be fixed to work with --write-batch.

If nothing else, I'd think most folks would expect that --dry-run
doesn't change disk at all, on either end.  Writing a batch file with
--dry-run turned on would violate that expectation.  Even if much of the
dry_run code is re-used, I agree that a different flag would still make
sense.  I'm beginning to see that fixing --dry-run to work with
--write-batch should be easy if this patch is done right.

I was also concerned about the same things you mention in bug 2104;
overloading --dry-run without more refactoring would mean more network
traffic and more CPU usage when using --dry-run without --write-batch.

As far as the name of the flag goes, I picked --offline because it
expresses the intent of the patch -- transferring changes offline,
keeping net traffic minimal.  It did occur to me that --offline might
conflict with some future rsync incarnation where even file lists could
be transferred offline, with *no* network connection.  At the time this
seemed far-fetched, but with what I know now I just have to say
"hmmm...".

A --batch-only flag name doesn't (to me) necessarily imply minimal
network traffic.  However, if you also want it to work as a receiver,
and generate batch files on the receiving side...  Let me think about it
a few days -- the "offline" terminology has had a few days to soak in my
brain; it's going to take time for me to adjust.

In the draft version I was tempted to call this the "Station Wagon"
patch, and name the flag accordingly.  ;-)

> We'll want to change the generator to not do any work other than
> generating checksums.  One way would be to actually set dry-run in the
> generator, and then change the code near the checksum-sending to not
> be skipped if the batch-only mode is enabled.

Ahh -- yep.   I hadn't noticed that directories were still being created
etc. when I had --offline turned on.  Thanks.
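
Wayne's suggestion can be pictured with a toy Python model (not rsync source): treat batch-only mode as dry-run inside the generator, but keep the checksum-sending step.  The names generator_steps, dry_run, and batch_only, and the step labels, are hypothetical.

```python
def generator_steps(dry_run, batch_only):
    """Toy model of which steps the generator performs for one file."""
    steps = []
    # Batch-only mode behaves like dry-run as far as local changes go.
    effective_dry_run = dry_run or batch_only
    if not effective_dry_run:
        steps.append("create_dirs_and_update_attrs")
    # Checksum sending is skipped in a plain dry run, but must still
    # happen in batch-only mode so the sender can compute the deltas.
    if not effective_dry_run or batch_only:
        steps.append("send_checksums")
    return steps
```

With this shape, a batch-only run leaves the destination untouched while the sender still gets the checksums it needs.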

> On the sending side, I think the code could be simplified to simply
> output the data directly to the batch fd instead of changing the
> monitor fd and writing to a dummy fd.

Agreed -- and my /dev/null silliness wouldn't have been portable anyway.
I think I was trying to avoid touching writefd(), since so many things
ultimately call it.
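
As a rough sketch of the simplification Wayne describes, the sender could decide per write where the file data goes and output it straight to the batch, with no dummy fd involved.  This is a Python toy using in-memory streams, not the rsync C code; send_file_data, sock, and batch are hypothetical names, and it assumes the existing --write-batch behavior is to both send and record the data.

```python
import io


def send_file_data(chunks, sock, batch=None, batch_only=False):
    """Write each data chunk to the network, the batch stream, or both."""
    if batch_only and batch is None:
        raise ValueError("batch_only requires a batch stream")
    for chunk in chunks:
        if batch is not None:
            batch.write(chunk)  # record the update in the batch file
        if not batch_only:
            sock.write(chunk)   # normal path: data goes over the wire


if __name__ == "__main__":
    sock, batch = io.BytesIO(), io.BytesIO()
    send_file_data([b"ab", b"cd"], sock, batch, batch_only=True)
    print(len(sock.getvalue()), len(batch.getvalue()))
```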

> Also, the code should deal with the combination of pulling data with
> this new option.  Since the batch file is created locally, this
> combination cannot save any bandwidth, but it could still be used to
> only create a batchfile without doing any updates.  This means that
> the batch-only option only needs to be sent to the server if the
> server is the receiver, and the receiver would need to just discard
> any received updates when this option is enabled.

Hoo boy.  I was afraid someone would ask for that.  ;-)  Okay, I *think*
I see where the right place is to do this -- somewhere around the
current dry_run check in recv_files()...?  And then writefd() would need
to be toggled from there to write to the correct fd...  Yep, this would
all be much simpler if I just went ahead and refactored writefd().
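
The pull case can be sketched as a toy Python model (hypothetical names throughout, not rsync code).  It encodes the two rules from Wayne's note: the option is forwarded to the server only when the server is the receiver, and a local receiver in batch-only mode records updates in the batch without applying them.

```python
def forward_option_to_server(server_is_receiver):
    """The server only needs the option when it is the receiving side."""
    return server_is_receiver


def receive_updates(updates, dest, batch, batch_only):
    """Model of the local receiver during a pull.

    Every update is recorded in `batch`; it is applied to `dest`
    only when batch_only is off.
    """
    for name, data in updates:
        batch.append((name, data))
        if not batch_only:
            dest[name] = data
```

In the pull case the data still crosses the network, so this saves no bandwidth; the only effect is that the destination stays unchanged while the batch is produced.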
 
> Thanks for the patch!  If you feel like doing some more work on this,
> feel free.  If not, I'll look at it some more eventually.

I need it desperately myself, so I don't really have a choice.  ;-)
Thanks for the helpful feedback.  I'll keep the list updated, and of
course I'll be glad to see it integrated so I can stop porting it
forward, as soon as everyone's happy with it.  

Steve
-- 
Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt at TerraLuna.Org 
http://www.stevegt.com -- http://Infrastructures.Org