batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]
Alberto Accomazzi
aaccomazzi at cfa.harvard.edu
Mon May 17 14:15:23 GMT 2004
Chris,
to put things in the right perspective, you should read (if you haven't
done so already) the original paper describing the design behind batch
mode. The design and implementation of this functionality go back to
a project called the Internet2 Distributed Storage Infrastructure
(I2-DSI). As part of that project, the authors created a modified
version of rsync (called rsync+) which had the capability of creating
these batch sets for mirroring. Here are a couple of URLs describing
the ideas and motivation behind it:
http://www.ils.unc.edu/i2dsi/unc_rsync+.html
http://www.ils.unc.edu/ils/research/reports/TR-1999-01.pdf
Chris Shoemaker wrote:
> Yes, I think you're right about the original design. And I guess we'd
> want to preserve that capability. Or would we?
> I'm having a little trouble seeing why this was the intended
> use. I figure, there are three cases:
>
> A) If you have access to both source and dest, it doesn't really matter too
> much who writes the batch -- this is like the local copy case.
> B) If you have access to the dest but not the source, then you need the
> client to write the batch -- and it's not far-fetched that you might have
> other copies of dest to update.
> C) However, having access to source but not dest is the only case that
> _requires_ the sender to write the batch -- now what's the chance that you'll
> have another identical dest to apply the batch to? And if you did, why
> wouldn't you generate the batch on that dest as in case A, above?
>
> So, it seems to me that it's much more useful to have the receiver/client
> write the batch than sender/client, or receiver/server, or sender/server.
> But, maybe I'm just not appreciating what the potential uses of batch-mode
> are.
>
> Survey: so who uses batch-mode and what for?
I haven't used the feature but back when I read the docs on rsync+ I
thought it was a clever way to do multicasting on the cheap. I think
the only scenario where batch mode makes sense is when you need to
distribute updates from a particular archive to a (large) number of
mirror sites and you have tight control on the state of both client and
server (so that you know exactly what needs to be updated on the mirror
sites). This ensures that you can create a set of batch files that
contain *all* the changes necessary for updating each mirror site.
So basically I would use batch mode if I had a situation in which:
1) all mirror sites have the same set of files
2) rsync is invoked from each mirror site in exactly the same way (i.e.
same command-line options) to pull data from a master server
then instead of having N sites invoke rsync against the same archive, I
would invoke it once, make it write out a set of batch files, then
transfer the batch files to each client and run rsync locally using the
batch set. The advantage of this is that the server only performs its
computations once. An example of this usage would be using rsync to
upgrade a Linux distribution, say going from Fedora Core 1 to Fedora Core 2. All files
from each distribution are frozen, so you should be able to create a
single batch which incorporates all the changes and then apply that on
each site carrying the distro.
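
To make the write-once, apply-many workflow above concrete, here is a minimal sketch that only builds the two command lines involved (it does not run rsync). The flag spellings --write-batch and --read-batch follow the current rsync man page, where the batch is a single file rather than a set of files; the source "master::archive/", the path "/srv/mirror/" and the file name "update.batch" are made-up placeholders, and copying the batch to each mirror is left out.

```python
import shlex


def batch_commands(source, dest, batch_file, options=("-a",)):
    """Build the command lines for a write-once, apply-many batch update.

    Returns (write_cmd, read_cmd):
      write_cmd is run once, against one mirror copy, and records the batch;
      read_cmd is run locally on every other mirror after the batch file
      has been copied there (that copy step is not modelled here).
    """
    opts = list(options)
    # Same options on both sides, as condition 2) above requires.
    write_cmd = ["rsync", *opts, f"--write-batch={batch_file}", source, dest]
    read_cmd = ["rsync", *opts, f"--read-batch={batch_file}", dest]
    return shlex.join(write_cmd), shlex.join(read_cmd)


if __name__ == "__main__":
    # Hypothetical source module, destination path and batch file name.
    write_cmd, read_cmd = batch_commands(
        "master::archive/", "/srv/mirror/", "update.batch"
    )
    print("once:       ", write_cmd)
    print("each mirror:", read_cmd)
```

The server takes part only in the first command; every other mirror replays the recorded batch against its own copy.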
The question of whether the batch files should be on the client or
server side is not easy to answer and in the end depends on exactly what
you're trying to do. In general, I would say that since the contents of
the batch files depend on the state of both client and server, there is
not a "natural" location for them.
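
As a toy illustration of that dependence, the sketch below models a batch as a whole-file difference between a source tree and one particular destination tree. This is an assumption made for brevity: rsync's real batches are block-level deltas, and the dictionaries and function names here are invented for the example. The point is only that a batch computed against one destination is correct for another destination only if the two started out identical.

```python
def make_batch(source, dest):
    """Record what `dest` needs in order to match `source` (file-level toy)."""
    changed = {name: data for name, data in source.items()
               if dest.get(name) != data}
    deleted = [name for name in dest if name not in source]
    return changed, deleted


def apply_batch(dest, batch):
    """Return a new tree: `dest` with the recorded changes applied."""
    changed, deleted = batch
    result = {name: data for name, data in dest.items()
              if name not in deleted}
    result.update(changed)
    return result


if __name__ == "__main__":
    master = {"a": "v2", "b": "v1"}
    mirror_ok = {"a": "v1", "b": "v1"}     # same state the batch was made from
    mirror_stale = {"a": "v1", "b": "v0"}  # differs in a file the batch skips

    batch = make_batch(master, mirror_ok)
    print(apply_batch(mirror_ok, batch) == master)
    print(apply_batch(mirror_stale, batch) == master)
```

The batch carries only "a", because that was the only difference when it was made, so the stale copy of "b" on the second mirror is never repaired.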
-- Alberto
********************************************************************
Alberto Accomazzi aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
********************************************************************