Rsync: Re: patch to enable faster mirroring of large filesystems

Alberto Accomazzi aaccomazzi at cfa.harvard.edu
Fri Nov 30 10:18:40 EST 2001


In message <20011129165217.B7625 at lucent.com>, Dave Dykstra writes:

> On Thu, Nov 29, 2001 at 11:02:07AM -0500, Alberto Accomazzi wrote:
> ...
> > These numbers show that reading the filenames this way rather than using
> > the code in place to deal with the include/exclude list cuts the startup
> > time down to 0 (from 1hr).  The actual sending of the filenames is down
> > from 2h 15m to 1h 40m.  The reason this isn't better is due to the fact
> > that turning buffering on only helps the client, while the server still
> > has to do unbuffered reads because of the way the list is sent across. 
> 
> Are you sure about that?  I don't see any unbuffered reads.

I'm not sure that the code intends to do unbuffered reads, but the
truss output I collected on the server side shows that this is what
is happening.  I'm not sure how the buffering should take place,
since the include/exclude file names are sent over the wire one at
a time rather than as a chunk of data, but maybe buffering is done at
a higher level.
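
The cost of reading names one at a time can be illustrated with a small
sketch.  It assumes, purely for illustration, that each name goes over
the wire as a 4-byte length followed by the name, with a zero length
ending the list.  The class and function names are made up and this is
not rsync's actual code.  It counts how many read calls reach the
underlying stream with and without a buffering layer:

```python
import io
import struct


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts the read calls reaching it."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, buf):
        self.calls += 1  # each call stands in for one read() system call
        chunk = self._data[self._pos:self._pos + len(buf)]
        buf[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)


def encode_names(names):
    """Assumed framing: 4-byte length + name, zero length terminates."""
    out = bytearray()
    for name in names:
        raw = name.encode()
        out += struct.pack("<i", len(raw)) + raw
    out += struct.pack("<i", 0)
    return bytes(out)


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)
        if not chunk:
            raise EOFError("stream ended early")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def decode_names(stream):
    names = []
    while True:
        (length,) = struct.unpack("<i", read_exact(stream, 4))
        if length == 0:
            return names
        names.append(read_exact(stream, length).decode())


def count_reads(names, buffered):
    """Decode the list and report how many raw reads it took."""
    raw = CountingRaw(encode_names(names))
    stream = io.BufferedReader(raw, buffer_size=8192) if buffered else raw
    return decode_names(stream), raw.calls
```

Without the buffering layer, the reader issues two reads per name (one
for the length, one for the name) plus one for the terminator.  With it,
the same list is decoded from a handful of large reads.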

> 2.3.2 did have the read_check() hack which was there to prevent SSH pipes
> from getting stuck, maybe that's what you're seeing.  That was taken out
> in 2.4.0 so maybe that would greatly speed it up.

Possible.  That's another reason why I don't think it's worth spending
any more time patching 2.3.2 anyway...

> > As far as I can tell there is no way to get around the buffering without
> > a protocol change or a different approach to sending this list.
> > 
> > Given the data above, I think implementing --files-from this way would
> > be the wrong way to go, for a number of reasons:
> 
> I've been starting to think along those lines too.  It should be a protocol
> change to just send the files and not treat it like excludes.  In fact,
> the file list is normally sent from the sender to the receiver, but if
> the client is the receiver maybe we could figure out a way to have
> --files-from only send the list in the other direction.

Right.  The point is that when Tridge wrote the code he was obviously 
envisioning a client sending a short exclude list to the server and then
the server sending a massive list back to the client.  Therefore neither
optimization nor compression has ever been included to ensure the fast
transfer of the exclude list, so patching things this way goes against 
the original design of the protocol.  So probably the best thing to do
is to stick the file list right after the exclude list, turning on
compression if -z has been selected, and to bump up the protocol version
so that we can stay backwards compatible.  At least that's my take.
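
To make the proposal concrete, here is a rough sketch of how such a
file list could be framed.  Everything in it is hypothetical: the
version number, the one-byte compression flag, the NUL-separated
payload, and all function names are assumptions for illustration, not
anything that exists in rsync.

```python
import struct
import zlib

# Hypothetical protocol version that would first carry a file list.
FILES_FROM_MIN_PROTOCOL = 25


def encode_file_list(names, use_compression):
    """Frame the list as: flag byte, 4-byte payload length, payload."""
    payload = b"\0".join(name.encode() for name in names)
    if use_compression:  # stands in for -z having been selected
        payload = zlib.compress(payload)
    header = struct.pack("<BI", 1 if use_compression else 0, len(payload))
    return header + payload


def decode_file_list(blob):
    """Inverse of encode_file_list."""
    compressed, length = struct.unpack_from("<BI", blob, 0)
    payload = blob[5:5 + length]
    if compressed:
        payload = zlib.decompress(payload)
    if not payload:
        return []
    return [name.decode() for name in payload.split(b"\0")]


def client_file_list(names, use_compression, local_version, remote_version):
    """Return the framed list, or None if the peer is too old for it."""
    if min(local_version, remote_version) < FILES_FROM_MIN_PROTOCOL:
        return None  # caller falls back to the existing behaviour
    return encode_file_list(names, use_compression)
```

The version check is what keeps older peers working: a client talking
to a server below the hypothetical minimum simply never sends the
extra block.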

-- Alberto


****************************************************************************
Alberto Accomazzi                          mailto:aaccomazzi at cfa.harvard.edu
NASA Astrophysics Data System                      http://adsabs.harvard.edu
Harvard-Smithsonian Center for Astrophysics        http://cfawww.harvard.edu
60 Garden Street, MS 83, Cambridge, MA 02138 USA   
****************************************************************************



