batch-mode fixes [was: [PATCH] fix read-batch SEGFAULT]

Tue May 18 20:30:27 GMT 2004

On Tue, May 18, 2004 at 11:11:51AM -0400, Alberto Accomazzi wrote:
> 
> Wayne Davison wrote:
> >

<snip>

> >I'm wondering if batch mode should be removed from the main rsync
> >release and relegated to a parallel project?  It seems to me that a
> >better feature for the mainstream utility would be something that
> >optimized away some of the load on the sending system when it is
> >serving lots of users.  So, having the ability to cache a directory
> >tree's information, and the ability to cache checksums for files
> >would be useful (especially if the data was auto-updated as it
> >became stale).  That would make all transfers more optimal,
> >regardless of what files the receiving system started from.
> 
> First of all, I have a feeling that the number of people who have 
> *considered* using batch mode is quite small, and those who have 
> actually used it in the recent past are certainly fewer still (I'm 
> thinking zero, actually).  So removing the functionality from the 

/me waves his hand frantically.
One, here.  :-)

> mainstream rsync would not be a problem, in fact I think it would be a 
> good thing.  It doesn't make sense to keep something in the code that is 
> not used and cannot be reliably supported.  Although I applaud Jos's 
> efforts in providing this functionality to rsync, I was surprised to see 

	Jos did that?  Good job!

> it included in the main distribution, especially since it underwent 
> virtually no testing as far as I can tell.
> 
> There's no doubt that caching the file list on the server side would 
> indeed be a very useful feature for all those who use rsyncd as a 
> distribution method.  We all know how difficult it can be to reliably 
> rsync a large directory tree because of the memory and I/O costs in 
> keeping a huge filelist in memory.  This may best be done by creating a 
> separate helper application (say rsyncd-cache or such) that can be run 
> on a regular basis to create a cached version of a directory tree 
> corresponding to an rsyncd "module" on the server side.  The trick in 
> getting this right will be to separate out the client-supplied options 
> concerning file selection, checksumming, etc., so that the cache is as 
> general as possible and can be used for a large set of connections so as 
> to minimize the number of times that the actual filesystem is scanned.

	What client options are you thinking will be tricky?  Wouldn't the 
helper app just cache _all_ the metadata for the module, and then rsync would 
query only the subset it needed?  It's not like the client can change the 
checksum stride.  [That would hurt.]
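A minimal sketch of the "cache everything, query a subset" idea: the module is scanned once, and each connection only filters the cached list. It assumes exclude patterns are the only client option being honoured; `FileMeta`, `scan_module` and `select_for_client` are hypothetical names, and matching each path component with `fnmatch` is a deliberate simplification of rsync's real include/exclude rules.

```python
import fnmatch
import os
import stat
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileMeta:
    relpath: str   # path relative to the module root
    size: int
    mtime_ns: int
    is_dir: bool


def scan_module(root: str) -> List[FileMeta]:
    """Walk the module tree once and record metadata for every entry."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place also fixes the descent order
        for name in dirnames + sorted(filenames):
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # do not follow symlinks
            entries.append(FileMeta(os.path.relpath(full, root),
                                    st.st_size, st.st_mtime_ns,
                                    stat.S_ISDIR(st.st_mode)))
    return entries


def select_for_client(cache: Iterable[FileMeta],
                      excludes: List[str]) -> List[FileMeta]:
    """Filter the cached list for one connection without touching the disk.

    An entry is dropped if any of its path components matches an exclude
    pattern, so excluding a directory also hides everything beneath it.
    """
    def excluded(entry: FileMeta) -> bool:
        parts = entry.relpath.split(os.sep)
        return any(fnmatch.fnmatchcase(part, pat)
                   for part in parts for pat in excludes)

    return [e for e in cache if not excluded(e)]
```

Because the scan happens once, many connections with different exclude lists can share the same cached list; only the cheap filtering step runs per client.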

	-chris

> 
> >Such a new feature would probably best be added to an rsync
> >replacement project, though.
> 
> Hmmm... "replacement"?  Why not make this a utility that can be run 
> alongside an rsync daemon?  Or are you thinking of a design for a "new" 
> rsync?
> 
> 
> -- Alberto
> 
> 
> ********************************************************************
> Alberto Accomazzi                      aaccomazzi(at)cfa harvard edu
> NASA Astrophysics Data System                        ads.harvard.edu
> Harvard-Smithsonian Center for Astrophysics      www.cfa.harvard.edu
> 60 Garden St, MS 31, Cambridge, MA 02138, USA
> ********************************************************************