Optimizations and other questions for rsync

Wed Oct 16 23:58:00 EST 2002

On Wed, Oct 16, 2002 at 05:00:02PM -0400, Farid Shenassa wrote:
> 1. Is there any computational or disk I/O difference between the rsync client
> and server (the one that just checksums each block vs. the one
> that does the rolling checksum)?  Given that I do not have as much CPU on the
> VOS machine, I would like the more expensive side to run on the Windows
> system.  So I need to figure out who should run the daemon and who should
> push vs. pull.

Once the connection is established, it doesn't matter whether
you push or pull.  What matters is which side sends and which
side receives.  The receiver bears the brunt of the work.
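To make the two roles in question 1 concrete, here is a minimal Python sketch of the weak rolling checksum in the form given in the rsync technical report: `a` is the byte sum and `b` the position-weighted byte sum of a window, both mod 2^16 (the report combines them as s = a + 2^16 b). It leaves out the strong per-block checksum and the hash-table search, and rsync's own C code differs in details, so treat it as an illustration of the cost shape rather than rsync's implementation. The block side computes one checksum per block; the rolling side may slide its window one byte at a time, and `roll()` is what keeps each step O(1).

```python
from typing import List, Tuple

M = 1 << 16  # modulus for both halves of the weak checksum (tech-report form)


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute (a, b) for one block from scratch: O(len(block)) work."""
    n = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the end of the block: n, n-1, ..., 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> Tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1) work."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def block_checksums(data: bytes, n: int) -> List[Tuple[int, int]]:
    """Block side: one checksum per non-overlapping, full n-byte block."""
    return [weak_checksum(data[i:i + n]) for i in range(0, len(data) - n + 1, n)]


def rolling_checksums(data: bytes, n: int) -> List[Tuple[int, int]]:
    """Rolling side: one checksum at every byte offset where a full window fits."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [(a, b)]
    for k in range(len(data) - n):
        # Drop data[k] from the left edge, add data[k + n] at the right edge.
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append((a, b))
    return sums
```

For a 1 MiB file and 700-byte blocks, `block_checksums` computes 1,497 checksums while `rolling_checksums` computes 1,047,877. In the algorithm the search skips ahead a whole block after a match, so a mostly unchanged file costs far fewer rolling steps than that upper bound.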

> 2. Is there a way for rsync to cache previous checksum calculations, or to
> be told that a particular file (or files matching a regex or star name) is
> always appended to, so it does not read the entire file?  Basically I have
> processes that constantly append to an output file on VOS.  I would like to
> mirror these onto the NT machine.  However, I do not want rsync to
> read the entire file every few minutes.  The choices I see are:
> 	a. tell rsync that the file is in append mode, so it just picks up from
> the last block on the other machine and goes forward
> 	b. rsync is smart enough not to do this on its own
> 	c. rsync can store cached checksum information
> 	d. there is another option that tells rsync to do this that I
> missed.
> 	e. there is an option to tell rsync to basically continue to read
> the file every X interval after it gets to the end without exiting.

There is no way (with rsync) to do any of these things.

> 3. Expanding on option 2e: one possibility would be to run rsync for each
> file being synced and tell it to sync to the end, then stay in
> memory, watch for file changes or try to read more blocks at the end
> (assuming another process is writing to it), and sync those new blocks.
> This would keep rsync from stopping and having to restart from the
> beginning.  It may, however, cause memory issues for large files if it keeps
> all the checksums in memory?
> 
> Any ideas or other ways to get around this?  Again, questions 2 and 3 are
> basically about syncing open log files to another machine efficiently.
> There may be another tool out there for this that I'm not aware of; if so,
> please enlighten me so I can stay away.

It sounds like latency is your issue, not efficiency.  Your
description indicates that you want the log file copies to
be kept within a few minutes of the originals.  If you were
talking about UNIX or Linux, I'd suggest looking into syslogd
first and then clustering software or distributed
filesystems.

What would work is a special utility that monitors the
log files, detects appends, and then transmits the
appended data to a (remote) daemon that updates a copy.
Your utility and daemon would have to know how to handle
log file rotation.
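The utility-plus-daemon design can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `AppendFollower` and `apply_update` are hypothetical names, the network transport between the two halves is left out (the `(restart, data)` pair is what would travel over the wire), a POSIX-like `stat()` is assumed on the machine holding the logs, and rotation is detected by only two heuristics: the file's identity (`st_ino`) changing, or the file shrinking below the offset already sent.

```python
import os
from typing import Optional, Tuple


class AppendFollower:
    """Hypothetical sender-side helper: reports bytes appended since the last poll."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0                   # bytes already shipped to the copy
        self.inode: Optional[int] = None  # identity of the file we last read

    def poll(self) -> Tuple[bool, bytes]:
        """Return (restart, new_bytes); restart=True means the copy must be rewritten."""
        with open(self.path, "rb") as f:
            # fstat on the open handle, so the checks describe the file we read.
            st = os.fstat(f.fileno())
            restart = False
            # Rotation heuristics: a different file now has this name (rename-style
            # rotation), or the file shrank below what we already sent (truncation).
            if self.inode is not None and (
                st.st_ino != self.inode or st.st_size < self.offset
            ):
                restart = True
                self.offset = 0
            self.inode = st.st_ino
            f.seek(self.offset)
            data = f.read()  # everything appended since the last poll
        self.offset += len(data)
        return restart, data


def apply_update(copy_path: str, restart: bool, data: bytes) -> None:
    """Hypothetical receiver side: rewrite the copy on restart, otherwise append."""
    with open(copy_path, "wb" if restart else "ab") as f:
        f.write(data)
```

Each poll reads only the new tail rather than the whole file. A file that is truncated and then grows past the old offset between two polls would slip past the size check, and data written to the old file after the last poll but before a rename is lost; a real tool would need a firmer rotation protocol.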

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt