Query re: rolling checksum algorithm of rsync
auj at aber.ac.uk
Fri Feb 11 11:08:45 GMT 2005
Chris Shoemaker (c.shoemaker at cox.net) said, in message
<20050210190749.GA9297 at cox.net>:
> > If the log file is e.g. 2Gbytes long and has only had 100Kbytes appended
> > since the last rsync, then using --whole-file means 2GBytes of network
> > traffic and 2GBytes of disk I/O at either end. Using the checksum means
> > 2Gbytes of disk I/O at either end and 100Kbytes of network traffic (plus the
> > checksum data). Neither is ideal.
> use logrotate.
I'm aware of things like logrotate, but if I have to rotate the logs every
ten minutes on each of my webcache servers so that rsync will perform well,
then I can't really afford to do it. I'd end up keeping 144 logfiles per day
on the logging server just to make rsync efficient.
Similarly, remote syslog wouldn't tackle it since not all the services for
which we need to collate logs even use syslog.
At the moment, I have a script which runs every 10 minutes and just copies
over the tail of the logfile, using the current size on the logging server
as its start point. This works OK, but it's yet another custom service to
maintain.
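The tail-copy approach described above could be sketched roughly as follows
(names and chunk size are my own; the real script presumably runs over the
network rather than between local paths):

```python
import os

CHUNK = 64 * 1024  # copy in 64 KB chunks to bound memory use

def copy_tail(src_path, dest_path):
    """Append to dest_path whatever src_path has gained since the last run.

    The current size of the destination (the copy on the logging server)
    serves as the start offset into the source, so only newly appended
    bytes cross the wire.
    """
    start = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    with open(src_path, "rb") as src, open(dest_path, "ab") as dest:
        src.seek(start)
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dest.write(chunk)
```

This only works for files that are strictly appended to; a rotation or
truncation on the source would leave the destination offset pointing past
the end of the new file.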
We already use rsync widely for other purposes on these servers and a patch
like I mentioned would allow us to use it for this extra job too.
I know it's forcing rsync to do something that doesn't make sense in the
general case, but in the specific case of files which are almost always
appended, it could be a gain.
> Probably not. I suspect even what you describe wouldn't give you what
> you want. How would you reliably choose n?
For my application, I could use:
n = max("current size of file on logging server minus 1Mbyte", 0)
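In code, that choice of n (my own naming; the 1 MB margin is just a safety
back-off in case the tail of the file was rewritten) would look like:

```python
ONE_MB = 1 << 20  # 1 Mbyte safety margin

def choose_n(logged_size):
    """Offset at which to resume transfer: back off 1 MB from the size
    already held on the logging server, but never go negative."""
    return max(logged_size - ONE_MB, 0)
```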
> You can't expect rsync to work (well) in that case.
I don't have any problem carrying on using my current method. All I was
trying to do was clarify what the original poster may have been asking.
Alun Jones auj at aber.ac.uk
Systems Support, (01970) 62 2494
University of Wales, Aberystwyth