Using the rsync checksums for handling large logfiles.

jw schultz jw at pegasys.ws
Mon Nov 17 21:38:07 EST 2003


On Fri, Nov 14, 2003 at 02:30:32PM +0000, Alun wrote:
> 
> Dear all,
> 
> I've only just joined this list, but I can't find any mention of this
> idea anywhere else, so I thought I'd just post here before getting too
> deep into programming and possibly reinventing the wheel.
> 
> Here at Aber, we have around 30 unix and linux servers doing core services. 
> Each one is maintaining its own logfiles and, for various reasons, we want to
> keep these on the servers' local disks, with each server having its own log
> rotation policy. We also have a box which acts as a central repository of
> logfiles. A script runs every 10 minutes on this box and pulls whatever's
> been appended to the logfile during the previous 10 minutes across to the
> log server. At the moment, all this is based on a database which remembers
> the old size of the remote logfile and just requests whatever's in the file
> after that point. Things get a bit complicated when you try to allow for the
> remote log rotation and, if anything goes wrong, the database can get out of
> sync with reality, leading to lost log data.
> 
> Now, as I see it, rsync would be a useful tool for this job, except that it
> would mean that we end up scanning a large file on both servers every 10
> minutes. I've played around on paper and come up with a reversed version of
> the rolling checksum (i.e. given a_k, b_k calculate a_k-1 and b_k-1). The
> idea is that I calculate the checksum of the final block of the log on the
> loghost and pass that to the remote host. A program there starts reading
> backwards through the logfile, until it finds a checksum match. It then
> feeds back everything in the file after that point. If rotation has happened
> on the remote host, the client could start reading backwards through the
> rotated log etc. 
> 
> I think this would give me a lot of advantages over our current system (it
> would be very robust, if nothing else). It also seems like something that
> could be added to rsync itself fairly easily. Some sort of flag to say "I
> guarantee that the only differences in the remote file are at its end" 
> could save huge amounts of disk I/O when used for log synchronisation. Does 
> anyone know of anything similar, or of any plans to make rsync capable of 
> reading backwards in this manner? If not, I'll go my own way with my hacked 
> checksum.
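
For reference, the reversed rolling checksum described above can
be sketched as follows. This is only a sketch: it assumes the
textbook form of rsync's weak checksum (two running sums taken
modulo 2^16), and the names weak_sum, roll_back and find_from_end
are hypothetical, not anything in rsync itself.

```python
M = 1 << 16  # assumed modulus for both running sums


def weak_sum(block):
    """Weak checksum (a, b) of a bytes block, computed directly."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll_back(a, b, n, leaving, entering):
    """Slide an n-byte window one byte toward the start of the file.

    leaving  = last byte of the current window (drops out)
    entering = byte just before the current window (comes in)
    """
    new_b = (b - a + n * entering) % M
    new_a = (a - leaving + entering) % M
    return new_a, new_b


def find_from_end(data, target, n):
    """Scan backwards for the last n-byte window whose weak sum is target.

    Returns the offset just past that window, or None if nothing matches.
    """
    if len(data) < n:
        return None
    start = len(data) - n
    a, b = weak_sum(data[start:start + n])
    while True:
        if (a, b) == target:
            return start + n
        if start == 0:
            return None
        a, b = roll_back(a, b, n, data[start + n - 1], data[start - 1])
        start -= 1
```

A weak-sum match is only a candidate: a real tool would want to
confirm it with a stronger hash of the same block before trusting
the offset.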

The rsync protocol wouldn't lend itself to this.  Overall,
rsync is going to be less efficient than a utility that
simply sends what needs to be appended to reach the same
length.

I'd have the receiver send the file list with lengths, and
the sender would compare each length with its own copy.  If
the sender's file is shorter, it would send the whole file
with a flag to say "rotated"; if the sender's file is longer,
it would open it, seek to the receiver's length and send the
rest of the file.  That would be much more efficient than
rsync with the block checksums, file scanning and copying to
tempfiles that get renamed.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


