Using the rsync checksums for handling large logfiles.

Alun auj at aber.ac.uk
Sat Nov 15 01:30:32 EST 2003


Dear all,

I've only just joined this list, but I can't find any mention of this
idea anywhere else, so I thought I'd just post here before getting too
deep into programming and possibly reinventing the wheel.

Here at Aber, we have around 30 Unix and Linux servers doing core services.
Each one is maintaining its own logfiles and, for various reasons, we want to
keep these on the servers' local disks, with each server having its own log
rotation policy. We also have a box which acts as a central repository of
logfiles. A script runs every 10 minutes on this box and pulls whatever's
been appended to the logfile during the previous 10 minutes across to the
log server. At the moment, all this is based on a database which remembers
the old size of the remote logfile and just requests whatever's in the file
after that point. Things get a bit complicated when you try to allow for the
remote log rotation and, if anything goes wrong, the database can get out of
sync with reality, leading to lost log data.
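To make the weakness concrete, here is a minimal sketch of the size-tracking approach described above. It is not our actual script: the function name `read_appended` is made up for illustration, the stored offset is passed in directly rather than kept in a database, and the file is read locally rather than pulled from a remote host.

```python
import os

def read_appended(path, old_size):
    """Return (new_data, new_offset) for a log last read up to old_size bytes.

    Hypothetical helper: rotation is guessed purely from the file having shrunk.
    """
    size = os.path.getsize(path)
    if size < old_size:
        # The file is smaller than last time, so assume it was rotated
        # and start again from the beginning of the new file.
        old_size = 0
    with open(path, "rb") as f:
        f.seek(old_size)
        data = f.read(size - old_size)
    return data, old_size + len(data)
```

The fragile part is the rotation guess. If the log is rotated and the new file grows past the remembered size before the next poll, the shrink test never fires and the start of the new file is silently skipped. Whatever was appended to the old file just before rotation is lost as well.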

Now, as I see it, rsync would be a useful tool for this job, except that it
would mean that we end up scanning a large file on both servers every 10
minutes. I've played around on paper and come up with a reversed version of
the rolling checksum (i.e. given a_k and b_k, calculate a_{k-1} and b_{k-1}). The
idea is that I calculate the checksum of the final block of the log on the
loghost and pass that to the remote host. A program there starts reading
backwards through the logfile, until it finds a checksum match. It then
feeds back everything in the file after that point. If rotation has happened
on the remote host, the client could start reading backwards through the
rotated log, and so on.
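The backward scan can be sketched roughly as follows. This is an illustration under several assumptions, not finished code: it uses the usual rsync-style weak checksum (a = sum of the bytes, b = position-weighted sum, both mod 2^16), it works on an in-memory byte string instead of reading the file backwards in chunks, and it uses SHA-1 as a stand-in strong checksum to confirm a weak match. All function names are made up for this sketch.

```python
import hashlib
from typing import Optional, Tuple

M = 1 << 16  # modulus for the weak checksum

def weak_checksum(block: bytes) -> Tuple[int, int]:
    """a = sum of bytes, b = weighted sum (first byte has weight len(block))."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll_back(a: int, b: int, n: int, incoming: int, outgoing: int) -> Tuple[int, int]:
    """Slide an n-byte window one byte towards the start of the file.

    incoming is the byte entering on the left, outgoing the byte leaving
    on the right. Note that b is updated using the OLD value of a.
    """
    b_prev = (b + n * incoming - a) % M
    a_prev = (a + incoming - outgoing) % M
    return a_prev, b_prev

def strong_digest(block: bytes) -> bytes:
    # Stand-in strong checksum (an assumption for this sketch).
    return hashlib.sha1(block).digest()

def find_tail(data: bytes, n: int, weak: Tuple[int, int], strong: bytes) -> Optional[bytes]:
    """Scan backwards for the n-byte block with the given checksums.

    Returns everything after the match, or None if no block matches.
    """
    if n == 0 or len(data) < n:
        return None
    start = len(data) - n
    a, b = weak_checksum(data[start:])
    while True:
        if (a, b) == weak and strong_digest(data[start:start + n]) == strong:
            return data[start + n:]
        if start == 0:
            return None
        a, b = roll_back(a, b, n, data[start - 1], data[start + n - 1])
        start -= 1
```

The loghost would compute `weak_checksum` and `strong_digest` over the last n bytes it holds and send those along with n. A return value of None is the cue to carry on into the rotated log. One caveat with this sketch: the scan stops at the occurrence nearest the end of the file, so n needs to be large enough (or the lines distinctive enough, e.g. timestamped) that the final block is not repeated later in the log.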

I think this would give me a lot of advantages over our current system (it
would be very robust, if nothing else). It also seems like something that
could be added to rsync itself fairly easily. Some sort of flag to say "I
guarantee that the only differences in the remote file are at its end" 
could save huge amounts of disk I/O when used for log synchronisation. Does 
anyone know of anything similar, or of any plans to make rsync capable of 
reading backwards in this manner? If not, I'll go my own way with my hacked 
checksum.

Cheers,
Alun.

