Using the rsync checksums for handling large logfiles.

Alun auj at aber.ac.uk
Mon Nov 17 23:06:16 EST 2003


jw schultz (jw at pegasys.ws) said, in message
    <20031117103807.GC9124 at pegasys.ws>:
> 
> The rsync protocol wouldn't lend itself to this.  Overall,
> rsync is going to be less efficient than a utility that
> simply sends what needs to be appended to reach the same
> length.
> 
> I'd have the receiver send the file list with lengths and
> the sender would compare with its copy and if the file is
> shorter send the whole file with a flag to say "rotated",
> but if the sender's file is longer it would open, seek to
> the receiver's length and then send the rest of the file.
> That would be much more efficient than rsync with the block
> checksums, file scanning and copying to tempfiles that get
> renamed.

Dear all,

Thanks for your thoughts.

There are problems with relying purely on lengths (and these are 
what got me thinking about the checksums): if you rotate and a 
service then gets hit hard before the next sync, the new file can 
actually be longer than the old one was prior to rotation. This 
has happened to us a couple of times, when some problem or other 
stopped the updates from happening as often as they should.
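
A minimal sketch of the length-only rule described in the quoted 
text shows the failure. The function name and the byte counts are 
hypothetical, chosen only for illustration.

```python
def length_only_plan(sender_len, receiver_len):
    """Decide what to send using file lengths alone (hypothetical helper)."""
    if sender_len < receiver_len:
        # Sender's file is shorter: assume it was rotated, resend from byte 0.
        return ("rotated", 0)
    # Otherwise assume the file only grew: send from the receiver's length.
    return ("append", receiver_len)


# Normal growth: the receiver holds 1000 bytes, the log is now 1200 bytes.
print(length_only_plan(1200, 1000))

# Rotation noticed: the new log (400 bytes) is shorter than the old copy.
print(length_only_plan(400, 1000))

# Rotation missed: the new log grew to 1500 bytes before the next sync,
# so the rule appends from byte 1000 of a completely different file.
print(length_only_plan(1500, 1000))
```

In the last case the first 1000 bytes of the new log are never 
sent, and neither is whatever was written to the old log after 
the previous sync.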

If, instead, the server passes a block checksum (of the last thing 
it has) to the client and the client then works backwards through 
everything it has, looking for a match, then a temporary failure 
can't cause data loss until a log has actually rotated out of 
existence.
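
The exchange might look roughly like the sketch below. It works on 
in-memory bytes, assumes the client's rotated and current logs are 
concatenated oldest first, and uses SHA-1 and a 16-byte block as 
stand-ins; none of these names or choices come from rsync or from 
any existing implementation.

```python
import hashlib

BLOCK = 16  # hypothetical block size in bytes


def tail_checksum(server_data, block=BLOCK):
    """Server side: length and digest of the last block it holds."""
    tail = server_data[-block:] if server_data else b""
    return len(tail), hashlib.sha1(tail).digest()


def bytes_to_send(client_data, n, digest):
    """Client side: search backwards for the server's last block.

    Returns (found, data). If the block is found, data is everything
    after it. If not, the server's tail has rotated out of existence
    (or the server holds nothing) and the client sends all it has.
    """
    if n > 0:
        for k in range(len(client_data) - n, -1, -1):
            if hashlib.sha1(client_data[k:k + n]).digest() == digest:
                return True, client_data[k + n:]
    return False, client_data


server = b"10:00 start\n10:01 login\n"
client = server + b"10:02 logout\n10:03 stop\n"
n, digest = tail_checksum(server)
print(bytes_to_send(client, n, digest))
```

Because the search runs from the newest data backwards, it stops 
at the most recent match. A block whose contents repeat later in 
the log could therefore match too late, so the block needs to be 
long enough to be distinctive (a timestamped line usually is).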

It sounds like I'm not missing some subtlety in the use of rsync, so 
I think I'll hack on at implementing my backwards search idea. First 
attempts, mmapping the file in 32 MB chunks and using the reversed 
checksum, suggest I can scan backwards at around 13 MB/second on 
a 1 GHz Pentium.
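
One way such a backwards scan could work is sketched below. It is 
an assumption, not the code measured above: it uses an rsync-style 
weak checksum (two 16-bit sums) rolled one byte at a time towards 
the start of the file, confirms candidates with SHA-1, and maps the 
whole file at once rather than in 32 MB chunks. The function names 
are hypothetical.

```python
import hashlib
import mmap
import os

MOD = 1 << 16


def weak_sum(block):
    """rsync-style weak checksum parts (a, b) of a block of bytes."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def rfind_checksum(path, n, weak, strong):
    """Offset of the last n-byte window in the file matching both sums, or -1."""
    size = os.path.getsize(path)
    if n <= 0 or size < n:
        return -1
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        k = size - n
        a, b = weak_sum(m[k:k + n])
        while True:
            # Cheap weak test first; hash the window only on a weak hit.
            if (a, b) == weak and hashlib.sha1(m[k:k + n]).digest() == strong:
                return k
            if k == 0:
                return -1
            # Slide the window one byte towards the start of the file:
            # byte k-1 enters at the front, byte k+n-1 leaves at the back.
            incoming, outgoing = m[k - 1], m[k + n - 1]
            b = (b - a + n * incoming) % MOD  # uses a before it is updated
            a = (a + incoming - outgoing) % MOD
            k -= 1
```

Each step backwards costs a few additions regardless of the block 
length, which is what makes scanning a large log for a single 
checksum affordable.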

Cheers,
Alun.

-- 
Alun Jones                       auj at aber.ac.uk
Systems Support,                 (01970) 62 2494
Information Services,
University of Wales, Aberystwyth

