Query re: rolling checksum algorithm of rsync

Alun auj at aber.ac.uk
Thu Feb 10 11:36:51 GMT 2005


Wayne Davison (wayned at samba.org) said, in message
    <20050210084012.GC31302 at blorf.net>:
> 
> > Does this rolling for every byte addition and removal process slow
> > down the speed of rsync and cause any sort of a latency in incremental
> > backups
> 
> I'm not sure what you mean by "latency in incremental backups", though,
> so I don't know if that fully answers your question.
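
As an aside on the quoted question: the per-byte roll itself is
cheap. The weak checksum is an Adler/Fletcher-style pair of sums,
and sliding the window along by one byte is O(1). A simplified
sketch (mine, not rsync's actual code):

#include <stdint.h>
#include <stddef.h>

/* Adler/Fletcher-style rolling checksum, similar in spirit to
 * rsync's weak checksum (simplified; not the real implementation). */
struct rollsum {
    uint32_t s1;   /* sum of the bytes in the window */
    uint32_t s2;   /* sum of the running s1 values   */
    size_t   len;  /* window (block) length          */
};

/* Compute the sums over an initial window of `len` bytes. */
static void rollsum_init(struct rollsum *r, const unsigned char *buf,
                         size_t len)
{
    r->s1 = r->s2 = 0;
    r->len = len;
    for (size_t i = 0; i < len; i++) {
        r->s1 += buf[i];
        r->s2 += r->s1;
    }
}

/* Slide the window one byte: drop `out`, take in `in`.  This is
 * O(1), which is why checksumming every offset is feasible. */
static void rollsum_roll(struct rollsum *r, unsigned char out,
                         unsigned char in)
{
    r->s1 += (uint32_t)in - (uint32_t)out;
    r->s2 += r->s1 - (uint32_t)r->len * out;
}

/* Pack the two sums into one 32-bit value. */
static uint32_t rollsum_digest(const struct rollsum *r)
{
    return (r->s2 << 16) | (r->s1 & 0xffff);
}

Rolling across a whole file is therefore linear in its size, with
the cost dominated by reading the data off disk rather than by the
checksum arithmetic.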

I think this is a question related (if not identical) to one I asked some
time back. If you're synchronising log files, for example, you may be able
to guarantee that all changes to the file happen at its end. Unfortunately,
rsync doesn't give you any way to use that extra information to save I/O
and bandwidth.

If the log file is, say, 2Gbytes long and has only had 100Kbytes appended
since the last rsync, then using --whole-file means 2Gbytes of network
traffic and 2Gbytes of disk I/O at either end. Using the normal checksum
method means 2Gbytes of disk I/O at either end and 100Kbytes of network
traffic (plus the checksum data). Neither is ideal.

I suspect it wouldn't fit inside the rsync protocol, but I'd like to see
something that says "start working backwards from the end of the file until
you find n matching blocks, then transfer from that point onwards". It would
let me get rid of some horrible hacky code here!
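
For the curious, the kind of hack in question boils down to something
of this shape: a hypothetical, local-file sketch with minimal error
handling (append_tail is a made-up name). It assumes the destination
is a byte-for-byte prefix of the source, as it is for a log that only
ever grows; real code would first verify that assumption, e.g. by
comparing a checksum of the tail of the destination against the same
byte range of the source:

#include <stdio.h>

/* Hypothetical sketch: assume dst is a byte-for-byte prefix of src
 * and copy just the tail onto the end of dst. */
static int append_tail(const char *src_path, const char *dst_path)
{
    FILE *src = fopen(src_path, "rb");
    FILE *dst = fopen(dst_path, "ab"); /* append: writes land at EOF */
    char buf[8192];
    size_t n;
    long have;

    if (!src || !dst)
        return -1;

    fseek(dst, 0, SEEK_END);           /* how much do we have already? */
    have = ftell(dst);
    fseek(src, have, SEEK_SET);        /* skip the shared prefix       */

    while ((n = fread(buf, 1, sizeof buf, src)) > 0)
        fwrite(buf, 1, n, dst);

    fclose(src);
    fclose(dst);
    return 0;
}

With the source read over the network (or the read end wrapped in
ssh), that moves only the 100Kbytes that were appended in the example
above. But it's fragile, which is why I'd rather rsync could do the
same thing safely itself.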

Would it be useful to be able to tell rsync "assume the first n Kbytes of
the files at either end are identical, and skip them when generating and
matching checksums"?

Cheers,
Alun.

-- 
Alun Jones                       auj at aber.ac.uk
Systems Support,                 (01970) 62 2494
Information Services,
University of Wales, Aberystwyth



