Query re: rolling checksum algorithm of rsync
c.shoemaker at cox.net
Thu Feb 10 19:07:49 GMT 2005
On Thu, Feb 10, 2005 at 11:36:51AM +0000, Alun wrote:
> I think this is a related question (if not identical) to one I asked some
> time back. If you're synchronising log files, for example, then you may be
> able to guarantee that all changes to the file happen at the end of it.
> Unfortunately, rsync doesn't give you the opportunity to use this extra
> information to save I/O and bandwidth.
> If the log file is e.g. 2 Gbytes long and has only had 100 Kbytes appended
> since the last rsync, then using --whole-file means 2 Gbytes of network
> traffic and 2 Gbytes of disk I/O at either end. Using the checksum means
> 2 Gbytes of disk I/O at either end and 100 Kbytes of network traffic (plus the
> checksum data). Neither is ideal.
> I suspect it wouldn't fit inside the rsync protocol, but I'd like to see
> something that says "start working backwards from the end of the file until
> you find n matching blocks, then transfer from that point onwards". It would
> let me get rid of some horrible hacky code here!
> Would it be useful to be able to tell rsync "assume the first n Kbytes of
> the files at either end are identical and not useful for checksum purposes"?
Probably not. I suspect even what you describe wouldn't give you what
you want. How would you reliably choose n?
The fundamental problem here is that you're trying to treat different
parts of a file as if they had different modification dates. That
won't work. Break the file into pieces, and rsync will work.
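
To make "break the file into pieces" concrete, here is a rough sketch, not anything rsync ships. It assumes the log is strictly append-only. The function name split_into_pieces, the piece.NNNNNN file naming and the 64 MiB default are all made up for illustration. Pieces that are already at their expected length are left alone, so their modification times stay old and only the tail pieces look changed.

```python
import os


def split_into_pieces(src_path, out_dir, piece_size=64 * 1024 * 1024):
    """Split src_path into numbered piece files; return the names (re)written.

    Assumption: the source is append-only. A piece that already exists at
    its expected length is treated as unchanged and is not rewritten, so
    its mtime is preserved.
    """
    os.makedirs(out_dir, exist_ok=True)
    total = os.path.getsize(src_path)
    written = []
    with open(src_path, "rb") as src:
        index = 0
        offset = 0
        while offset < total:
            # The last piece may be shorter than piece_size.
            length = min(piece_size, total - offset)
            name = "piece.%06d" % index
            dest = os.path.join(out_dir, name)
            unchanged = os.path.exists(dest) and os.path.getsize(dest) == length
            if not unchanged:
                src.seek(offset)
                with open(dest, "wb") as out:
                    out.write(src.read(length))
                written.append(name)
            offset += length
            index += 1
    return written
```

With the log split this way, an ordinary run such as "rsync -a pieces/ host:pieces/" should only need to read and transfer the pieces whose size or modification time changed, which for an append-only log is the last one or two.
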
Otherwise, you have your own custom database, so you need your own
synchronization methods. You can't expect rsync to work (well) in
> Alun Jones auj at aber.ac.uk
> Systems Support, (01970) 62 2494
> Information Services,
> University of Wales, Aberystwyth