efficient file appends

David Bolen db3l at fitlinxx.com
Thu Dec 13 10:28:54 EST 2001

rsync at ka9q.net [rsync at ka9q.net] writes:

> It seems to me that this situation is common enough that the rsync
> protocol should look for it as a special case. Once the protocol has
> determined from differing timestamps and/or lengths that a file needs
> to be synchronized, the receiver should return a hash (and length) of
> its copy of the entire file to the sender.  The sender then computes
> the hash for the corresponding leading segment of its copy. If they
> match, the sender simply sends the newly appended data and instructs
> the receiver to append it to its copy.

While potentially a useful option, you wouldn't want the protocol to
always check for it automatically, since it would preclude rsync on
the sending side from being able to use part of the original file when
transmitting the newly added data to the receiver.  While perhaps not
helpful for log files, it can be a big win for other files, even if
the current copy on the receiver matches the sender's initial portion.
So at best, you'd only want to enable this option when every file in
a given run is known to grow this way.
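
For reference, the append check proposed in the quoted message could
look roughly like the sketch below. This is only an illustration, not
anything rsync implements: the function names are made up, SHA-256 is
an arbitrary choice of hash, and both files are read locally rather
than over a connection.

```python
import hashlib

CHUNK = 65536  # read size for hashing; arbitrary choice


def receiver_summary(path):
    """Receiver side: return (length, hex digest) of its whole copy."""
    h = hashlib.sha256()
    length = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
            length += len(chunk)
    return length, h.hexdigest()


def appended_tail(sender_path, recv_length, recv_digest):
    """Sender side: hash the leading recv_length bytes of its copy.

    Returns the bytes to append if that prefix matches the receiver's
    digest, or None if the check fails and a normal transfer is needed.
    """
    h = hashlib.sha256()
    remaining = recv_length
    with open(sender_path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(CHUNK, remaining))
            if not chunk:
                return None  # sender's file is shorter than receiver's
            h.update(chunk)
            remaining -= len(chunk)
        if h.hexdigest() != recv_digest:
            return None  # leading segment differs, not a pure append
        return f.read()  # everything past the matching prefix
```

Note that when the check succeeds nothing from the receiver's copy is
reused beyond the prefix, which is the trade-off described above.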

Alternatively, even with rsync the way it is today, what I do is
manually bump up the blocksize to something large (say 16 or 32K).
This results in far fewer blocks for the checksum algorithm (perhaps
10-45x fewer, depending on the original file size and the default
dynamic blocksize selection) and thus minimizes the metadata
transmitted for the common portion of the file.  It works pretty well
for me with database transaction log files which get pretty big.  You
can probably find some past e-mail on the subject in the list by
looking for threads about rsync blocksize.
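
To get a rough feel for the savings, here is a small back-of-envelope
calculation. The numbers in it are assumptions for illustration only:
a 700-byte block size stands in for the small default, and 20 bytes of
checksum data per block is a guess at the per-block overhead, so check
both against your rsync version before relying on the result.

```python
ASSUMED_DEFAULT_BLOCK = 700    # assumption: small default block size
ASSUMED_BYTES_PER_BLOCK = 20   # assumption: checksum bytes sent per block


def block_count(file_size, block_size):
    """Number of blocks needed to cover file_size bytes (rounded up)."""
    return -(-file_size // block_size)


def checksum_metadata(file_size, block_size,
                      per_block=ASSUMED_BYTES_PER_BLOCK):
    """Approximate checksum bytes sent for the existing file."""
    return block_count(file_size, block_size) * per_block


if __name__ == "__main__":
    size = 1024 ** 3  # a 1 GiB transaction log, as an example
    for bs in (ASSUMED_DEFAULT_BLOCK, 16 * 1024, 32 * 1024):
        print(f"blocksize {bs:>6}: {block_count(size, bs):>8} blocks, "
              f"~{checksum_metadata(size, bs)} bytes of checksums")
```

The larger block size itself is set with rsync's -B (--block-size)
option.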

-- David

 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
