performance with >50GB files

Wayne Davison wayned at samba.org
Tue Jan 10 23:31:07 GMT 2006


On Tue, Jan 10, 2006 at 11:31:08PM +0100, René Rebe wrote:
> Also I found the current code does decide from the receiving-side file
> what blocksize to use.

The idea here is that the only checksum data that get transmitted and
stored in the hash table are those for the blocks in the file on the
receiving side.  The sender's file is read from beginning to end looking
for checksum matches in this receiver-file hash table.  We at least
avoid the problem of over-filling the limited-slot hash table with too
many checksum blocks.  However, it is also true that using a small block
size for a large sender-side file does cause the sender to search for
more block matches than a larger block size would, so this does affect
performance.
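To make the receiver-table / sender-scan split concrete, here is a minimal Python sketch of the classic rsync-style match loop. This is not rsync's actual code. The receiver's file is cut into fixed-size blocks and only their weak checksums go into the table. The sender slides a window over its own file one byte at a time, rolling the weak checksum and looking it up. Several simplifications are assumptions made for illustration: a Python dict stands in for rsync's limited-slot hash table, a plain byte comparison stands in for the strong checksum, the receiver's short tail block is ignored, and literal (unmatched) data is not tracked. The names build_table and find_matches are hypothetical.

```python
from collections import defaultdict

MOD = 1 << 16  # weak checksum components are kept modulo 2^16


def weak_checksum(data: bytes):
    """Rolling-checksum pair (a, b) over one block, in the style of the rsync algorithm."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * x for i, x in enumerate(data)) % MOD
    return a, b


def build_table(receiver: bytes, block_size: int):
    """Receiver side: one table entry per block (hypothetical: short tail block skipped)."""
    table = defaultdict(list)
    for idx, off in enumerate(range(0, len(receiver), block_size)):
        block = receiver[off:off + block_size]
        if len(block) == block_size:
            table[weak_checksum(block)].append((idx, block))
    return table


def find_matches(sender: bytes, table, block_size: int):
    """Sender side: scan start to end, returning ([(sender_offset, block_idx)], lookups)."""
    matches, lookups = [], 0
    n = len(sender)
    if n < block_size:
        return matches, lookups
    off = 0
    a, b = weak_checksum(sender[:block_size])
    while True:
        lookups += 1
        hit = None
        for idx, block in table.get((a, b), ()):
            # Stand-in for the strong-checksum comparison done after a weak hit.
            if sender[off:off + block_size] == block:
                hit = idx
                break
        if hit is not None:
            matches.append((off, hit))
            off += block_size  # jump past the matched block
            if off + block_size > n:
                break
            a, b = weak_checksum(sender[off:off + block_size])
            continue
        if off + block_size >= n:
            break
        # Roll the window one byte: drop sender[off], add sender[off + block_size].
        out_byte, in_byte = sender[off], sender[off + block_size]
        a = (a - out_byte + in_byte) % MOD
        b = (b - block_size * out_byte + a) % MOD
        off += 1
    return matches, lookups
```

For example, with receiver = bytes(range(64)) and sender = b"XY" + receiver, a block size of 8 gives 8 table entries and 8 matches starting at sender offset 2. Halving the block size to 4 doubles the receiver's table to 16 entries. That is the trade-off described above: the table holds only receiver blocks, but the block size chosen for it also governs how much work the sender does.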

So, perhaps the size of the sending file should be factored into the
calculation in order to set a minimum acceptable block size.  This would
be easy, because the generator already knows the size of both files at
the time that it performs this calculation (or else it wouldn't have
been able to figure out if the file was up-to-date or not).
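Here is one possible reading of that suggestion, sketched in Python. It assumes a square-root heuristic for the block size, rounded down to a multiple of 8 and clamped between a minimum and a maximum. BLOCK_SIZE = 700 and MAX_BLOCK_SIZE = 1 << 17 are assumed values chosen for illustration only. The hypothetical choose_block_size keeps the receiver-based size, but it also applies the same heuristic to the sender's length and uses the result as a floor. The post does not specify how the floor should be computed, so this particular formula is an assumption.

```python
from math import isqrt

BLOCK_SIZE = 700            # assumed minimum block size (illustrative)
MAX_BLOCK_SIZE = 1 << 17    # assumed maximum block size (illustrative)


def sqrt_block_size(length: int) -> int:
    """Square-root heuristic: ~sqrt(length), multiple of 8, clamped to [min, max]."""
    if length <= BLOCK_SIZE * BLOCK_SIZE:
        return BLOCK_SIZE
    b = isqrt(length)
    b -= b % 8
    return min(max(b, BLOCK_SIZE), MAX_BLOCK_SIZE)


def choose_block_size(receiver_len: int, sender_len: int) -> int:
    """Hypothetical: receiver-based size, but never below a floor set by the sender's size."""
    from_receiver = sqrt_block_size(receiver_len)
    floor_from_sender = sqrt_block_size(sender_len)  # assumption: same heuristic as the floor
    return max(from_receiver, floor_from_sender)
```

With a 1,000,000-byte receiver file, the receiver-only size is 1000. If the sender's file is 4 GiB, the floor raises it to 65536. For a 50 GiB sender file, the floor reaches the assumed cap of 1 << 17. The cost of this choice is that a small receiver file is cut into far fewer, coarser blocks, so fewer of its blocks can match.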

..wayne..


More information about the rsync mailing list