Block size optimization - let rsync find the optimal blocksize by itself.

tridge at samba.org
Sun Jun 30 17:11:02 EST 2002


Olivier,

>   Well, the first comment: during my work, I wanted to verify that the
> theoretical optimal block size sqrt(24*n/Q) given by Andrew Tridgell in his
> PhD thesis was actually the right one. When running the tests on randomly
> generated & modified files, I discovered that sqrt(78*n/Q) is the actual
> optimal block size. I tried to understand this by reading the whole
> thesis, then quite a lot of documentation about rsync, but I just can't
> figure out why the theoretical & experimental optimal block sizes differ
> so much. I _really_ don't think it's coming from my tests; there must be
> something else.

First off, you need to make sure you are taking into account the
conditions I mentioned for that optimal size to be correct. In
particular I assumed:

  If, for example, we assume that the two files are the same except for
  Q sequences of bytes, with each sequence smaller than the block size
  and separated by more than the block size from the next sequence

In practice there is no 'correct' model for real files, so I chose a
simple model that I thought would give a reasonable approximation
while being easy to analyse.
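
For testing against that model, the conditions quoted above can be
built into the test data directly. The following is only a sketch: the
function name, its parameters and the choice of one changed run at the
start of each equal-sized slot are assumptions made for illustration,
not something taken from the thesis or from rsync itself.

```python
def modify_per_model(data: bytes, q: int, block_size: int, run_len: int) -> bytes:
    """Return a copy of data with q changed runs that fit the quoted model.

    Hypothetical helper: each run is shorter than block_size, and the
    unchanged gap between consecutive runs is longer than block_size.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    if not 0 < run_len < block_size:
        raise ValueError("each changed run must be shorter than the block size")
    slot = len(data) // q  # one changed run at the start of each slot
    if slot <= run_len + block_size:
        raise ValueError("file too small to keep runs more than a block apart")
    out = bytearray(data)
    for i in range(q):
        start = i * slot
        for j in range(start, start + run_len):
            out[j] ^= 0xFF  # flipping every bit guarantees the byte differs
    return bytes(out)
```

Test files that break either condition (long changed runs, or changes
closer together than one block) are outside the model, so the formula
need not hold for them.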

Also, you didn't take into account that the function I gave was for
the simpler version of rsync that I introduced in chapter 3. Later in
the thesis I discuss how s_s can be reduced without compromising the
algorithm (see 'Smaller Signatures' in chapter 4). That changes the
calculation of optimal block size quite a bit.
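
To see why the signature size matters, here is a rough sketch. It
assumes a cost model of the form Q*L + c*n/L, where L is the block
size, Q*L stands for the literal data sent for the changed blocks and
c is the number of signature bytes sent per block. That model and the
names below are assumptions for illustration; with c = 24 it
reproduces sqrt(24*n/Q).

```python
import math


def total_cost(n: int, q: int, block_size: float, bytes_per_block: float) -> float:
    """Approximate bytes sent under the assumed model: Q*L + c*n/L."""
    return q * block_size + bytes_per_block * n / block_size


def optimal_block_size(n: int, q: int, bytes_per_block: float) -> float:
    """Block size minimising total_cost; setting d/dL to zero gives sqrt(c*n/Q)."""
    return math.sqrt(bytes_per_block * n / q)


if __name__ == "__main__":
    n, q = 1_000_000, 100  # hypothetical file size and number of changes
    for c in (24, 12, 6):  # hypothetical per-block signature sizes in bytes
        best = optimal_block_size(n, q, c)
        print(c, round(best), round(total_cost(n, q, best, c)))
```

Under this model the optimum scales with sqrt(c), so halving the
per-block signature size shrinks the optimal block size by a factor of
sqrt(2).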

Thanks for looking at this though. I haven't thought closely about
this algorithm in a long time! 

Cheers, Tridge



