Block size optimization - let rsync find the optimal blocksize by itself.

Olivier Lachambre lachambre at club-internet.fr
Mon Jul 1 09:30:02 EST 2002


At 17:09 30/06/2002 -0700, you wrote:
>Olivier,
>
>>   Well, the first comment: during my work, I wanted to verify that the
>> theoretical optimal block size sqrt(24*n/Q) given by Andrew Tridgell in his
>> PhD thesis was actually the right one. When testing on randomly
>> generated and modified files, I found that sqrt(78*n/Q) is the
>> actual optimal block size. I tried to understand this by reading the whole
>> thesis and then quite a lot of documentation about rsync, but I just can't
>> figure out why the theoretical and experimental optimal block sizes differ
>> so much. I _really_ don't think it comes from my tests; there must be
>> something else.
>
>First off, you need to make sure you are taking into account the
>conditions I mentioned for that optimal size to be correct. In
>particular I assumed:
>
>  If, for example, we assume that the two files are the same except for
>  Q sequences of bytes, with each sequence smaller than the block size
>  and separated by more than the block size from the next sequence
>
>In practice there is no 'correct' model for real files, so I chose a
>simple model that I thought would give a reasonable approximation
>while being easy to analyse.

I did not explain what my tests were at all: I did not use real files
but a randomly generated file into which I inserted 1-byte
differences, each separated from the next by much more than the block
size.
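The test described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the original test code: the file size, the number of differences, the candidate block sizes, the function names and the cost accounting (sig_size signature bytes per block, plus the whole of each changed block sent as literal data) are all assumptions. Because the model is this simple, the scan only lands near sqrt(sig_size*n/q); it leaves out protocol overhead that a real rsync run would include.

```python
import math
import random


def make_test_files(n, q, seed=0):
    """Hypothetical test setup: a random n-byte file and a copy with q
    one-byte substitutions, evenly spaced so that each difference is far
    more than one block away from the next."""
    rng = random.Random(seed)
    old = bytes(rng.getrandbits(8) for _ in range(n))
    new = bytearray(old)
    spacing = n // q
    for i in range(q):
        pos = i * spacing + spacing // 2
        new[pos] ^= 0xFF  # XOR with 0xFF guarantees the byte changes
    return old, bytes(new)


def transferred_bytes(old, new, block_size, sig_size):
    """Assumed cost model: sig_size signature bytes per block of the old
    file, plus the full contents of every new-file block that no longer
    matches. With substitutions only (no insertions), comparing aligned
    blocks approximates what a rolling-checksum search would find."""
    cost = math.ceil(len(old) / block_size) * sig_size
    for start in range(0, len(new), block_size):
        block = new[start:start + block_size]
        if block != old[start:start + block_size]:
            cost += len(block)  # unmatched block is sent as literal data
    return cost


def best_block_size(old, new, sig_size, sizes):
    """Return the candidate block size with the lowest modelled cost."""
    return min(sizes, key=lambda b: transferred_bytes(old, new, b, sig_size))


def predicted_block_size(n, q, sig_size):
    """Closed-form optimum sqrt(sig_size * n / q); sig_size = 24 gives the
    sqrt(24*n/Q) formula quoted above."""
    return math.sqrt(sig_size * n / q)


if __name__ == "__main__":
    n, q = 200_000, 20
    old, new = make_test_files(n, q)
    sizes = range(100, 2001, 10)
    for sig_size in (24, 78):
        # Measured optimum under the model versus the closed-form prediction
        print(sig_size, best_block_size(old, new, sig_size, sizes),
              round(predicted_block_size(n, q, sig_size)))
```

Scanning block sizes against a closed-form prediction keeps the experiment honest: if a measurement made this way disagreed with sqrt(sig_size*n/q), the cost accounting would be the first thing to check.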

>Also, you didn't take into account that the function I gave was for
>the simpler version of rsync that I introduced in chapter 3. Later in
>the thesis I discuss how s_s can be reduced without compromising the
>algorithm (see 'Smaller Signatures' in chapter 4). That changes the
>calculation of optimal block size quite a bit.

I think this is the main reason for the results of my tests.
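For reference, the square-root form follows from a simple cost model, assumed here and not necessarily the exact accounting used in the thesis: each of the n/L blocks of block size L costs s_s signature bytes, and each of the Q differences forces one block of L bytes to be sent literally. Minimizing the total cost C(L):

```latex
C(L) = \frac{n}{L}\,s_s + Q\,L,
\qquad
\frac{dC}{dL} = -\frac{n\,s_s}{L^{2}} + Q = 0
\;\Longrightarrow\;
L^{*} = \sqrt{\frac{s_s\,n}{Q}}.
```

Setting s_s = 24 reproduces sqrt(24*n/Q). In this model the constant under the square root is the per-block signature cost, so reducing s_s as in 'Smaller Signatures' changes the optimal block size.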

  Thanks for answering my question.

  Best regards,

Olivier

P.S. Maybe one day all this will be available somewhere on the Internet
(I don't have time to do it now because of exams).
_______

Olivier Lachambre
2, rue Roger Courtois
25 200 MONTBELIARD
FRANCE

e-mail : lachambre at club-internet.fr
