Query re: rolling checksum algorithm of rsync

Naveen Athresh naveenathresh at yahoo.com
Wed Feb 9 07:31:59 GMT 2005


Hi,
 
I have a query regarding rsync's rolling checksum algorithm:
 
Suppose fileA is a 100 MB database file on my local machine.
I back it up to the server for the first time (a full backup) using rsync, with a block_size of 30 KB and the --compress option to compress data as it is transferred.
 
Next time, I modify fileA by appending another 100 MB of new content at the end (assuming my database appends new data at the end of the physical fileA).
 
I then run rsync on fileA again to back it up to the server. rsync performs an incremental backup of fileA using its rolling checksum, and when a match is found it verifies the match using the stronger checksum.
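 
For reference, my understanding of the weak rolling checksum described in the rsync technical report is roughly the sketch below (the names and details are my own, not rsync's internals):

#include <stddef.h>
#include <stdint.h>

/* Sketch of a weak 32-bit checksum in the style of the rsync
 * technical report: s1 is the sum of the bytes in the window and
 * s2 is the sum of the running s1 values, both kept mod 2^16. */
static uint32_t weak_checksum(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | ((s2 & 0xffff) << 16);
}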
 
My query is this: during the rolling checksum pass, the initial 100 MB is found to match, so those blocks are not actually transferred over the network. However, when rsync encounters the newly appended 100 MB towards the end of the physical fileA, it starts rolling byte by byte to see whether it can find a matching block in the hash table (or until it has advanced a full block_size), and then repeats the rolling checksum process.
 
Since all of this content is new, the checksum keeps rolling without ever finding a match, and the literal data is transmitted over the network.
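 
My impression is that the roll itself is cheap, because sliding the window by one byte only needs the byte leaving the window and the byte entering it, something like this (again my own sketch, not rsync's code):

/* Slide the checksum window one byte forward: drop `out` (the byte
 * leaving the window) and take in `in` (the byte entering), where
 * `len` is the block size. Constant work per byte, independent of
 * the block size. */
static uint32_t roll_checksum(uint32_t sum, unsigned char out,
                              unsigned char in, size_t len)
{
    uint32_t s1 = sum & 0xffff;
    uint32_t s2 = (sum >> 16) & 0xffff;

    s1 = (s1 - out + in) & 0xffff;
    s2 = (s2 - (uint32_t)len * out + s1) & 0xffff;
    return s1 | (s2 << 16);
}

If that is right, each roll costs only a handful of integer operations plus a hash-table lookup, which leads to my question: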
 
Does this byte-by-byte rolling (adding one byte and removing another at each step) slow rsync down and introduce latency in incremental backups? And if not, how has this case been handled within match.c or any other associated file?
 
Should block_size be tuned for different file sizes to optimize the case above?
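 
For example, would square-root scaling along the lines below be the right idea? (This is purely a hypothetical sketch; the constants and rounding are my guesses, not rsync's actual defaults.)

#include <math.h>

#define MIN_BLEN 700u        /* rsync's historical default block size */
#define MAX_BLEN (1u << 17)  /* arbitrary upper bound for this sketch */

/* Hypothetical heuristic: grow block_size with the square root of
 * the file size, clamped to [MIN_BLEN, MAX_BLEN]. */
static unsigned int pick_block_size(unsigned long long file_len)
{
    unsigned int blen = (unsigned int)sqrt((double)file_len);

    blen &= ~7u;  /* round down to a multiple of 8 */
    if (blen < MIN_BLEN) blen = MIN_BLEN;
    if (blen > MAX_BLEN) blen = MAX_BLEN;
    return blen;
}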
 
Any help is appreciated.
 
Thanks in anticipation,
 
Regards,
Naveen

		

