Mass software update distribution + checksum-updating

Fri Jul 13 23:03:11 GMT 2007

Hi Wayne,

Thanks for your response - it's much appreciated. Comments below.

Wayne Davison wrote:
> The only checksum that is being cached is the one that the user can
> optionally request for a pre-transfer check.  It's not usually needed,
> unless the "quick check" algorithm (size + mtime) has a chance of being
> wrong.
>   

In our case the mtimes are going to differ: users would be installing
the game from a CD, so their files would carry the mtime from that
initial install rather than the server's. The quick check would not
match, so checksums would be needed.
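
To make the difference concrete, here is a minimal sketch of the two
pre-transfer checks. It is not rsync's code: the function names are
made up for illustration, and MD5 is only a stand-in for whatever
whole-file checksum rsync actually uses.

```python
import hashlib
import os


def quick_check_same(path_a, path_b):
    """Quick check: files are assumed identical if size and mtime match."""
    a, b = os.stat(path_a), os.stat(path_b)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def file_digest(path, chunk=1 << 20):
    """Whole-file digest, read in chunks so large files fit in memory."""
    h = hashlib.md5()  # stand-in hash, an assumption for this sketch
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def checksum_same(path_a, path_b):
    """Checksum check: ignores mtime, compares size and content digest."""
    return (os.path.getsize(path_a) == os.path.getsize(path_b)
            and file_digest(path_a) == file_digest(path_b))
```

A CD-installed file with the right contents but a different mtime
fails quick_check_same() and passes checksum_same(), which is the
situation described above.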

> A better update strategy would be some kind of a binary patch algorithm.
>
> You may want to check into some other binary-patching software to see
> what your options are (I haven't looked into it).
>   

We've looked briefly at rsync's batch mode, but using it would likely
be much like using any of several other binary-patching solutions out
there: we'd have to deal with the complexity of updating from multiple
different source versions. That would add development work I was
hoping we might avoid by just using rsync directly.

>> Lastly, does anyone have any empirical data on how well an rsync server 
>> with checksum-updating works with large number (eg: hundreds to 
>> thousands) of simultaneous clients?
>>     
>
> Not that I know of.  For really large files, that is likely to be quite
> a memory and CPU hog.  Each client will be sending you checksum data for
> the whole file, and then the server will be doing its own checksumming
> and block comparisons using this in-memory checksum cache.
>   

I'm still a bit confused on this last point. Would the cached checksums
from the checksum-updating patch mean that the server would only have
to do the block comparisons? Or would the server still need to
calculate the checksums themselves for every client? In other words,
does the checksum-updating patch cache the individual block checksums
within a file, or just an overall file checksum?

Also, is it the server that does the block comparisons and decides what
data to send, or does that happen on the client? If it's the server,
that would certainly be a good deal more overhead than I was expecting.
From the "How rsync works" document
(http://samba.anu.edu.au/rsync/how-rsync-works.html), it sounded like
the receiver (aka client) became the 'generator', and I had assumed
that the generator was responsible for requesting the individual
blocks. A re-read suggests that it is in fact the server that has to
do the block comparisons, as you seem to be suggesting.
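
For reference, this is the division of labour as understood from that
document, in a deliberately simplified sketch. The names, the tiny
block size, the non-rolling weak checksum and the use of MD5 are all
assumptions for illustration, not rsync's actual implementation.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def weak(block):
    """Cheap Adler-style checksum (computed from scratch, not rolled)."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def receiver_signatures(old):
    """Receiver (generator): checksum each full block of its old file."""
    sigs = {}
    for idx in range(len(old) // BLOCK):
        blk = old[idx * BLOCK:(idx + 1) * BLOCK]
        sigs.setdefault(weak(blk), []).append((hashlib.md5(blk).digest(), idx))
    return sigs


def sender_delta(new, sigs):
    """Sender: scan the new file at every offset for blocks the receiver has."""
    out, literal, i = [], bytearray(), 0
    while i < len(new):
        blk = new[i:i + BLOCK]
        match = None
        if len(blk) == BLOCK:
            for strong, idx in sigs.get(weak(blk), ()):
                if strong == hashlib.md5(blk).digest():
                    match = idx
                    break
        if match is None:
            literal.append(new[i])  # no match here: byte goes out as literal data
            i += 1
            continue
        if literal:
            out.append(("data", bytes(literal)))
            literal.clear()
        out.append(("copy", match))  # receiver already holds this block
        i += BLOCK
    if literal:
        out.append(("data", bytes(literal)))
    return out


def receiver_apply(old, delta):
    """Receiver: rebuild the new file from its old blocks plus literal data."""
    out = bytearray()
    for kind, val in delta:
        out += old[val * BLOCK:(val + 1) * BLOCK] if kind == "copy" else val
    return bytes(out)
```

In this sketch the per-offset scan in sender_delta() is the work that
lands on the server for every connected client.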

Wouldn't it be more efficient in general for that to happen on the 
client side though?  One side certainly has to transfer the block 
checksums over to the other side, so why not make that be the server 
rather than the client and have the client do the block comparisons and 
then request individual blocks from the server?
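
A rough sketch of that reversed arrangement, to show what is being
proposed. It is hypothetical and not something rsync does as far as I
know; the names and block size are made up, and a real version would
use a rolling weak checksum instead of hashing at every offset.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def server_block_hashes(new):
    """Server: hash each fixed-size block of the new file.

    This depends only on the server's file, so it could be computed once
    and reused for every client.
    """
    return [hashlib.md5(new[i:i + BLOCK]).digest()
            for i in range(0, len(new), BLOCK)]


def client_missing_blocks(old, hashes):
    """Client: scan its old file and list the block indices it must request."""
    wanted = set(hashes)
    have = set()
    for off in range(len(old) - BLOCK + 1):
        h = hashlib.md5(old[off:off + BLOCK]).digest()
        if h in wanted:
            have.add(h)
    return [idx for idx, h in enumerate(hashes) if h not in have]
```

Here the scanning cost moves to each client, and the server only has
to hand out the block list and serve the requested blocks.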

Take care,
 -Gav

-- 
Gavriel State, Founder & CTO
TransGaming Inc.
gav at transgaming.com
http://www.transgaming.com

Broadening The Playing Field