Mass software update distribution + checksum-updating
Gavriel State
gav at transgaming.com
Fri Jul 13 23:03:11 GMT 2007
Hi Wayne,
Thanks for your response - it's much appreciated. Comments below.
Wayne Davison wrote:
> The only checksum that is being cached is the one that the user can
> optionally request for a pre-transfer check. It's not usually needed,
> unless the "quick check" algorithm (size + mtime) has a chance of being
> wrong.
>
In our case, the mtimes are going to differ: users would be installing
a game from a CD, so their files would carry the mtime from that
initial install rather than the server's. That means checksums would
be needed.
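To make the distinction concrete, here is a minimal sketch of the two
kinds of pre-transfer check as I understand them. It is not rsync's code:
the function names are made up for illustration, and SHA-256 is only a
stand-in for whatever whole-file checksum rsync actually uses.

```python
import hashlib
import os
import tempfile


def quick_check_matches(src_path, dst_path):
    """Size + mtime comparison (the "quick check" described above)."""
    a, b = os.stat(src_path), os.stat(dst_path)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def file_digest(path, chunk_size=1 << 20):
    """Whole-file digest; SHA-256 is an assumed stand-in hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_matches(src_path, dst_path):
    """Content comparison that ignores mtime entirely."""
    if os.stat(src_path).st_size != os.stat(dst_path).st_size:
        return False
    return file_digest(src_path) == file_digest(dst_path)


def demo():
    """Same bytes, different mtimes (e.g. server copy vs. CD install)."""
    with tempfile.TemporaryDirectory() as d:
        server = os.path.join(d, "server.dat")
        installed = os.path.join(d, "installed.dat")
        for path in (server, installed):
            with open(path, "wb") as f:
                f.write(b"game data" * 1000)
        os.utime(server, (1_100_000_000, 1_100_000_000))
        os.utime(installed, (1_000_000_000, 1_000_000_000))
        return (quick_check_matches(server, installed),
                checksum_matches(server, installed))
```

With identical content but different timestamps, the quick check reports
a mismatch while the checksum comparison reports a match, which is the
situation our CD-installed users would be in.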
> A better update strategy would be some kind of a binary patch algorithm.
>
> You may want to check into some other binary-patching software to see
> what your options are (I haven't looked into it).
>
We've looked briefly at rsync's batch mode, but using it would likely
be much like using any of the other binary-patching solutions out
there: we'd have to deal with the complexity of updating from multiple
different source versions. That would add development work I was
hoping we might avoid by just using rsync directly.
>> Lastly, does anyone have any empirical data on how well an rsync server
>> with checksum-updating works with large number (eg: hundreds to
>> thousands) of simultaneous clients?
>>
>
> Not that I know of. For really large files, that is likely to be quite
> a memory and CPU hog. Each client will be sending you checksum data for
> the whole file, and then the server will be doing its own checksumming
> and block comparisons using this in-memory checksum cache.
>
I'm still a bit confused on this last point - would the cached checksums
from the checksum-updating patch mean that the server would only have to
do the block comparisons? Or would the server still need to calculate
the checksums themselves for every client? That is, are the individual
block checksums within a file cached by the checksum-updating patch, or
is it just caching an overall file checksum?
Also, is it the server that does the block comparisons and decides what
data to send, or does that happen on the client? If it's the server,
that would certainly be a bunch more overhead than I was thinking. From
the "How rsync works" document
(http://samba.anu.edu.au/rsync/how-rsync-works.html), it sounded like
the receiver (aka client) became the 'generator'. I guess I was
thinking that the generator was responsible for requesting the
individual blocks. A re-read suggests that it is in fact the server
that has to do the block comparisons as you seem to be suggesting.
Wouldn't it be more efficient in general for that to happen on the
client side though? One side certainly has to transfer the block
checksums over to the other side, so why not make that be the server
rather than the client and have the client do the block comparisons and
then request individual blocks from the server?
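To illustrate the division of work I'm asking about, here is a rough
sketch of block-checksum matching, based on my reading of the "How rsync
works" document. It is not rsync's implementation: the names, the tiny
block size, and the hash choices are all assumptions, and the weak
checksum is recomputed at each offset rather than rolled, purely to keep
the example short.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size, for illustration only


def block_signatures(old, block=BLOCK):
    """Side holding the old file: weak + strong checksum per block."""
    sigs = {}
    for start in range(0, len(old), block):
        chunk = old[start:start + block]
        entry = (hashlib.sha256(chunk).digest(), start // block)
        sigs.setdefault(zlib.adler32(chunk), []).append(entry)
    return sigs


def make_delta(new, sigs, block=BLOCK):
    """Side holding the new file: scan it against the signatures and
    emit ("copy", block_index) or ("data", literal_bytes) operations."""
    delta = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        window = new[pos:pos + block]
        candidates = sigs.get(zlib.adler32(window), [])
        index = None
        if candidates:
            # Only pay for the strong hash when the weak one hits.
            strong = hashlib.sha256(window).digest()
            index = next((i for s, i in candidates if s == strong), None)
        if index is None:
            literal.append(new[pos])
            pos += 1
            continue
        if literal:
            delta.append(("data", bytes(literal)))
            literal.clear()
        delta.append(("copy", index))
        pos += len(window)
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old, delta, block=BLOCK):
    """Side holding the old file: rebuild the new file from the delta."""
    out = bytearray()
    for op, arg in delta:
        if op == "copy":
            out += old[arg * block:(arg + 1) * block]
        else:
            out += arg
    return bytes(out)
```

In this sketch the expensive part is make_delta, which has to look at
every byte offset of the new file. Whichever side runs it pays that cost
once per client, which is why I'm asking where it runs.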
Take care,
-Gav
--
Gavriel State, Founder & CTO
TransGaming Inc.
gav at transgaming.com
http://www.transgaming.com
Broadening The Playing Field