Mass software update distribution + checksum-updating

Tue Jul 10 03:26:12 GMT 2007

On Mon, Jul 09, 2007 at 02:13:36AM -0400, Gavriel State wrote:
> In particular, the checksum-updating patch looks like it might be able
> to solve our biggest concerns about CPU load on the update server,
> since the actual content being served will change quite rarely.

The only checksum that is being cached is the whole-file checksum that
the user can optionally request for a pre-transfer check (rsync's
--checksum option).  It's not usually needed unless the "quick check"
algorithm (size + mtime) has a chance of being wrong.
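To make the distinction concrete, here is a minimal Python sketch of the
two comparison modes.  It is not rsync's code: the function names are
made up, and MD5 merely stands in for whatever whole-file digest the
protocol uses.  The quick check needs only a stat(), while the checksum
mode has to read every byte; the remote_digest argument is what a cache
like the one in the checksum-updating patch would supply without the
server rereading the file.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 16):
    """Whole-file digest; MD5 is a stand-in, not necessarily rsync's choice."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(local_path, remote_size, remote_mtime, remote_digest=None):
    """Decide whether a file must be updated (hypothetical helper).

    Without a remote digest this is the "quick check": size + mtime.
    With one (the --checksum case), size is compared first and then the
    local file is read in full and hashed; mtime is ignored.
    """
    try:
        st = os.stat(local_path)
    except FileNotFoundError:
        return True  # nothing local yet, so it must be sent
    if remote_digest is None:
        return st.st_size != remote_size or int(st.st_mtime) != int(remote_mtime)
    return st.st_size != remote_size or file_digest(local_path) != remote_digest
```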

A better update strategy would be some kind of a binary patch algorithm.
Since the user should be starting with a limited set of initial files,
you only need a limited set of updates.  One way to do this with rsync
is to use its batch processing.  A batch saves off the data that was
used to update a file to a new version.  You could deploy an update
script that identifies which version of the program the user has, checks
the server to see what the latest version is, and then downloads a batch
file that turns the old version into the new one (applying it via rsync's
batch processing).  As long as you keep a copy of each released version
on the server, it would be easy to create these update files with the
--only-write-batch=NAME option whenever a new version is released.
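A rough sketch of that workflow in Python, which only builds the rsync
command lines.  The release layout, the "OLD-to-NEW.batch" naming, and
the helper names are all hypothetical; the rsync options used
(--only-write-batch, --read-batch, --no-whole-file) are real.  The
--no-whole-file option matters because rsync defaults to whole-file
copying for local transfers, which would make the batch a full copy
rather than a delta.

```python
# Hypothetical layout: releases/<version>/ on the server, one batch file
# per upgrade path named "<old>-to-<new>.batch".  Naming is an assumption.


def batch_name(installed, latest):
    """Return the batch file the client should fetch, or None if current."""
    if installed == latest:
        return None
    return f"{installed}-to-{latest}.batch"


def write_batch_cmd(new_release_dir, old_release_dir, batch_file):
    """Server side, run once per release: record how to turn old into new.

    --only-write-batch writes the batch without modifying old_release_dir;
    --no-whole-file forces the delta algorithm for this local transfer.
    """
    return ["rsync", "-a", "--no-whole-file",
            f"--only-write-batch={batch_file}",
            new_release_dir.rstrip("/") + "/",
            old_release_dir.rstrip("/") + "/"]


def read_batch_cmd(batch_file, install_dir):
    """Client side: replay the batch against an unmodified old install."""
    return ["rsync", "-a", f"--read-batch={batch_file}",
            install_dir.rstrip("/") + "/"]


def server_batches(releases):
    """Commands to create a batch from every older release to the newest."""
    latest = releases[-1]
    return [write_batch_cmd(f"releases/{latest}", f"releases/{old}",
                            batch_name(old, latest))
            for old in releases[:-1]]
```

On the client you would run something like
subprocess.run(read_batch_cmd(name, install_dir), check=True) after
downloading the batch.  A batch can only be replayed against a tree
identical to the one it was created from, so the script has to be sure of
the installed version before applying it.  If a release can remove files,
you would probably also want --delete when writing the batch.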

I could even imagine a custom rsync server that used the block-checksum
data sent by the user's generator to identify which version of a file the
user had, and then chose which pre-recorded data stream to send to effect
the update instead of computing the binary patch "live".
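Purely as a thought experiment, the version-identification step might
look like the following.  It is heavily simplified and hypothetical: a
real generator sends a rolling checksum plus a (possibly truncated)
strong checksum per block, and the block size depends on the file's
length, so a real server would have to precompute signatures using the
same block size the client will pick.  Here a tiny fixed block size and a
per-block MD5 stand in for all of that.

```python
import hashlib

BLOCK_SIZE = 4  # tiny fixed block size for the demo; rsync scales it with file size


def block_signature(data, block_size=BLOCK_SIZE):
    """Per-block strong checksums: a simplified stand-in for generator data."""
    return tuple(hashlib.md5(data[i:i + block_size]).hexdigest()
                 for i in range(0, len(data), block_size))


def build_index(releases):
    """Map each released version's signature to its version label."""
    return {block_signature(content): version
            for version, content in releases.items()}


def choose_stream(index, client_signature, latest):
    """Pick a pre-recorded update stream instead of computing a delta live."""
    version = index.get(client_signature)
    if version is None:
        return ("live", None)      # unknown file: compute the delta as usual
    if version == latest:
        return ("current", None)   # client is already up to date
    return ("batch", f"{version}-to-{latest}.batch")
```

Keying on the exact block-signature sequence means any locally modified
file falls through to the "live" case, which is the safe default.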

You may want to investigate other binary-patching software to see what
your options are (I haven't looked into it).

> Would an rsync server running 3.0 CVS + the checksum-updating patch 
> still retain the precomputed checksum advantage when talking to an older 
> 2.6.9 client?

Sure, it works when talking to any rsync client.

> Alternatively, would it be difficult to backport the checksum-updating 
> patch to a 2.6.9 server?

It wouldn't be difficult.  The checksum-xattr patch would be even easier
to port (as long as your server's filesystem supports xattrs), since it
doesn't need an rsync with xattr support; it just needs an
extended-attribute read function.
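The read side of that approach is small.  A hedged Python sketch,
assuming Linux (os.getxattr is only available there); the attribute name
below is a placeholder, not necessarily the one the checksum-xattr patch
uses.  It falls back to hashing the file when the attribute is missing or
the filesystem lacks xattr support.

```python
import hashlib
import os

# Placeholder attribute name (an assumption, not taken from the patch).
CHECKSUM_ATTR = "user.example.md5"


def precomputed_checksum(path, attr=CHECKSUM_ATTR):
    """Return a stored checksum from an extended attribute if present,
    otherwise compute one by reading the whole file."""
    getxattr = getattr(os, "getxattr", None)  # os.getxattr is Linux-only
    if getxattr is not None:
        try:
            return getxattr(path, attr).decode("ascii")
        except OSError:
            pass  # attribute missing, or xattrs unsupported on this filesystem
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()
```

A real implementation would also need a way to detect a stale attribute
(for instance by storing the file's size and mtime alongside the
checksum); this sketch trusts whatever is stored.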

> Lastly, does anyone have any empirical data on how well an rsync server 
> with checksum-updating works with large number (eg: hundreds to 
> thousands) of simultaneous clients?

Not that I know of.  For really large files, that is likely to be quite
a memory and CPU hog.  Each client will be sending you block-checksum
data for its whole file, and the server will then do its own
checksumming and block comparisons against those checksums, which it
holds in memory for each client.
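A back-of-the-envelope estimate of the memory side, as a Python sketch.
The block-length formula only approximates rsync's default heuristic
(about the square root of the file length, with a 700-byte minimum), and
bytes_per_block is an assumed per-block cost, not a measured figure.

```python
import math

MIN_BLOCK = 700  # approximation of rsync's default minimum block length


def approx_block_len(file_len):
    """Roughly sqrt(file_len), rounded down to a multiple of 8, at least 700.

    This approximates rsync's default block-size heuristic; it is not exact.
    """
    return max(MIN_BLOCK, math.isqrt(file_len) // 8 * 8)


def checksum_memory(file_len, clients, bytes_per_block=40):
    """Estimate server memory held for clients' block checksums.

    bytes_per_block is an assumed per-entry cost (rolling + strong checksum
    plus bookkeeping); measure your own build for a real figure.
    """
    blocks = math.ceil(file_len / approx_block_len(file_len))
    return blocks * bytes_per_block * clients
```

With those assumptions, a 100 MB file works out to 10,000 blocks, about
400 KB per client, or roughly 400 MB for 1,000 simultaneous clients,
before counting the CPU time each transfer spends running the rolling
checksum over the whole file.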

..wayne..