Rsync for program loading on embedded platforms

Donovan Baarda abo at minkirri.apana.org.au
Wed Jun 2 09:47:45 GMT 2004


G'day,

From: "Greger Cronquist" <greger_cronquist at yahoo.se>
[...]
> >compiled binaries are often very different for only minor source
> >changes. It would be worth analysing your data to see if there are more
[...]
> Is that really true---that the binaries differ that much? Isn't that
> mostly due to relocating the different functions to other areas of the
> binary? Which, I guess, might be hard for in-place rsyncing. I just did
> a quick test with two different binaries using xdelta and rdiff and the
> uncompressed deltas were less than 10% of the original size (xdelta of
> course being the smallest). So I have hopes that some kind of
> diff-related approach (even if it means keeping the old binary on a PC,
> which we do anyway for traceability reasons) might work. Depending, of
> course, on the overwrite and CPU issues.

This depends a lot on the compiler and options used. The more "optimized"
the compiler, the more often you find small source changes resulting in
significant binary changes. Stuff written in assembler usually doesn't
change much though... unless you change some small macro that is used all
over the place.

> >The checksum size you can use is a function of the data size, block
> >size, and your acceptable failure rate. There are threads where this has
> >been analysed and the latest rsync incorporates dynamic checksum and
[...]
> Please correct me if I'm wrong, but don't most of these threads deal
> with finding the optimal block size depending on the file size? For an

No, there are also threads where the optimal blocksum size, not just the
block size, is discussed.

> embedded system we might well use small blocks just to be able to use,
> say, a 16 bit checksum that is much faster to compute. Or even 8 bit
> checksums for that matter. If I understand it correctly, rsync only ever
> uses one (well two if counting both the rolling and the md4 checksums)
> checksum implementation varying the block size it's calculated on.

There is a relationship between the block size, file size, and blocksum size
that can be used to calculate the probability of "failure" (i.e., encountering
two different blocks with the same blocksum, which corrupts the result). Given
an acceptable failure rate (say, 1 in 2^10 file transfers) and a file size, you
can pick a block size and from that calculate your blocksum size. E.g., for a
small file:

file_len = 64K
block_len = sqrt(file_len) = 256 bytes
blocksum_bits = 10 + 2*log2(file_len) - log2(block_len) = 34 bits
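
In code, that rule of thumb works out something like this (a rough sketch
only; the function name and the fail_exp parameter are mine, matching the
1 in 2^10 example above, not anything rsync itself provides):

import math

def blocksum_bits(file_len, block_len, fail_exp=10):
    # Bits of per-block checksum needed to keep the chance of a blocksum
    # collision over the whole transfer below roughly 1 in 2^fail_exp.
    return math.ceil(fail_exp + 2 * math.log2(file_len)
                     - math.log2(block_len))

file_len = 64 * 1024                       # 64K file
block_len = int(math.sqrt(file_len))       # 256 byte blocks
bits = blocksum_bits(file_len, block_len)  # -> 34 bits
print(bits, "bits, ie", math.ceil(bits / 8), "bytes rounded up")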

The 32-bit rollsum alone is not enough to give better than a 1 in 2^10
chance of failure... you need at least two more bits of checksum, which you
may as well round up to the nearest byte. Also, the rollsum is not a very
good checksum, so relying on it to provide a good 32 bits of the checksum
might not be wise... particularly once you start taking maliciously crafted
blocks into account.
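
For reference, the rollsum is a weak Fletcher/Adler-style rolling sum,
roughly along these lines (a simplified sketch of the idea only; the real
rsync and librsync implementations differ in detail):

def rollsum(block):
    # Weak rolling checksum over one block: 'a' is a plain byte sum, 'b'
    # weights each byte by its distance from the end of the block.
    a = b = 0
    for i, byte in enumerate(block):
        a += byte
        b += (len(block) - i) * byte
    return (a & 0xFFFF) | ((b & 0xFFFF) << 16)

def roll(old_sum, block_len, out_byte, in_byte):
    # Slide the window forward one byte without re-summing the whole block.
    a = old_sum & 0xFFFF
    b = (old_sum >> 16) & 0xFFFF
    a = (a - out_byte + in_byte) & 0xFFFF
    b = (b - block_len * out_byte + a) & 0xFFFF
    return a | (b << 16)

# Sanity check: rolling one byte forward matches a full recompute.
data = bytes(range(64))
L = 16
assert roll(rollsum(data[0:L]), L, data[0], data[L]) == rollsum(data[1:L + 1])

The point of the rollsum is that it is cheap to slide along the file, not
that it is strong, which is why the extra blocksum bits have to come from
the stronger per-block checksum.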

> As of yet, this is more of an academic interest spurred by the annoying
> delays in the compile-download-test loop... :-)

Ahh... remember them well :-)

I distinctly remember one particular system that attempted to minimize the
download-test loop by doing block-wise checksums... most of the time it
didn't save you much, and rsync would have been a distinct improvement.

----------------------------------------------------------------
Donovan Baarda                http://minkirri.apana.org.au/~abo/
----------------------------------------------------------------


