[ccache] BSDiff for cache objects

Bogdan Harjoc harjoc at gmail.com
Mon Nov 12 05:56:30 MST 2012


On Mon, Nov 12, 2012 at 2:30 PM, Jürgen Buchmüller <pullmoll at t-online.de> wrote:

> On Monday, 12.11.2012, at 13:49 +0200, Bogdan Harjoc wrote:
> > Basically, before writing a new object file, ccache could find a similar
> > object in the cache (based on object-code or source-code hashes for
> > example)
>
> The main goal of most hashes is to give very distinct results for
> even small changes in the input data, which is why there is not really
> an algorithm to compare two files' similarity based on hashes.
>

I should have been more specific. I meant block-hashes, like rsync and
bsdiff do:
http://www.samba.org/~tridge/phd_thesis.pdf
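
To make the idea concrete, here is a rough sketch of comparing two files by
their block hashes. It is only an illustration: the 4096-byte block size, the
use of SHA-256, and the function names are my assumptions, nothing that ccache
has today. It also hashes blocks at fixed offsets only, whereas rsync uses a
rolling checksum so it can find matching blocks at any offset.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, not a ccache setting


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of data and return the set of digests."""
    return {
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }


def similarity(a, b, block_size=BLOCK_SIZE):
    """Jaccard similarity of the two block-hash sets, from 0.0 to 1.0."""
    ha = block_hashes(a, block_size)
    hb = block_hashes(b, block_size)
    if not ha and not hb:
        return 1.0  # two empty files are trivially identical
    return len(ha & hb) / len(ha | hb)
```

A new object file would then be stored as a delta against the cached object
with the highest similarity, or stored whole if nothing scores well enough.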

> The savings in size are
> probably less important than the expected performance loss for
> building deltas of source and/or object files.
>

That is my concern as well. But an offline "ccache-compact" pass that runs
every 24h or so, possibly computing the "100 hashes" only once for each new
file, should be pretty fast. Applying a bspatch requires a bunzip2 plus a walk
through a list of INSERT/ADD instructions, so its cost can probably be
approximated as just the "bunzip2". There is also xdelta, which is faster.

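To show why the apply step is cheap, here is a simplified sketch of walking
such an instruction list. It is loosely modeled on the ADD/INSERT idea and is
not the actual bsdiff patch format: the real format has a header and stores
the control data compressed too, while here the controls are passed as a plain
list of hypothetical (add_len, insert_len, seek) tuples.

```python
import bz2


def apply_patch(old, controls, diff_bz2, extra_bz2):
    """Rebuild the new file from old plus two bzip2-compressed streams."""
    diff = bz2.decompress(diff_bz2)    # byte-wise differences for ADD
    extra = bz2.decompress(extra_bz2)  # literal bytes for INSERT
    new = bytearray()
    old_pos = diff_pos = extra_pos = 0
    for add_len, insert_len, seek in controls:
        # ADD: each new byte is the old byte plus a diff byte, modulo 256.
        for i in range(add_len):
            j = old_pos + i
            old_byte = old[j] if 0 <= j < len(old) else 0
            new.append((old_byte + diff[diff_pos + i]) & 0xFF)
        old_pos += add_len
        diff_pos += add_len
        # INSERT: copy literal bytes that have no counterpart in old.
        new += extra[extra_pos:extra_pos + insert_len]
        extra_pos += insert_len
        # Move the read position in old before the next instruction.
        old_pos += seek
    return bytes(new)
```

Apart from the two decompress calls, the loop is a single linear pass over the
output, which is why the total cost should be dominated by "bunzip2".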
