[ccache] BSDiff for cache objects

Bogdan Harjoc harjoc at gmail.com
Mon Nov 12 11:09:21 MST 2012


Initial results from a small .ccache (3.0) dir:

- 6476 objects
- 300MB
- probably about 500-1000 compiles/recompiles of around 100 small to large
projects

The test was:
1. Find candidates for compression by comparing defined (global) symbols,
as listed by: objdump -t | grep " g ". If two objects each had at least 4
symbols defined, and 85% of those symbols were identical, the pair was
selected for compression.
2. Run bsdiff on the selected pairs of files, and collect the total raw
size, and the resulting compressed size.
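
A minimal sketch of the selection in step 1, in Python rather than the
attached C++ (this is not the attached find-similar.cpp). The function
names are hypothetical, and so is the way the 85% is measured: here it is
the number of shared symbols divided by the size of the larger symbol set.

```python
import subprocess

MIN_SYMBOLS = 4      # each object must define at least this many symbols
MIN_SHARED = 0.85    # fraction of symbols that must be identical


def parse_global_symbols(objdump_output):
    """Return symbol names from `objdump -t` lines containing ' g '."""
    names = set()
    for line in objdump_output.splitlines():
        if " g " in line:                # same filter as: grep " g "
            fields = line.split()
            if fields:
                names.add(fields[-1])    # symbol name is the last column
    return names


def global_symbols(path):
    """Run objdump on one object file and collect its global symbols."""
    result = subprocess.run(["objdump", "-t", path],
                            capture_output=True, text=True, check=True)
    return parse_global_symbols(result.stdout)


def is_candidate_pair(symbols_a, symbols_b):
    """Assumed rule: both sets large enough and mostly overlapping."""
    if len(symbols_a) < MIN_SYMBOLS or len(symbols_b) < MIN_SYMBOLS:
        return False
    shared = len(symbols_a & symbols_b)
    return shared / max(len(symbols_a), len(symbols_b)) >= MIN_SHARED
```

Comparing every pair this way is quadratic in the number of objects, which
is tolerable for a few thousand files but would need an index for a large
cache.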

The results are:
4459 out of 6476 files compressed (6099674 -> 629795 bytes)

So the selected files shrink by roughly 90% (the bsdiff output is about
10% of the raw size), for a random .ccache folder.
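
For reference, a small Python sketch of how the totals in step 2 could be
collected and how the reduction follows from the two byte counts above. It
assumes the standard "bsdiff oldfile newfile patchfile" command line is
installed. The helper names are hypothetical, and so is the choice to count
only the second file of each pair as raw size.

```python
import os
import subprocess
import tempfile


def bsdiff_patch_size(old_path, new_path):
    """Size in bytes of the patch written by `bsdiff old new patch`."""
    with tempfile.TemporaryDirectory() as tmp:
        patch_path = os.path.join(tmp, "patch.bsdiff")
        subprocess.run(["bsdiff", old_path, new_path, patch_path],
                       check=True)
        return os.path.getsize(patch_path)


def measure(pairs):
    """Sum raw and patched sizes over (old_path, new_path) pairs."""
    raw = compressed = 0
    for old_path, new_path in pairs:
        raw += os.path.getsize(new_path)       # assumption: count new file
        compressed += bsdiff_patch_size(old_path, new_path)
    return raw, compressed


def size_reduction(raw_bytes, compressed_bytes):
    """Fraction of the raw size saved by storing patches instead."""
    return 1.0 - compressed_bytes / raw_bytes


# Figures reported above: 6099674 -> 629795 bytes
print(f"{size_reduction(6099674, 629795):.1%}")   # prints 89.7%
```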

I attached sources for the test (first run "./get-symbols.sh", then
"./find-similar").

Will post results for a more favorable scenario (multiple builds of
different versions of the same project) if time permits.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: find-similar.cpp
Type: text/x-c++src
Size: 3445 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/ccache/attachments/20121112/9f027ef8/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get-symbols.sh
Type: application/x-sh
Size: 368 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/ccache/attachments/20121112/9f027ef8/attachment.sh>

