[ccache] BSDiff for cache objects

Bogdan Harjoc harjoc at gmail.com
Fri Nov 16 04:04:17 MST 2012


On Mon, Nov 12, 2012 at 8:09 PM, Bogdan Harjoc <harjoc at gmail.com> wrote:

> Initial results from a small .ccache (3.0) dir:
>

The previous results were bogus (I was diffing .gz compressed objects).

I tried the bsdiff approach on a new .ccache folder obtained from compiling
8 linux kernels (various v3.x versions) configured as allnoconfig.

Stats:
- 22MB
- 4030 objects (gzipped)
- 503 cache hits

After applying diffs:
- 8.4MB (1862 files) were selected for diffs
- the resulting compressed patches total 1.5MB (17% of the selected 8.4MB)
- applying the 1862 diffs took 1.21 sec (so 0.65ms per patch)
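As a quick sanity check, the derived figures can be recomputed from the
rounded numbers quoted above. This is only arithmetic on the values in this
mail; the variable names are made up for illustration. Because the inputs
are rounded to one decimal, the patch ratio comes out near 18% rather than
exactly 17%.

```python
# Figures quoted in this mail (rounded as posted).
CACHE_MB = 22.0      # total size of the .ccache dir
SELECTED_MB = 8.4    # size of the 1862 objects selected for diffing
PATCHES_MB = 1.5     # size of the resulting compressed patches
N_PATCHES = 1862     # number of diffs applied
TOTAL_SEC = 1.21     # time to apply all diffs, process overhead subtracted


def summarize():
    """Return (ms per patch, patch/original ratio, MB saved, share of cache saved)."""
    per_patch_ms = TOTAL_SEC / N_PATCHES * 1000.0
    ratio = PATCHES_MB / SELECTED_MB
    saved_mb = SELECTED_MB - PATCHES_MB
    return per_patch_ms, ratio, saved_mb, saved_mb / CACHE_MB


if __name__ == "__main__":
    ms, ratio, saved, share = summarize()
    print(f"{ms:.2f} ms/patch, patches are {ratio:.1%} of the originals")
    print(f"{saved:.1f} MB saved, {share:.1%} of the whole cache")
```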

I got the 0.65ms figure using a script that runs gunzip -c and bspatch for
each patched file in the cache. To measure and subtract the overhead of
creating the gunzip and bspatch processes, I also ran the loop with just
"gunzip; bspatch --help > null" instead of the normal "gunzip; bspatch". So
the result should cover just the actual patch application (ccache does the
gunzip anyway).
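The measurement idea (time the full loop, time a baseline loop that pays
the same process-creation and gunzip cost, subtract) could be sketched
roughly as below. This is not the attached bspatch-all.sh, which is not
reproduced here; the function names and the injectable clock are
assumptions made so the sketch stays self-contained.

```python
import time


def time_loop(action, items, clock=time.perf_counter):
    """Return the seconds spent calling action(item) once per item."""
    start = clock()
    for item in items:
        action(item)
    return clock() - start


def net_ms_per_item(full_action, baseline_action, items,
                    clock=time.perf_counter):
    """Per-item cost in ms of full_action beyond baseline_action.

    full_action stands for "gunzip; bspatch" on one cache file,
    baseline_action for "gunzip; bspatch --help" on the same file,
    so the difference approximates the patch application alone.
    """
    items = list(items)
    if not items:
        raise ValueError("need at least one item to time")
    full = time_loop(full_action, items, clock)
    baseline = time_loop(baseline_action, items, clock)
    return (full - baseline) / len(items) * 1000.0
```

In a real run the two actions would spawn the external commands (for
example through subprocess.run) for each patched file. Single runs are
noisy, so repeating the measurement a few times would probably give a more
trustworthy number.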

The test was done on a 3.4GHz i7-2600.

My conclusions are:
- performance impact should be negligible (when compressing files offline
-- e.g. when make aborts with "no space left on device")
- savings in size vary (7MB out of 22MB in this case)
- the patches will need to have the "old filename" embedded
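To illustrate the last point, one possible way to embed the "old filename"
is a small header in front of the bsdiff payload. The layout below (magic
marker, 16-bit name length, name, patch bytes) is purely hypothetical and
not an existing ccache format; it only shows that a patch object can carry
enough information to locate its base object.

```python
import struct

MAGIC = b"CCDIFF1\0"  # hypothetical 8-byte marker, not a real ccache format


def pack_patch(base_name: str, patch: bytes) -> bytes:
    """Prefix a patch with the name of the cache object it applies to."""
    name = base_name.encode("utf-8")
    if len(name) > 0xFFFF:
        raise ValueError("base name too long for a 16-bit length field")
    return MAGIC + struct.pack("<H", len(name)) + name + patch


def unpack_patch(blob: bytes):
    """Return (base_name, patch_bytes) from a blob made by pack_patch."""
    header_len = len(MAGIC) + 2
    if len(blob) < header_len or blob[:len(MAGIC)] != MAGIC:
        raise ValueError("not a patch object")
    (name_len,) = struct.unpack_from("<H", blob, len(MAGIC))
    if len(blob) < header_len + name_len:
        raise ValueError("truncated patch header")
    name = blob[header_len:header_len + name_len].decode("utf-8")
    return name, blob[header_len + name_len:]
```

On a cache hit for a patched object, the reader would unpack the header,
load the named base object, and then run the patch against it. The base
object must not itself be evicted or turned into a patch while other
patches still refer to it.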

If people post results from their cache folders and there is interest, I
can work on an implementation.

Attached are the updated sources for the test. I normally run them in a
"testccache" directory, since they create two more folders with data.

(I posted this mail privately just to Andrew by mistake 3 days ago,
reposting it to ccache-list now)

Cheers,
Bogdan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bspatch-all.sh
Type: application/x-sh
Size: 182 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/ccache/attachments/20121116/58987c7a/attachment.sh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: find-similar.cpp
Type: text/x-c++src
Size: 4944 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/ccache/attachments/20121116/58987c7a/attachment.cpp>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: get-symbols.sh
Type: application/x-sh
Size: 368 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/ccache/attachments/20121116/58987c7a/attachment-0001.sh>

