[ccache] Questions about two hot functions in ccache

Justin Lebar jlebar at mozilla.com
Sun Oct 17 19:44:44 MDT 2010

Hi, all.

I ran ccache through |perf| on my x64 Linux box today.  In my testcase
(|make clean && perf record -g make| within a subdirectory of the
Firefox tree), there are only four functions that see more than 2% of
the samples:

    25.39%           c++  ccache                             [.] hash_source_code_string
    10.15%           c++  ccache                             [.] mdfour64
     4.04%           c++  [kernel.kallsyms]                  [k]
     3.14%           c++  ccache                             [.] mdfour_update

So it appears that about 13% of my CPU time is spent computing MD4
hashes (mdfour64 plus mdfour_update), while another 25% is spent in
hash_source_code_string but outside the MD4 code.

To someone new to the code like me, it appears that there's some room
for optimization here.

* hash_source_code_string is doing twice as much work as anything else
in ccache, but only to catch edge cases (comments and special macros).
If it could be simplified, the speed gains might offset the cost of
additional false positives.  If all we really care about is finding
the strings "__DATE__" and "__TIME__", there are faster algorithms
than a character-by-character search.

(Note also that the current implementation copies the whole file into
hashbuf one character at a time.  Again, do the benefits of stripping
out comments really offset this?)

* Why does ccache still use MD4?  Surely there's a better / faster
hash out there.  I noticed that ccache includes murmurhash, but it
doesn't seem like it's used in too many places.  There's probably a
good reason for this, but it's not too apparent to me.

You all probably know better than I do whether ccache needs a secure
hash function or whether something like murmurhash is sufficient; a
secure hash function seems like overkill to me, fwiw.  But either way,
is MD4 the right function to use?  On the one hand it is no longer a
secure hash function, and on the other I'd imagine it is nowhere near
as fast as something like murmurhash.

I'm curious what you all think about this.

