[ccache] [Patch] Faster direct-mode hash

Joel Rosdahl joel at rosdahl.net
Mon Nov 8 14:38:10 MST 2010


On 2010-11-07 22:13, Justin Lebar wrote:
> This patch is a followup to the discussion in "Questions about two hot
> functions in ccache".

Splendid! Thanks. The speedup factor on my machine is about 1.5.

> I suspect we could use the fast_hash function for preprocessor mode
> without much work.  I also suspect that switching to a smarter
> algorithm for searching for "#include" would decrease the cost of
> cache misses.  But I haven't profiled either of these cases.

Yes, that would be interesting to investigate.

> I'm a bit concerned about the fact that I had to change the reported
> file lengths in the manifest test (in test.sh).  I'm not sure what's
> going on here; I may well have messed something up.  Hopefully not.
> :)

Those sizes are not file lengths but the sizes of the hashed content, so 
a change is expected since you changed the number of MD4-hashed bytes.

The improved search for __{DATE,TIME}__ is uncontroversial, so that can 
be applied right away. However, I would like to make the LFG-based 
digest opt-in, at least for now, since I think we need time to test it 
and to collect hash-savvy people's opinions.
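
To illustrate the kind of search meant here, below is a minimal sketch of
a skip-based (Horspool-style) scan for the two macro names. It is not
taken from the patch: the names contains_temporal_macro and skip_for are
made up for this example, and the actual patch may well do it
differently. The idea is to look only at the last byte of each 8-byte
window and jump ahead by as much as that byte allows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how far the 8-byte window may advance when its
 * last byte is c, without stepping over "__DATE__" or "__TIME__". The
 * value is the smaller Horspool shift of the two needles. */
static size_t skip_for(unsigned char c)
{
    switch (c) {
    case '_': return 1;
    case 'E': return 2;
    case 'T':
    case 'M': return 3;
    case 'A':
    case 'I': return 4;
    case 'D': return 5;
    default:  return 8; /* byte occurs in neither needle */
    }
}

/* Hypothetical helper: returns 1 if buf[0..len) contains "__DATE__" or
 * "__TIME__", otherwise 0. */
int contains_temporal_macro(const char *buf, size_t len)
{
    size_t i = 7; /* index of the last byte of the current window */

    assert(buf != NULL || len == 0);

    while (i < len) {
        /* Both needles end in "E__": test two cheap bytes first. */
        if (buf[i] == '_' && buf[i - 2] == 'E' &&
            (memcmp(buf + i - 7, "__DATE__", 8) == 0 ||
             memcmp(buf + i - 7, "__TIME__", 8) == 0)) {
            return 1;
        }
        i += skip_for((unsigned char)buf[i]);
    }
    return 0;
}
```

On typical source text most bytes occur in neither name, so the loop
inspects roughly one byte in eight instead of every byte.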

By the way, can you provide some reference to why LFG (and the 
properties you chose) would work well as a digest for ccache's purpose? 
What's the expected collision rate? Or in other words: how well can we 
sleep at night, knowing that we haven't messed up people's builds, if we 
introduce the LFG-based algorithm? :-)
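
For a sense of scale, here is a small sketch of the birthday (union)
bound for an idealized digest whose outputs are uniformly distributed.
The function name birthday_bound and its parameters are made up for this
example. The bound says nothing about whether the LFG-based digest
actually behaves like such an ideal hash; that is exactly the open
question.

```c
/* Hypothetical helper: upper bound on the probability that at least two
 * of n_items distinct inputs share a digest, assuming an ideal hash with
 * `bits` output bits. There are n*(n-1)/2 pairs and each pair collides
 * with probability 2^-bits. */
double birthday_bound(double n_items, unsigned bits)
{
    double p = n_items * (n_items - 1.0) / 2.0;
    unsigned i;

    for (i = 0; i < bits; i++) {
        p *= 0.5; /* multiply by 2^-bits, one factor at a time */
    }
    return p > 1.0 ? 1.0 : p; /* a probability never exceeds 1 */
}
```

Under that idealized assumption, birthday_bound(1e6, 128) is on the
order of 1e-27 for a cache with a million entries and a 128-bit digest
such as MD4's output. A digest with a weaker distribution would do worse
than this bound suggests.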

My plan for ccache 3.2 is to work on configurability by introducing a 
config file. (I will post some thoughts on this to the list later on.) 
It would also be nice to make it possible to choose the hash 
algorithm, say between MD4 and your LFG-based digest (and other 
alternatives people want to implement).

-- Joel

