[ccache] Optimizing MD4

Andrew Stubbs ams at codesourcery.com
Fri Dec 11 10:16:11 UTC 2015


On 10/12/15 17:16, Anders Björklund wrote:
> Andrew Stubbs wrote:
>> Most of the rest of the time is spent doing MD4. I have some ideas how
>> to optimize that (by sharing them across runs), but nothing ready to post.
>
> I would be interested in your thoughts on how to speed that part up.

My implementation does a bunch of other things besides, which is why
it's not fit to post[*]. It launches a background task which creates a
unix domain socket in the cache directory (the Windows version uses
plain old TCP).

Each invocation of ccache then connects to that socket and asks the 
daemon to do the MD4 scan on its behalf. The daemon checks the mtime on 
the file and serves the MD4 from its memory cache if nothing has 
changed. The stat call could probably be optimized away if the cache
entry is very fresh (<1s?).
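
As a rough illustration of the mtime check, here is a sketch in Python.
It is not the CodeBench code. The class name DigestCache and its hit and
miss counters are made up for this example, SHA-256 stands in for MD4
(Python's hashlib only offers MD4 when the underlying OpenSSL build
does), and comparing the file size as well as the mtime is an extra
assumption on top of what is described above.

```python
import hashlib
import os


class DigestCache:
    """In-memory digest cache, validated by mtime and size (hypothetical)."""

    def __init__(self):
        self._entries = {}  # path -> ((mtime_ns, size), hex digest)
        self.hits = 0
        self.misses = 0

    def digest(self, path):
        # One stat call per request decides whether the entry is still valid.
        st = os.stat(path)
        key = (st.st_mtime_ns, st.st_size)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == key:
            self.hits += 1
            return entry[1]
        # Miss or stale entry: rehash the file (SHA-256 as a stand-in for MD4).
        with open(path, "rb") as f:
            value = hashlib.sha256(f.read()).hexdigest()
        self._entries[path] = (key, value)
        self.misses += 1
        return value
```

Skipping the stat for very fresh entries would mean also storing the
time of the last check and comparing it against a monotonic clock.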

The daemon is single-threaded, but I still found a useful speed-up after
all the headers had been scanned. For many projects it's basically the
same set of maybe 20 headers that get scanned over and over again. I
think the hashing of non-header files is not repetitive, and probably
best handled in ccache itself.

The daemon exits after 10 minutes of inactivity.
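
The daemon loop described above might look roughly like the following
Python sketch. The wire protocol (one path per connection, terminated by
a newline, answered with a hex digest or the word ERROR) is purely an
assumption for illustration, as are the names serve, request_digest and
plain_digest. SHA-256 again stands in for MD4, and hash_file is a
parameter so that a caching hasher can be plugged in.

```python
import hashlib
import os
import socket


def plain_digest(path):
    """Uncached stand-in hasher (SHA-256 instead of MD4)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def serve(sock_path, hash_file=plain_digest, idle_timeout=600.0):
    """Single-threaded loop; returns after idle_timeout seconds with no client."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)  # remove a stale socket from an earlier run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(sock_path)
        server.listen(16)
        server.settimeout(idle_timeout)  # applies to each accept() call
        while True:
            try:
                conn, _ = server.accept()
            except socket.timeout:
                return  # nobody connected for idle_timeout seconds
            try:
                with conn, conn.makefile("rwb") as stream:
                    path = os.fsdecode(stream.readline().rstrip(b"\n"))
                    try:
                        reply = hash_file(path)
                    except OSError:
                        reply = "ERROR"
                    stream.write(reply.encode() + b"\n")
                    stream.flush()
            except OSError:
                continue  # a misbehaving client must not kill the daemon
    finally:
        server.close()
        if os.path.exists(sock_path):
            os.unlink(sock_path)


def request_digest(sock_path, path):
    """Client side: what each compiler-wrapper invocation would do."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(os.fsencode(path) + b"\n")
        with client.makefile("rb") as stream:
            return stream.readline().decode().rstrip("\n")
```

Because the timeout is set on the listening socket, the idle clock
restarts after every served request, which matches "inactivity" rather
than total lifetime.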

In theory, what you get is ccache spending less time in MD4, but more
time in I/O wait. It does seem to be faster overall, but that might
depend on your hardware.

However, even if the latency of each ccache invocation is the same, the 
fact that they're basically idle means you can usefully crank up the 
parallelism for all but the initial build.

You could, in principle, use this communication to limit how many 
cache-miss compilations are permitted to run in parallel, and therefore 
run "make -j" for maximum parallelism without fear of melting your memory.
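
To make the idea concrete, the bookkeeping on the daemon side could be
as small as the Python sketch below. This is entirely hypothetical:
nothing above says it was ever implemented, and the class name
CompileSlots, the client_id argument and the try/release interface are
invented for the example. A client that is refused a slot would simply
wait and ask again before starting the real compiler.

```python
class CompileSlots:
    """Tracks which clients may run a cache-miss compilation (hypothetical)."""

    def __init__(self, limit):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.holders = set()  # ids of clients currently compiling

    def try_acquire(self, client_id):
        """Return True if client_id may compile now, False if it must wait."""
        if client_id in self.holders:
            return True  # already holds a slot
        if len(self.holders) >= self.limit:
            return False
        self.holders.add(client_id)
        return True

    def release(self, client_id):
        """Give the slot back once the compilation has finished."""
        self.holders.discard(client_id)
```

No locking is needed as long as the daemon stays single-threaded. A real
version would also have to release the slot when a client dies without
saying so, for example by noticing that its connection has closed.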

Unfortunately, I've moved on to other projects and don't have much time 
to work on this stuff any more.

Andrew



[*] The implementation was included in some editions of the Sourcery
CodeBench toolchain, so you can find it in the source package if you
really want to. I think you can find it here:
https://sourcery.mentor.com/GNUToolchain/release3047 (free registration
required).


