[ccache] Caching failed compilations
Andrew Stubbs
andrew_stubbs at mentor.com
Wed Jul 8 13:44:30 UTC 2015
On 08/07/15 14:04, Joel Rosdahl wrote:
> On 7 July 2015 at 10:58, Andrew Stubbs <ams at codesourcery.com
> <mailto:ams at codesourcery.com>> wrote:
> > On 06/07/15 21:44, Joel Rosdahl wrote:
> > > But maybe writing some special content to the object file would be OK?
> >
> > OK, fair enough, but I'd say that once you've opened the file and checked
> > the magic data then you've already killed performance.
>
> On a cache miss, the object file doesn't exist, so it doesn't need to be
> opened. On a cache hit, we need to open and read the file regardless of
> whether it's a real object file or special data encoding an exit code.
> In what way would this kill performance?
On cache-hit, there's currently no reason to actually look inside the
file, right? It just does the copy blind (I forget exactly how). Reading
the initial data from every binary on every cache-hit (the case we want
to be most optimal) sounds like a Bad Thing.
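A minimal sketch (in Python, with hypothetical file names and a hypothetical magic string -- nothing here is ccache's actual format) of the two hit-path strategies under discussion: the current blind copy that never inspects the cached object, versus a variant that reads a magic header on every hit and so adds an extra open+read to the most common case.

```python
import shutil

# Hypothetical marker text; ccache defines no such format today.
MAGIC = b"ccache-failed-compilation"

def restore_blind(cached, dest):
    # Current behaviour (as described above): copy the cached object
    # to the destination without ever looking inside it.
    shutil.copyfile(cached, dest)

def restore_with_header_check(cached, dest):
    # Proposed variant: peek at the first bytes of *every* cached
    # file before deciding what it is -- an extra open and read on
    # the quick path, which is the cost being objected to.
    with open(cached, "rb") as f:
        head = f.read(len(MAGIC))
    if head == MAGIC:
        raise RuntimeError("cached failure marker, not a real object file")
    shutil.copyfile(cached, dest)
```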
>> A failure can be confirmed by a read, if and only if the length matches, but
>> a compile success will remain on the quick path.
>
> You lost me there. :-) I don't understand what you think would be a slow
> path. Please expand on this.
The most common case must always be the "quick path"; i.e. we should
try to handle it with the fewest file stats, opens, reads, etc. Any
other case is necessarily slower (because it requires more decision
making), and we should ensure that those extra decisions are not
pushed up into the quick path.
So, if the cached binary has some property that says "this isn't the
most common case", then we need to be able to identify that with as
little additional overhead as possible. The cost of system calls
massively dwarfs the cost of simple logic comparisons, so an optimal
solution would use an indicator that is already available.
For example, if the binary file does not exist then we don't need an
extra system call to figure out that we don't have a plain
old-fashioned cache-hit.
For another example, if the binary file has a very specific size then we
can see that from the stat call the code already does (at least, I think
it does). The file size itself could be a coincidence, so in that
case we'd have to read the file to check for the magic text. However,
since this is the unlikely case -- the slow path -- that's OK.
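The scheme above could be sketched as follows (in Python, with a hypothetical magic string and marker size -- this is an illustration of the idea, not ccache's actual code or format). A single stat call resolves the two common cases; only the coincidental-size case pays for an extra open and read.

```python
import os

MAGIC = b"ccache-exit:"   # hypothetical marker prefix
MARKER_SIZE = 16          # hypothetical fixed size of a failure marker file

def classify(cached):
    # One stat call covers the common cases: a missing file is a
    # plain miss, and any size other than MARKER_SIZE is a plain hit.
    try:
        size = os.stat(cached).st_size
    except FileNotFoundError:
        return "miss"
    if size != MARKER_SIZE:
        return "hit"      # quick path: no extra open/read needed
    # Unlikely case (slow path): the size could be a coincidence,
    # so read the file and check for the magic text.
    with open(cached, "rb") as f:
        data = f.read(MARKER_SIZE)
    if data.startswith(MAGIC):
        return "cached-failure"
    return "hit"          # a real object that happens to match the size
```

Only cached objects whose size happens to equal MARKER_SIZE ever take the extra open+read, so the quick path stays at one stat.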
> For the standard code paths, yes (barring bugs), but e.g. when doing
> cleanup it has no information about which files to expect so it has to
> try to delete all known file types for a given result.
The performance of cleanup is not important (within reason).
Of primary importance is how quickly we can speed through a build
consisting entirely of cache-hits.
Of secondary importance is keeping the overhead of a build consisting
entirely of cache-misses to a minimum.
Everything else is the unlikely case, and therefore need only be "not
terrible". :-)
Andrew