[ccache] direct mode design bug

Wed Nov 7 12:19:37 MST 2012

Many thanks for the answer!

On 5 November 2012 14:53, Andrew Stubbs <ams at codesourcery.com> wrote:

> My first reaction to this issue, rightly or wrongly, is that it's more of
> a documentation issue than a real bug. I mean, it can only occur if two
> people share a cache, or if the user installs new software and then reuses
> an old cache.
>

It can happen in other cases as well. Contrived example, but still:

rm -rf subdir file.c config.h
echo '#include "config.h"' >file.c
mkdir subdir
echo '#warning subdir/config.h used' >subdir/config.h
sleep 1
ccache gcc -Isubdir -c file.c
# User: "Oops, forgot to create ./config.h."
echo '#warning config.h used' >config.h
sleep 1
ccache gcc -Isubdir -c file.c
# User: "Wat? Why isn't ./config.h used?"

For a real life example, see
https://bugzilla.samba.org/show_bug.cgi?id=8424#c0.

> If the documentation simply says that you have to wipe your cache whenever
> you do that sort of thing then does that solve the problem?
>

It would be nice if ccache were only used and enabled by conscious users
who have read and understood the documentation, but in reality that doesn't
happen in many cases. For instance, Linux distributions like Fedora install
and enable ccache by default (masquerading the system compiler), at least
when installing the development environment or similar. That's not
surprising given that ccache works very well for most people and that it is
advertised as being very safe.

There are several other cases where ccache's behavior doesn't fully match
that of the real compiler - I'm just a bit worried that the direct mode
issue we're discussing perhaps is too much of a behavior mismatch.

Hm. Come to think of it, nothing stops Fedora et al. from disabling direct
mode by default even if ccache's own default is to enable it.

> A similar issue, albeit not so interesting, perhaps, is what happens when a
> user changes some part of the toolchain, but does not alter the "gcc"
> binary. Ccache won't notice a new back-end compiler, a new assembler, a new
> linker, a new default specs file or anything like that. Chances are that
> any differences in the output are harmless, but the cached objects are
> technically invalid.

Right. However, isn't the fact that ccache may be affected by toolchain
changes much less surprising than the fact that ccache may fail to pick up
header files correctly?

> [In fact, I have a use-case in which I have multiple users sharing a
> cache, and I wanted to be able to uniquely identify the same toolchain
> across all the installations. The mtime etc. varies from machine to
> machine, as might the exact tool mix, so I have some local patches to do a
> much deeper hash of the toolchain binaries, and include those in the object
> hashes. Even then, in the interests of performance, those toolchain IDs are
> cached according to the location and mtime, so changing the binutils will
> cause temporarily wrong toolchain hashes. Would you be interested in such a
> feature upstream?]

Perhaps, it depends on how intrusive it is and how toolchain-specific it is.

>> 3. ccache could try to imitate what the preprocessor does.
>>
>
> Yuck. If you can program a faster preprocessor I'm sure the GCC folks
> would love to see it.

Thankfully, my suggestion wasn't to create a preprocessor substitute. :-)

> You wouldn't get to dumb much down unless you're fine with running both
> your own preprocessor and then the real one for the preprocessor mode cache
> check.

Yes, that's of course what I had in mind.

> Even if you only wanted to look for #include statements you'd still need
> to evaluate all the #if directives.

Not sure about that. I may be overlooking something, but ccache would "only"
have to follow all #include statements and note all header files that don't
exist in the include path list. (When #include is used with a #defined
token for the filename, fall back to the real compiler.) When considering a
potential cache hit, reject it if any of the header files that didn't exist
then exist now.
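
A rough sketch of what I mean, in Python rather than ccache's C for
brevity. The names scan, still_valid and ComputedInclude are made up for
illustration and nothing like this exists in ccache today. It assumes that
include_dirs already holds both the -I directories and the compiler's
built-in ones, it ignores #include_next, and since it doesn't evaluate #if
it follows every #include it sees, which errs on the side of rejecting hits.

```python
import os
import re

# Matches #include "name" and #include <name>.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(["<])([^">]+)[">]')
# Matches #include MACRO (a #defined token instead of a literal filename).
COMPUTED_RE = re.compile(r'^\s*#\s*include\s+[A-Za-z_]')


class ComputedInclude(Exception):
    """Signals that the caller should fall back to the real compiler."""


def scan(source, include_dirs):
    """Follow #include lines starting at source.

    Returns (found, missing): the headers that were read, and every
    candidate path that was probed before a match but did not exist.
    """
    found, missing = set(), set()
    todo = [source]
    while todo:
        path = todo.pop()
        with open(path, errors="replace") as f:
            lines = f.readlines()
        for line in lines:
            if COMPUTED_RE.match(line):
                raise ComputedInclude(line.strip())
            m = INCLUDE_RE.match(line)
            if not m:
                continue
            quote, name = m.groups()
            # Quoted includes search the including file's directory first.
            own_dir = [os.path.dirname(path) or "."] if quote == '"' else []
            for d in own_dir + list(include_dirs):
                candidate = os.path.normpath(os.path.join(d, name))
                if os.path.isfile(candidate):
                    if candidate not in found:
                        found.add(candidate)
                        todo.append(candidate)
                    break
                # Probed earlier in the search order but absent: remember it.
                missing.add(candidate)
    return found, missing


def still_valid(missing):
    """A stored result is usable only if no formerly missing header exists now."""
    return not any(os.path.exists(p) for p in missing)
```

With the contrived example from earlier in this mail, the first compilation
would record ./config.h as missing, so the second lookup would be rejected
once that file has been created.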

>> Anybody got other ideas?
>>
>
> Running the compiler with -v prints the header search directories. You
> could use that to do your own scan.

To use the directories from "cpp -v" (plus directories from the command
line) to do some optimistic validation was my first thought as well, but
after thinking more about it I came to the conclusion that it wouldn't buy
much safety because no subdirectories will be checked, and you can't tell
which subdirectories to check unless you have parsed the #include
statements. Also, it would trigger many false negatives.

> BTW, gcc has an option "--trace-includes" that might be faster than
> scanning the preprocessor output, although the compiler still has to do all
> the same work. Like this: "gcc -E hello.c -o /dev/null".

How do you use --trace-includes? It doesn't seem to be documented and
nothing happens when I try it.

> Please leave it on. The difference is like night and day, and the bug is
> rare and avoidable.

OK, so far we have one vote for and zero against. Any others? :-)

-- Joel