[ccache] direct mode design bug

Joel Rosdahl joel at rosdahl.net
Sun Nov 4 12:10:44 MST 2012


The direct mode, which was introduced in version 3.0 almost three years
ago, has a design bug. The essence of the problem is that in the direct
mode, ccache records header files that were used by the compiler, but it
doesn't record header files that were not used but could have been used if
they existed. So, when ccache checks if a result could be taken from
the cache, it can't check if the existence of a new header file should
invalidate the result.
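
To make the scenario concrete, here is a minimal Python sketch of the problem. It is not ccache code: the manifest layout and the names resolve, make_manifest, manifest_matches and demo are hypothetical and only imitate the idea of recording the headers that were used. A header is first found in the second include directory; later a header with the same name appears in the first directory, and the recorded-files-only check still reports a match.

```python
import hashlib
import os
import tempfile


def resolve(name, include_dirs):
    """Return the first existing path for name, like a preprocessor search."""
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None


def digest(path):
    """Hash the content of a file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def make_manifest(headers, include_dirs):
    """Record only the headers that were actually used (hypothetical format)."""
    manifest = {}
    for name in headers:
        path = resolve(name, include_dirs)
        manifest[path] = digest(path)
    return manifest


def manifest_matches(manifest):
    """Simplified direct-mode style check: are the recorded files unchanged?"""
    return all(
        os.path.exists(path) and digest(path) == recorded
        for path, recorded in manifest.items()
    )


def demo():
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "first")
        second = os.path.join(root, "second")
        os.mkdir(first)
        os.mkdir(second)
        include_dirs = [first, second]

        # Initially config.h exists only in the second directory.
        with open(os.path.join(second, "config.h"), "w") as f:
            f.write("#define VALUE 1\n")
        manifest = make_manifest(["config.h"], include_dirs)

        # A new config.h appears earlier in the search path.
        with open(os.path.join(first, "config.h"), "w") as f:
            f.write("#define VALUE 2\n")

        stale_hit = manifest_matches(manifest)
        used_now = resolve("config.h", include_dirs)
        return stale_hit, os.path.basename(os.path.dirname(used_now))


if __name__ == "__main__":
    print(demo())
```

In this sketch demo() reports a cache match even though a real compiler would now pick up the header from the first directory.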

This scenario is probably quite rare, since only a few people have reported it
over the years (there are two public bug reports:
https://bugzilla.samba.org/show_bug.cgi?id=8424 and
https://bugzilla.samba.org/show_bug.cgi?id=8728), but the problem may of
course happen without the user reporting it or knowing about it. Anyway,
regardless of frequency, it makes ccache's behavior differ from that of the
unwrapped compiler.

Unfortunately, I don't know how to fix the issue in a good way.

One obvious way would be to try to figure out in which directories the
preprocessor has looked for header files, store that information and do the
same search when considering a cache result. But how to do that?

1. ccache could use strace or similar ways of monitoring the compiler and
tracing the performed system calls to find out where headers were probed. I
haven't measured, but I suspect that this would be slow.

2. ccache could override strategic functions using LD_PRELOAD, thus
snooping on system calls without involving the kernel. This should be
possible and quite fast, but it's tricky to get right, and it's not very
portable. (By the way: This is what
http://audited-objects.sourceforge.net does, although I don't know if
it monitors and acts on probes of nonexistent files.)

3. ccache could try to imitate what the preprocessor does. That is, read
the source code file and follow #include statements instead of looking at
the preprocessor output. This essentially means implementing a dumbed down
version of a preprocessor, a task that doesn't sound trivial: It has to be
significantly faster than the real preprocessor to make a difference, it
will be more coupled to the behavior of the compiler and its various
options (-I, -idirafter, -isystem, etc.), and it probably has to know the
compiler's default include directories.
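
Whichever of the options above supplies the probe information, the check at cache lookup time could be fairly simple. The following Python sketch assumes the probed-but-missing paths are already known; how to obtain them cheaply is exactly the open question. The names resolve_with_probes, probes_still_missing and demo are made up for illustration and do not correspond to anything in ccache.

```python
import os
import tempfile


def resolve_with_probes(name, include_dirs):
    """Search like a preprocessor; return (found_path, missed_paths)."""
    missed = []
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate, missed
        # This path was probed but did not exist: remember it.
        missed.append(candidate)
    return None, missed


def probes_still_missing(missed):
    """A cached result is only usable if no missed path has appeared."""
    return not any(os.path.exists(path) for path in missed)


def demo():
    with tempfile.TemporaryDirectory() as root:
        first = os.path.join(root, "first")
        second = os.path.join(root, "second")
        os.mkdir(first)
        os.mkdir(second)
        with open(os.path.join(second, "config.h"), "w") as f:
            f.write("#define VALUE 1\n")

        _, missed = resolve_with_probes("config.h", [first, second])
        before = probes_still_missing(missed)

        # A shadowing header appears in the earlier directory.
        with open(os.path.join(first, "config.h"), "w") as f:
            f.write("#define VALUE 2\n")
        after = probes_still_missing(missed)
        return before, after


if __name__ == "__main__":
    print(demo())
```

The extra existence checks would be added on top of the content checks that the direct mode already does for the headers that were found, so the lookup cost would grow with the number of include directories searched.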

Anybody got other ideas?

Regarding option 3: If I understand correctly, distcc's pump mode does
something similar, so perhaps there is code to borrow or be inspired by?

Since a quick fix likely isn't possible in the short term, and I would like
to release ccache 3.2 soon, we have to decide whether the direct mode
should default to off or on. Please share any opinions!

-- Joel
