[ccache] Stumbling blocks with ccache and embedded/encapsulated environments

Paul Smith paul at mad-scientist.net
Wed Nov 10 15:54:32 MST 2010

Hi all; I've been considering for a long time enabling ccache support in
our build environment.  However, our environment has a few unusual
aspects and I wondered if anyone had any thoughts about steps we might
take to ensure things are working properly.  The documentation I've
found is just not _quite_ precise enough about exactly what is added to
the hash.

Very briefly we have multiple "workspaces" per user, mostly stored on
their own systems.  These workspaces are typically pointing to different
lines of development, and in those some files are the same and some are
different (pretty basic).  What I'd like to do is have one ccache per
user per host, so that all the workspaces for a given user on a given
host share the cache (rather than, for example, one cache per workspace
or sharing caches between users and/or hosts--that could come later).
Again, pretty straightforward.
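
One way to get a per-user, per-host cache is to point ccache at a directory named after the host. This is a minimal sketch: CCACHE_DIR is ccache's own setting, but the directory name is an assumption, and it only matters when home directories are shared between hosts.

```shell
# Assumed layout: one cache directory per host under the user's home.
# uname -n prints the host's node name.
CCACHE_DIR="$HOME/.ccache-$(uname -n)"
export CCACHE_DIR
```

If each user's home directory is already local to one host, the default cache location would give the same separation without any setting.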

The first interesting bit is that in our build environment we have a set
of (multiple different) cross-compilers, along with completely
encapsulated environments (usr/include, usr/lib, etc.) for different
targets.  These compilers and environments are packed up into tarballs
which are kept in our source tree, and unpacked by our build system when
our build starts.  We do not use the native compiler at all.

The second interesting bit is that the actual file that is invoked is
not the actual compiler, but a symlink to a shell script wrapper that
invokes the real compiler with a set of extra command-line arguments.
So we invoke a command like "i686-rhel4-linux-glibc-g++", which is a
symlink to a generic shell script like "toolchain-wrapper.sh", which
unpacks the symlink name ("i686-rhel4-linux-glibc-g++") to determine
that we want to run the g++ compiler to generate 32-bit code compiled
against a Red Hat EL 4/GNU libc environment, then invokes the real
compiler with the right options to make that happen.  A different
command (say "x86_64-rhel5-linux-glibc-gcc") is a symlink to the same
"toolchain-wrapper.sh" file, but you get very different results.

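The name-decoding step described above might look roughly like the sketch below. It is not the actual toolchain-wrapper.sh: the field layout (ARCH-DISTRO-OS-LIBC-TOOL), the function names, and the mapping from architecture to -m32/-m64 are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of how a wrapper could decode the name it was
# invoked under. Assumed field layout: ARCH-DISTRO-OS-LIBC-TOOL.

# Print the fields of a name such as "i686-rhel4-linux-glibc-g++"
# as "arch distro os libc tool".
parse_toolchain_name() {
    name=${1##*/}            # drop any leading directory
    tool=${name##*-}         # text after the last '-', e.g. "g++"
    rest=${name%-*}          # everything before the tool
    old_ifs=$IFS
    IFS=-
    set -- $rest             # split the remaining fields on '-'
    IFS=$old_ifs
    echo "$1 $2 $3 $4 $tool"
}

# Map the architecture field to a word-size flag (assumed mapping).
arch_flag() {
    case $1 in
        i686)   echo "-m32" ;;
        x86_64) echo "-m64" ;;
        *)      echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

parse_toolchain_name i686-rhel4-linux-glibc-g++
```

A real wrapper would pass "$0" instead of a literal name and then exec the selected compiler. The point for ccache is that all of this happens inside the script, after ccache has already looked at the command line.
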
The final interesting thing is that when we unpack these compiler
tarballs we use tar's -m option, so that all the files we unpack have
their times set to "now" rather than the times they had when they were
packed up.  On reflection, I believe we could drop -m for the compiler
tarballs so that the timestamps are preserved, if that would be useful.

So, a few things.  First, the default mtime/size check for detecting
whether a compiler has changed won't work well for us.  Every time I do
a clean build and my compilers are unpacked again, their timestamps
change (due to tar -m), so I won't get any cache hits (right?).

If I remove the -m so that the timestamps in the tarball are preserved,
then the timestamps will always be identical, unless I load up a new
compiler build.  So that's actually nice.
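
The difference between the two unpacking modes can be checked with a throwaway archive. This is a small demo rather than part of any real build: the file names are made up, and the -nt test is a common shell extension (bash, dash) rather than strict POSIX.

```shell
#!/bin/sh
# Demo: effect of tar's -m option on extracted mtimes.
set -e
work=$(mktemp -d)
cd "$work"

mkdir pack keep now
echo 'fake compiler' > pack/cc
touch -t 200001010000 pack/cc       # give the packed file an old mtime
tar -cf toolchain.tar -C pack cc

tar -xf  toolchain.tar -C keep      # mtime restored from the archive
tar -xmf toolchain.tar -C now       # -m: mtime set to extraction time

# now/cc is newer than keep/cc although the contents are identical.
if [ now/cc -nt keep/cc ]; then
    mtime_result="differs"
else
    mtime_result="same"
fi
echo "mtime with -m vs without: $mtime_result"
cmp -s now/cc keep/cc && echo "contents: identical"
```

Without -m, every unpack of the same tarball yields the same mtime and size, which is what an mtime/size compiler check needs in order to stay stable across clean builds.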

What about the script wrapper?  Loading a new compiler will change the
timestamp (at least) on the script wrapper as well, but here my worry is
a false cache hit within the same build.  For example, suppose I build a
file into two objects like this:

	i686-rhel4-linux-glibc-g++ -o 32bit/foo.o -c foo.c
	x86_64-rhel4-linux-glibc-g++ -o 64bit/foo.o -c foo.c

Now both of these are symlinks to the same wrapper script, so ccache will
see the same mtime/size for both compilers.  They also have the same
flags at this level.  Underneath, of course, the wrapper script invokes
completely different compilers with different flags, but that's not
visible to ccache.  Suppose the preprocessor output is identical in both
cases, so that doesn't distinguish them: the only difference is that the
compiler generates a 32-bit .o for the first and a 64-bit .o for the
second.

So, my question is, is the NAME of the compiler part of the hash as well
as the mtime/size, so that ccache won't consider the second one to be a
copy of the first?
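
The worry can be made concrete with a toy key function. This is not ccache's actual hash, only an illustration of why mtime/size alone cannot tell two symlinks to one script apart. It assumes GNU stat (the -L and -c options), and the function names are invented.

```shell
#!/bin/sh
# Toy model of the collision worry; NOT ccache's real hashing.
set -e
dir=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "$dir/toolchain-wrapper.sh"
ln -s toolchain-wrapper.sh "$dir/i686-rhel4-linux-glibc-g++"
ln -s toolchain-wrapper.sh "$dir/x86_64-rhel4-linux-glibc-g++"

# Key built only from the mtime and size of the file the symlink resolves to.
key_mtime_size() {
    stat -L -c '%Y %s' "$1" | cksum
}

# Same key with the invoked name mixed in.
key_with_name() {
    { stat -L -c '%Y %s' "$1"; basename "$1"; } | cksum
}

a="$dir/i686-rhel4-linux-glibc-g++"
b="$dir/x86_64-rhel4-linux-glibc-g++"

[ "$(key_mtime_size "$a")" = "$(key_mtime_size "$b")" ] && echo "mtime/size only: keys collide"
[ "$(key_with_name "$a")" != "$(key_with_name "$b")" ] && echo "with name: keys differ"
```

Both symlinks resolve to the same file, so any key derived only from that file's metadata is necessarily identical for the two commands. Something else in the key, such as the invoked name, has to separate them.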

Of course I can always resort to CCACHE_COMPILERCHECK='%compiler% -v'
which will tell me what I want to know (that these compilers are very
different).  But it would be nice to avoid that overhead if possible.
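
For reference, the setting would be exported before the build starts, along these lines. The command string is the one quoted above. The alternatives in the comments are the other values the ccache manual lists for this setting, as far as I know, and the remark about wrappers is an inference from the setup described here rather than something verified.

```shell
# Hash the output of running the compiler with -v.
export CCACHE_COMPILERCHECK='%compiler% -v'

# Other documented values for the same setting:
#   mtime    hash the compiler's mtime and size (the default)
#   content  hash the contents of the compiler file
#   none     hash nothing about the compiler
# With a wrapper script, "content" would presumably hash the script
# itself rather than the real compiler it invokes.
```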

Also if I DO go with CCACHE_COMPILERCHECK, is ONLY that output part of
the hash?  Or is the mtime/size for the compiler also part of the hash?

It would be nice for debugging/etc. if there were a way to see exactly
which elements go into the hash for a given target.

Sorry for the long email; thanks for any pointers or tips!
