[ccache] ccache direct mode

Joel Rosdahl joel at rosdahl.net
Wed Jan 6 15:11:28 MST 2010


Hi,

Some time ago, I observed that running the preprocessor on the input
file and then hashing the output was quite a bit slower than just
hashing the input file and all included files. I then got an idea:
Would it be possible to make ccache hash the source code directly
without running the preprocessor to speed things up? After some
thinking, experimenting and coding, I have concluded that the answer is
yes.

The basic idea of how to achieve this is finding out which files were
included by the preprocessor and then storing the hash sums of those
files in a file associated with the input file and compiler arguments.
When the same input file is compiled again (with the same arguments),
the list of include files and their hash sums can be read and verified
in order to look up the correct object file. In my implementation, this
ccache mode is called the "direct mode" and the standard ccache way is
called the "preprocessor mode".
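
To illustrate the "finding out which files were included" step, here is
a minimal Python sketch. It assumes GCC-style linemarkers of the form
# <line> "<file>" [flags] in the preprocessor output. Whether the actual
implementation collects the files this way is not stated here, and the
function name included_files is hypothetical.

```python
import re

# Assumption: GCC-style linemarkers, e.g.  # 1 "foo.h" 1
LINEMARKER = re.compile(r'^# \d+ "([^"]+)"')

def included_files(preprocessed_text, input_file):
    """Return the files named in linemarkers, in first-seen order.

    Pseudo-files such as <built-in> and <command-line>, and the input
    file itself, are skipped.
    """
    seen = []
    for line in preprocessed_text.splitlines():
        match = LINEMARKER.match(line)
        if match is None:
            continue
        name = match.group(1)
        if name.startswith("<") or name == input_file:
            continue
        if name not in seen:
            seen.append(name)
    return seen
```

Each returned path would then be hashed and recorded together with the
compilation result.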

I've chosen the name "manifest" for the data structure containing the
list of include files, their hash sums and the associated object file
names. The manifest file is stored in the ccache directory under the
name X.manifest, where X is the hash sum of the input file and the
compiler arguments. The manifest doesn't include the object file data,
just the name (which happens to be the hash sum of the preprocessor
output associated with the object file).
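
As a rough picture of the manifest described above, here is a Python
sketch. The class and function names are hypothetical, SHA-256 stands in
for whatever hash function ccache really uses, and the real on-disk
format is not shown.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class ManifestEntry:
    # Maps include file path -> hash sum of that file's contents.
    include_hashes: dict
    # Name of the cached object file (the hash sum of the preprocessor
    # output); the object data itself is stored elsewhere.
    object_name: str

@dataclass
class Manifest:
    entries: list = field(default_factory=list)

def manifest_file_name(source: bytes, compiler_args: list) -> str:
    """Return X.manifest, where X hashes the input file and arguments."""
    h = hashlib.sha256()  # assumption: stand-in hash function
    h.update(source)
    for arg in compiler_args:
        h.update(b"\0")  # separator so ["-a", "b"] differs from ["-ab"]
        h.update(arg.encode("utf-8"))
    return h.hexdigest() + ".manifest"
```

One manifest can hold several entries because the same input file and
arguments can yield different object files when an include file changes.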

So, when ccache is asked to compile a file, the manifest is read, and
for each object file name in the manifest, the associated include
files' hash sums are verified. If there is a match, the compilation
result is known. If no object file matches, ccache falls back to the
preprocessor mode. After preprocessing and compiling, the manifest is
updated with the include files that were read and their hash sums.
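
The lookup and update steps can be sketched in Python as follows. This
is only an illustration under stated assumptions: the manifest is a
plain list of (include_hashes, object_name) pairs, SHA-256 stands in for
the real hash function, and direct_lookup, record_result and read_file
are hypothetical names.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def direct_lookup(manifest_entries, read_file):
    """Return the object name whose include hashes all match, else None.

    read_file(path) must return the current contents of path as bytes.
    A None result means: fall back to the preprocessor mode.
    """
    current = {}  # hash each include file at most once per lookup
    for include_hashes, object_name in manifest_entries:
        for path, expected in include_hashes.items():
            if path not in current:
                try:
                    current[path] = sha256_hex(read_file(path))
                except OSError:
                    current[path] = None  # unreadable file never matches
            if current[path] != expected:
                break
        else:
            return object_name
    return None

def record_result(manifest_entries, include_paths, object_name, read_file):
    """After preprocessing and compiling, add an entry to the manifest."""
    hashes = {p: sha256_hex(read_file(p)) for p in include_paths}
    manifest_entries.append((hashes, object_name))
```

A lookup reads and hashes only the include files listed in the manifest,
which is where the saving over running the preprocessor comes from.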

By not running the preprocessor, CPU usage is reduced; the runtime is
about 0.2-1.0 times that of ccache in preprocessor mode. The relative
speedup is higher when I/O is fast (e.g., when files are in the disk
cache). Here are some unscientific measurements of compiling Samba on
my Linux system (with a filled disk cache):

Without ccache............: 321 s
With original ccache......: 100 s
With ccache in direct mode:  28 s
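
For reference, the ratios implied by these three timings can be worked
out with a few lines of Python. The numbers are the ones listed above;
the dictionary keys and the helper name are only labels for this sketch.

```python
# Wall-clock seconds from the Samba measurements above.
timings = {"no_ccache": 321, "preprocessor_mode": 100, "direct_mode": 28}

def speedup(baseline, candidate):
    """How many times faster candidate is than baseline."""
    return timings[baseline] / timings[candidate]

direct_vs_preprocessor = speedup("preprocessor_mode", "direct_mode")
direct_vs_none = speedup("no_ccache", "direct_mode")
relative_runtime = timings["direct_mode"] / timings["preprocessor_mode"]

print(f"direct vs preprocessor mode: {direct_vs_preprocessor:.2f}x")
print(f"direct vs no ccache:         {direct_vs_none:.2f}x")
print(f"relative runtime:            {relative_runtime:.2f}")
```

The relative runtime of 0.28 falls inside the 0.2-1.0 range mentioned
earlier.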

I've never seen the direct mode make ccache slower, although it should
be possible in pathological cases.

The implementation is based on the latest CVS revision of ccache plus
most of the patches accumulated in the Debian ccache package. While
experimenting and implementing, I have done some other cleanup and
improvements as well. See
http://github.com/jrosdahl/ccache/raw/master/NEWS for a high-level list
of changes (including those committed to CVS but not yet released).

If you're interested, try it out! And please report any bugs, design
flaws or other problems you find. I'm not aware of any bugs, but I'd be
surprised if there aren't any left. In particular, I have probably made
ccache less portable since I've only built and tried it on relatively
modern GNU/Linux systems and GCC versions.

Source code snapshot:

  http://cloud.github.com/downloads/jrosdahl/ccache/ccache-2.4_direct.1.tar.gz

Git repository:

  http://github.com/jrosdahl/ccache

Comments and improvements are welcome.

Regards,
Joel

