[ccache] Duplicate object files in the ccache - possible optimization?
frank.klotz at alcatel-lucent.com
Fri Nov 4 18:12:21 MDT 2011
I used ccache at my previous employer, and was very convinced of its
value. Now that I have started a new job, I am in the process of trying
to bring the new shop on board with ccache, so I have been doing lots of
test runs and looking at things. Here is one thing I am thinking could
add some value.
Looking through the ccache, I find many pairs of files which have
different names (different hashes), but exactly identical content. This
actually makes sense, as each file would have an index hash and a
preprocessed hash, and since ccache needs to be able to find a match on
either, then both need to be in the cache. (Actually, thinking about
it, I'm a little surprised that there are any files in the ccache that
DON'T appear twice - shouldn't EVERY compilation have 2 hashes?)
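A minimal sketch of how such duplicates could be found, assuming the
cache is just a directory tree of regular files. The function name
find_duplicate_files is made up for illustration and is not part of
ccache:

```python
import hashlib
import os
import stat
from collections import defaultdict


def find_duplicate_files(cache_dir):
    """Group regular files under cache_dir by content.

    Returns a list of path lists. Each inner list names two or more
    files with identical bytes that live in *separate* inodes, i.e.
    files that could share storage but currently do not.
    """
    # digest -> {(device, inode): one path that refers to that inode}
    by_digest = defaultdict(dict)
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            # Reads the whole file at once; fine for a sketch, but a
            # real tool would hash in chunks.
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            # Names that are already hard links to the same inode
            # collapse into a single entry here.
            by_digest[digest].setdefault((st.st_dev, st.st_ino), path)
    return [
        sorted(inodes.values())
        for inodes in by_digest.values()
        if len(inodes) > 1
    ]
```

Calling find_duplicate_files on the cache directory lists each group
of identical files; names that already share an inode are not
reported.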
But it seems to me that it would make a lot of sense to store the data
of these 2 files only once, by hard-linking the 2 names to the same
inode. (For filesystems that support hard links, of course!) Every
time ccache does an actual compilation and stores a file in the cache,
it should store it under hard links for BOTH hashes - the indexed hash
and the proprocessed hash. And if it gets a hash miss on the indexed
hash but a hit on the preprocessed hash, then it should add the missed
index hash as a hard link to the file found. So a given file (inode) in
the cache could actually be referenced by MANY directory entries: one
preprocessed hash, and multiple index hashes for various different
combinations of source files and header files which end up producing the
same output when passed through the preprocessor.
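The scheme described above could be sketched roughly as follows. This
is an illustration in Python, not ccache code: the flat layout (one
file per hash directly under the cache directory) and the names
store_result, add_alias and lookup are all assumptions made for the
example.

```python
import os
import shutil


def add_alias(cache_dir, alias_hash, existing_path):
    """Make alias_hash another name for an object already in the cache."""
    alias = os.path.join(cache_dir, alias_hash)
    try:
        os.link(existing_path, alias)  # second directory entry, same inode
    except FileExistsError:
        pass  # the alias is already present
    except OSError:
        # Hard links not available here: fall back to a plain copy.
        shutil.copyfile(existing_path, alias)
    return alias


def store_result(cache_dir, object_path, index_hash, preprocessed_hash):
    """Store a freshly compiled object under BOTH hashes."""
    primary = os.path.join(cache_dir, preprocessed_hash)
    if not os.path.exists(primary):
        shutil.copyfile(object_path, primary)
    add_alias(cache_dir, index_hash, primary)
    return primary


def lookup(cache_dir, index_hash, preprocessed_hash):
    """Return the cached object path, or None on a complete miss."""
    by_index = os.path.join(cache_dir, index_hash)
    if os.path.exists(by_index):
        return by_index
    by_preprocessed = os.path.join(cache_dir, preprocessed_hash)
    if os.path.exists(by_preprocessed):
        # Index miss, preprocessed hit: remember the new index hash
        # as one more link to the object that was found.
        add_alias(cache_dir, index_hash, by_preprocessed)
        return by_preprocessed
    return None
```

With this layout the link count of an object (st_nlink) grows by one
for each new index hash that resolves to it, while its data is stored
only once.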
This could increase the storage efficiency of the ccache: each pair of
identical files would take the space of a single file.
Of course, not every filesystem supports hard links, so the simplest
solution is just to keep multiple file copies. So I guess
adding code to do this would require some way to determine whether the
filesystem the cache is on can in fact support hard links.
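One way to make that determination, sketched here in Python under the
assumption that a runtime probe is acceptable, is simply to try
creating a hard link inside the cache directory and see whether it
works. The name supports_hard_links is hypothetical:

```python
import os
import tempfile


def supports_hard_links(directory):
    """Probe whether hard links can be created inside directory."""
    # The scratch directory and everything in it is removed on exit,
    # including when we return from inside the with-block.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        src = os.path.join(scratch, "probe")
        dst = os.path.join(scratch, "probe.link")
        with open(src, "wb"):
            pass  # create an empty file to link to
        try:
            os.link(src, dst)
        except (OSError, NotImplementedError):
            return False
        # Check that both names really refer to the same file.
        return os.path.samefile(src, dst)
```

The probe only needs to run once per cache directory; code that
stores objects would still have to handle a failed link by falling
back to a copy.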
If you think this sounds like a good idea, but don't have bandwidth to
do it, I would be willing to give it a try. Any hints on where to start
would of course be welcome.