[ccache] Duplicate object files in the ccache - possible optimization?

Frank Klotz frank.klotz at alcatel-lucent.com
Fri Nov 4 18:12:21 MDT 2011


  I used ccache at my previous employer, and was very convinced of its 
value.  Now that I have started a new job, I am in the process of trying 
to bring the new shop on board with ccache, so I have been doing lots of 
test runs and looking at things.  Here is one thing I am thinking could 
add some value.

Looking through the ccache, I find many pairs of files which have 
different names (different hashes), but exactly identical content.  This 
actually makes sense, as each file would have an index hash and a 
preprocessed hash, and since ccache needs to be able to find a match on 
either, then both need to be in the cache.  (Actually, thinking about 
it, I'm a little surprised that there are any files in the ccache that 
DON'T appear twice - shouldn't EVERY compilation have 2 hashes?)
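
For anyone who wants to repeat the check on their own cache, here is a rough sketch of how such duplicates could be found. It is not part of ccache; the function name find_duplicate_files is made up for illustration, and it simply groups every regular file under a directory by the SHA-256 of its content. Names that are already hard links to the same inode would also show up as a group.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_files(cache_dir):
    """Group regular files under cache_dir by the SHA-256 of their content.

    Hypothetical helper, not part of ccache. Returns a list of groups,
    each group being a sorted list of paths with identical content.
    """
    by_digest = defaultdict(list)
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as f:
                # Read in 1 MiB chunks so large objects are not loaded whole.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    # Only content that appears under more than one name is interesting.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling find_duplicate_files on the cache directory returns one list of paths per set of identical files.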

But it seems to me that it would make a lot of sense to store the data 
of these 2 files only once, by hard-linking the 2 names to the same 
inode.  (For filesystems that support hard links, of course!)  Every 
time ccache does an actual compilation and stores a file in the cache, 
it should store it under hard links for BOTH hashes - the index hash 
and the preprocessed hash.  And if it gets a miss on the index hash but 
a hit on the preprocessed hash, then it should add the missed index 
hash as a hard link to the file found.  So a given file (inode) in the 
cache could actually be referenced by MANY directory entries: one 
preprocessed hash, and multiple index hashes for the various 
combinations of source files and header files which end up producing the 
same output when passed through the preprocessor.
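
To make the scheme concrete, here is a minimal sketch of the two operations described above. It is not ccache's code: the function names store_result and add_index_link are hypothetical, and it assumes a flat cache directory in which each entry is named by its hash, which is simpler than the real cache layout.

```python
import os


def store_result(cache_dir, object_bytes, index_hash, preprocessed_hash):
    """After a real compilation: write the object once, link it twice.

    Hypothetical helper. The data is written under the preprocessed
    hash, and the index hash becomes a second name for the same inode.
    """
    primary = os.path.join(cache_dir, preprocessed_hash)
    with open(primary, "wb") as f:
        f.write(object_bytes)
    add_index_link(cache_dir, index_hash, preprocessed_hash)
    return primary


def add_index_link(cache_dir, index_hash, preprocessed_hash):
    """Index-hash miss but preprocessed-hash hit: add one more name.

    Hypothetical helper. Creates a hard link from the index hash to the
    existing entry, unless that name is already present.
    """
    primary = os.path.join(cache_dir, preprocessed_hash)
    alias = os.path.join(cache_dir, index_hash)
    if not os.path.exists(alias):
        os.link(primary, alias)
    return alias
```

After one store_result call and one later add_index_link call with a different index hash, the same inode has three directory entries and the object data is on disk only once.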

This would store each duplicated object once instead of twice, which 
could noticeably reduce the disk space the cache uses.

Since not every filesystem supports hard links, I assume the simplest 
solution was just to keep multiple copies of the file.  So I guess 
adding code to do this would require some way to determine whether the 
filesystem the cache is on can in fact support hard links.
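
One way to handle that, sketched here under my own assumptions rather than taken from ccache, is not to probe the filesystem in advance at all: just attempt the hard link and fall back to a plain copy if the operating system refuses. The function name link_or_copy is hypothetical.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Create dst as a hard link to src, or as a copy if linking fails.

    Hypothetical helper. Returns "link" or "copy" so the caller can
    tell which path was taken.
    """
    try:
        os.link(src, dst)
        return "link"
    except FileExistsError:
        # dst is already there; that is a caller error, not a
        # missing-feature error, so do not silently overwrite it.
        raise
    except OSError:
        # Typical causes: the filesystem does not support hard links,
        # src and dst are on different devices, or the link limit for
        # the inode has been reached.
        shutil.copyfile(src, dst)
        return "copy"
```

With this approach the cache behaves exactly as it does today on filesystems without hard links, and saves space wherever they are available.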

If you think this sounds like a good idea, but don't have bandwidth to 
do it, I would be willing to give it a try.  Any hints on where to start 
would of course be welcome.

Thanks,
Frank Klotz

