[ccache] basename of source file in hashed name in ccache

Tue Nov 8 15:18:33 MST 2011

On 11/08/2011 01:49 PM, Joel Rosdahl wrote:
> On 5 November 2011 01:26, Frank Klotz <frank.klotz at alcatel-lucent.com> wrote:
>> [...] I remember quite clearly (and I just confirmed with a colleague who is
>> still there) that the file names in the cache contained BOTH the hash AND the
>> basename of the object file.
> As far as I know, the object files have always been stored using only the hash.
> However, temporary files (stored in $CCACHE_DIR in ccache<=2.4 and
> $CCACHE_DIR/tmp in ccache>=3.0) include (a prefix of) the basename.
>
>> [...] (and another string that the ccache code refers to as "size", although
>> I can't quite figure out what it's the size OF)
> It's the size of the hashed text, i.e. output from the preprocessor. This is
> just a way of making filename collisions somewhat less likely.
>
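Here is a toy illustration of a cache file name built from a digest plus a byte count. It is a minimal sketch under stated assumptions, not ccache's code. `md5sum` stands in for ccache's actual hash, which also covers more than the preprocessor output, and the function name `cache_entry_name` is hypothetical.

```shell
# Hypothetical illustration only; this is NOT ccache's naming code.
# md5sum stands in for ccache's real hash. The resulting name is
# "<digest>-<size in bytes>", so two inputs would need both the same
# digest AND the same length to collide.
cache_entry_name() {
    # $1: file holding the text to hash (e.g. preprocessor output)
    digest=$(md5sum < "$1" | cut -d' ' -f1)
    # wc -c may pad its output with spaces on some systems; strip them
    size=$(wc -c < "$1" | tr -d ' ')
    printf '%s-%s\n' "$digest" "$size"
}
```

For example, `cc -E foo.c > foo.i && cache_entry_name foo.i` would print a name shaped like `<32 hex digits>-<bytes>`.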
>> One place we found the basenames invaluable was tracking down a corrupted
>> object file in the cache. Once we confirmed that we had a corrupt object file
>> foo.o, we simply searched for all "*foo.o" files in the cache, compared those
>> in size and content to an actual corrupted object file in the user directory,
>> and easily removed the corrupted file from the cache. Much harder (not
>> impossible, but harder) to do this without the basenames.
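The manual search-and-compare procedure described above could be scripted roughly as follows. This is a sketch. `find_identical_in_cache` is a hypothetical name, and the name pattern is assumed to match the cache files of interest. `cmp -s` performs the size-and-content comparison in one step, because files of different sizes never compare equal.

```shell
# Hypothetical helper: list files under a cache directory whose names
# match a pattern and whose contents are byte-for-byte identical to a
# known-corrupt object file.
find_identical_in_cache() {
    # $1: cache directory, $2: name pattern (e.g. '*foo*.o'),
    # $3: the corrupt object file from the build tree
    # -print only runs for files where cmp -s exits 0 (identical).
    find "$1" -type f -name "$2" -exec cmp -s "$3" {} \; -print
}
```

With hash-only file names, the same function still works when given a broad pattern such as `'*'`. The cost is that it compares every file in the cache.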
> An easy way to do that is:
>
> 1. Remove foo.o from the build tree.
> 2. Build with CCACHE_LOGFILE set to a log file.
> 3. Look for "Created foo.o from X" (where X is a file in the cache) in the log
>     file.
> 4. Remove X.
>
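Steps 3 and 4 can be sketched as a small shell function. It assumes, as quoted above, that the log contains lines with "Created foo.o from X". It also assumes the paths involved contain no spaces. The function name `cached_source_of` is hypothetical.

```shell
# Hypothetical helper: print the cache file(s) that the ccache log says
# were copied to the given object file. Assumes log lines containing
# "Created <object> from <cache file>" and paths without spaces.
cached_source_of() {
    # $1: ccache log file, $2: object file name as written in the log
    # grep -F matches the text literally (the "." in foo.o is not a regex);
    # sed drops everything up to and including " from ".
    grep -F "Created $2 from " "$1" | sed 's/.* from //'
}
```

Review the printed path(s) before removing anything. For example, run `cached_source_of ccache.log foo.o`, then `rm` the single path it reports.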
> Or even easier:
>
> 1. Remove foo.o from the build tree.
> 2. Build with CCACHE_RECACHE set.
>
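Here is a minimal sketch of the two-step CCACHE_RECACHE recipe, with the build command passed in as arguments. `rebuild_recache` is a hypothetical name. According to the ccache manual, setting CCACHE_RECACHE makes ccache ignore existing cache entries while still caching the new result.

```shell
# Hypothetical wrapper: remove an object file, then run the given build
# command with CCACHE_RECACHE set in its environment so ccache recompiles
# instead of reusing the (possibly corrupt) cached result.
rebuild_recache() {
    # $1: object file to remove; remaining arguments: the build command
    obj=$1
    shift
    rm -f -- "$obj"
    CCACHE_RECACHE=1 "$@"
}
```

Typical use would be something like `rebuild_recache foo.o make foo.o`.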
>> [...] Anyway, is there a general consensus on whether this would be valuable?
> It doesn't sound like a good idea to me, at least, since you would need to
> store duplicate copies of the object file for two compilations where the source
> content is the same but the file names differ.

1:  Would that EVER happen?  (I am having trouble visualizing a 
situation where this would be a good thing.)
2:  If it DID happen, rather than 2 copies, could store one inode with 2 
directory entries (hard links) with the 2 names.
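The hard-link idea in point 2 could look something like the sketch below. ccache does not do this. The sketch assumes a hypothetical naming scheme in which the basename-bearing alias is formed by inserting `-<basename>` before the `.o` suffix, and `add_basename_alias` is an invented name.

```shell
# Hypothetical illustration (not something ccache does): give an existing
# cached object a second, basename-bearing name via a hard link, so both
# names share one inode and no second copy of the data is stored.
add_basename_alias() {
    # $1: existing cache file (hash-only name), $2: source basename, e.g. foo
    cache_file=$1
    base=$2
    link_name="${cache_file%.o}-$base.o"
    # -f replaces a stale link of the same name if one exists
    ln -f -- "$cache_file" "$link_name"
    printf '%s\n' "$link_name"
}
```

Because every name refers to the same inode, each extra name costs a directory entry rather than another copy of the object file. A name-based search for `'*foo*.o'` would still find it.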

So while I understand your objection, I don't think it is a total 
deal-killer.
Of course, I also understand that the case FOR the change I proposed 
isn't necessarily all that compelling either!  It was more that I was 
accustomed to seeing it that way, and thought it useful.  But I do see 
from your other responses that there are other easy ways to accomplish 
the specific rationale I gave for it.
It was also somewhat interesting to be able to find all the cached 
copies of foo.o:
   find "$CCACHE_DIR" -name '*foo*.o' -ls
And once you have basenames in the ccache, you can find other things to 
do with them too....

Not a major issue for me, but wanted to get the suggestion out there.

Thanks,
Frank

> -- Joel