[ccache] Using git file hashes for ccache

Justin Lebar justin.lebar at gmail.com
Wed Dec 29 22:43:43 MST 2010


> It is my understanding that in the ccache hit case, a significant
> fraction of the running time is spent computing hashes of the original
> source files.

Yes, ccache spends most of its time hashing when it gets a direct mode
cache hit, at least according to my measurements.  A little while ago
I wrote a patch that switches to a less-secure hash function and
speeds up ccache somewhat; you may want to try applying it to see
whether it speeds up your builds.  (Interestingly, the ccache speed
improvement didn't translate into faster Firefox builds for me -- I
haven't had a chance to investigate why.)

> git is also frequently used for development, makes use of file hashes,
> and is extremely fast. When doing operations such as git diff, in the
> common case where the source file has not been modified, git will
> notice that the file's attributes (including mtime) match those
> stored in the git index file, and thus it won't have to actually read
> the file to conclude that the contents have not changed.

Maybe the right thing to do would be to have ccache keep track of the
source files' attributes itself.  If some environment variable were
set, ccache would treat a file with unchanged attributes as unchanged.
(ccache could maintain a new index into its cache, keyed on absolute
path, or it could hash a string "magic-bitstring | file-path | file
attributes" and use the current cache infrastructure.)  This seems a
lot simpler than trying to interface with git.
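
To make the second variant concrete, here is a rough sketch in Python
(not ccache code).  It assumes "file attributes" means size and mtime,
uses SHA-1 as a stand-in for whatever hash ccache uses, and keeps the
index in a plain dict.  MAGIC, attribute_key and cached_content_hash
are names made up for the illustration.

```python
import hashlib
import os

# Hypothetical tag so attribute keys cannot collide with other cache keys.
MAGIC = b"ccache-attr-v1"


def attribute_key(path):
    """Hash "magic | absolute path | attributes" without reading the file."""
    st = os.stat(path)
    fields = [
        MAGIC,
        os.fsencode(os.path.abspath(path)),
        str(st.st_size).encode(),      # assumed attribute: size
        str(st.st_mtime_ns).encode(),  # assumed attribute: mtime
    ]
    return hashlib.sha1(b"\0".join(fields)).hexdigest()


def content_hash(path):
    """The expensive step: read the whole file and hash its contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def cached_content_hash(path, index):
    """Return the content hash, reading the file only on an index miss."""
    key = attribute_key(path)
    if key not in index:
        index[key] = content_hash(path)
    return index[key]
```

A file whose size or mtime changes gets a new key and is rehashed; a
file modified with both left unchanged would return a stale hash, which
is the risk this optimization accepts.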

I imagine this would be a safe optimization for most users to turn on
-- I don't think too many users modify files without changing their
mtimes, as this would mess up most build systems.  But it might be
especially useful if users could give a list of paths and say that
this optimization applies to everything underneath each of the given
paths.  That way you could turn on the optimization for your system's
header files, which AIUI get hashed over and over again in direct
mode, but almost never change.
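
The path-list check could look something like the following Python
sketch.  It assumes the list arrives as one string separated by the
platform's path separator (for example from a hypothetical environment
variable); parse_trusted_dirs and in_trusted_tree are invented names,
not ccache options.

```python
import os


def parse_trusted_dirs(value):
    """Split a path-separator-delimited string into a list of directories."""
    return [d for d in value.split(os.pathsep) if d]


def in_trusted_tree(path, trusted_dirs):
    """True if path is one of trusted_dirs or lies somewhere beneath one."""
    path = os.path.realpath(path)
    for d in trusted_dirs:
        d = os.path.realpath(d)
        try:
            if os.path.commonpath([path, d]) == d:
                return True
        except ValueError:
            # Paths on different drives (Windows) share no common path.
            continue
    return False
```

Comparing whole path components rather than string prefixes keeps
/usr/include from also matching /usr/include-old.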

-Justin

On Wed, Dec 29, 2010 at 11:54 PM, Michel Lespinasse <walken at google.com> wrote:
> Hi,
>
> It is my understanding that in the ccache hit case, a significant
> fraction of the running time is spent computing hashes of the original
> source files.
>
> git is also frequently used for development, makes use of file hashes,
> and is extremely fast. When doing operations such as git diff, in the
> common case where the source file has not been modified, git will
> notice that the file's attributes (including mtime) match those
> stored in the git index file, and thus it won't have to actually read
> the file to conclude that the contents have not changed.
>
> I often use ccache to compile files out of git trees, and I was
> thinking that it could make use of the git index as well. The idea
> would be to use sha1 hashes instead of md4, and get these hashes out
> of the index (rather than computing them from the source file) when
> the file attributes match.
>
> I am wondering, has this been considered before? What would project
> maintainers think of going in that direction?
>
> Thanks,
>
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.

