[ccache] OT: Sunday morning idea

Sun Nov 4 01:29:20 MST 2012

Hi everyone, and I wish you a nice day! This is not a specific ccache
problem, nor is it a request for a new option. It's just a thought I had
after waking up... You can just trash this email if you don't want to be
bothered :)

I have been rethinking my problems with awfully slow compilers that
ccache doesn't handle - on that 'other OS' - and how to solve them. The
Python script written by froglogic GmbH (raabe at froglogic dot com) and
inspired by ccache is a nice try, but it isn't capable enough for my
needs.

Then I wondered what would be possible if I did not just wrap stdin,
stdout and stderr, but instead intercepted these initial handles and, in
addition, all subsequent calls to the operating system's file i/o while
the (compiler) child process is running. It then wouldn't have to be
gcc, but could be anything that is called during a make process step,
i.e. a process created by make or some IDE's build stage: preprocessor,
compiler, assembler, linker, resource compiler...

Would it be possible to have a supervising process track all the input
data and program code loaded through file i/o, as well as the resulting
output data, and cache that entire process run under a hash built from
the input?
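
A minimal sketch of such an input hash, in Python. It assumes the set of
input files is already known, whereas the real idea would discover them
by intercepting file i/o. The names input_hash and _feed are
hypothetical, made up for illustration only.

```python
import hashlib


def _feed(h, data):
    # Length-prefix each item so adjacent items cannot run together.
    h.update(len(data).to_bytes(8, "big"))
    h.update(data)


def input_hash(argv, stdin_data, input_paths):
    """Return a hex digest over the command line, stdin and input files."""
    h = hashlib.sha256()
    # Record the argument count so argv and stdin cannot be confused.
    _feed(h, str(len(argv)).encode("ascii"))
    for arg in argv:
        _feed(h, arg.encode("utf-8"))
    _feed(h, stdin_data)
    # Sorted so the key does not depend on the order the paths were listed in.
    for path in sorted(input_paths):
        _feed(h, path.encode("utf-8"))
        with open(path, "rb") as f:
            _feed(h, f.read())
    return h.hexdigest()
```

The same inputs always give the same key, and changing a single byte in
any input file gives a different one.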

On a second run of that same scenario, i.e. after some new process has
been created and has read some amount of input - stdin and one or more
files until EOF - the hash built so far from that data would be looked
up in a cache.

If it is found, the supervisor process would write the cached output to
the intercepted output handles and kill the child process itself. The
supervisor would then return the result code recorded during the earlier
real run of that same scenario.
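
A rough sketch of that lookup-and-replay step, in Python. It is heavily
simplified: it hashes the inputs before starting the child (so on a hit
the child is never started instead of being killed), it keeps the cache
in memory, and it only captures stdout, stderr and the exit code, not
output files. The name run_cached and the _cache dictionary are
hypothetical.

```python
import hashlib
import subprocess

# Hypothetical in-memory cache: key -> (returncode, stdout, stderr).
_cache = {}


def run_cached(argv, stdin_data=b"", input_paths=()):
    """Run argv, or replay a cached run; returns (rc, out, err, hit)."""
    h = hashlib.sha256(repr(list(argv)).encode("utf-8"))
    h.update(len(stdin_data).to_bytes(8, "big") + stdin_data)
    for path in sorted(input_paths):
        with open(path, "rb") as f:
            data = f.read()
        h.update(repr(path).encode("utf-8"))
        h.update(len(data).to_bytes(8, "big") + data)
    key = h.hexdigest()

    if key in _cache:
        # Cache hit: replay the recorded result without running the child.
        return _cache[key] + (True,)

    # Cache miss: do the real run and record what it produced.
    proc = subprocess.run(argv, input=stdin_data,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _cache[key] = (proc.returncode, proc.stdout, proc.stderr)
    return _cache[key] + (False,)
```

Calling run_cached twice with the same command and inputs runs the child
only the first time; the second call returns the stored output and exit
code with the last element set to True.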

This would of course have a very system-dependent part for hooking into
syscalls, which I believe can be managed to the required degree. I do
not yet see a flaw in the basic idea.

Would the order of output files written matter? Probably not, unless
they are later opened as inputs by that same process or its child
processes. In that case the intermediate output data would automatically
become part of the input hash as well...

To sum it up, I thought about an abstract input-output cache that could
ideally speed up lengthy calculations of any kind, as long as the
assumption holds that these calculations depend solely on input data
(and code) read through the operating system's file i/o calls.

Am I mad?

Jürgen