[ccache] ccache interrupt handling bug

Nadav Har'El nyh at cloudius-systems.com
Thu Jul 23 12:29:05 UTC 2015


Hi, I found a bug in ccache that makes it impossible to correctly
interrupt a compilation with control-C (I tried this on Linux).

Consider the following C++ program from hell that takes 13 seconds to
compile on my machine (change "27" to a higher number to make it even
slower):

template <int TreePos, int N> struct FibSlow_t {
    enum { value = FibSlow_t<TreePos, N - 1>::value +
            FibSlow_t<TreePos + (1 << N), N - 2>::value, };
};
template <int TreePos> struct FibSlow_t<TreePos, 2> { enum { value = 1 }; };
template <int TreePos> struct FibSlow_t<TreePos, 1> { enum { value = 1 }; };
static int s_value = FibSlow_t<0, 27>::value;


Compile this with: "CCACHE_RECACHE=1 ccache g++ -c example.cc"

Now try to interrupt this with control-C and note something really strange:
The first control-C is seemingly ignored and compilation continues! The
second control-C does work and stops compilation. When this is run from
some build system (e.g., "ninja" or "make"), control-C doesn't work at all.

The reason the second interrupt works is simple: you used signal(),
whose archaic (System V) behavior is to install the handler for one
delivery only. So after the first ^C the handler is reset to the default,
and the second ^C gets the default action for SIGINT, i.e., termination,
which works :-)

But why doesn't the first ^C, which does call signal_handler(), stop the
compilation? Why does the compilation continue?

It turns out that signal_handler() doesn't exit after cleaning up; it
simply returns. The log shows:

[2015-07-23T15:04:02.088253 21059] Executing /usr/bin/g++ -c -o z.o /home/nyh/.ccache/tmp/z.stdout.rice.21059.nHZLHE.ii

<here I pressed control-C>

[2015-07-23T15:04:05.847059 21059] Unlink /home/nyh/.ccache/1/a/873cb37b579cd5dd45ca9c43ae8030-644.o.tmp.stdout.rice.21059.9FTbHl
[2015-07-23T15:04:05.847138 21059] Failed opening /home/nyh/.ccache/tmp/tmp.cpp_stderr.rice.21059.AQUGJX: No such file or directory
[2015-07-23T15:04:05.847147 21059] Failed; falling back to running the real compiler
[2015-07-23T15:04:05.847151 21059] Executing /usr/bin/g++ -c z.cc

So, after the signal, the handler deletes the temporary file and returns,
and ccache goes on waiting in waitpid() in execute.c. Very soon afterwards,
waitpid() discovers that the child has also died (it also received SIGINT,
as does every process in the terminal's foreground process group), and the
execute() call returns -1.

But the code ignores that and treats it as a generic "failure" to run the
compiler, which makes it fall back to running the compiler again! So it's
not that the compilation isn't interrupted - it actually is, and then it is
restarted!

One way to fix this bug is to recognize that execute() returning -1 has a
special meaning (the compiler was killed by a signal), and that ccache
should then exit instead of trying to run the compiler again. The
following patch fixes the bug. I'm not sure it's the "best" fix, but it
works:

@@ -839,10 +840,15 @@
         args_add(args, i_tmpfile);
     }

     cc_log("Running real compiler");
     status = execute(args->argv, tmp_stdout_fd, tmp_stderr_fd);
+    if (status == -1) {
+        /* The compiler was interrupted by a signal */
+        exit(1);
+    }
+
     args_pop(args, 3);

     if (x_stat(tmp_stdout, &st) != 0) {
         /* The stdout file was removed - cleanup in progress? Better bail out. */
         stats_update(STATS_MISSING);




-- 
Nadav Har'El
nyh at cloudius-systems.com