[ccache] gmake and ccache conspiring together in creating gremlins
sam.varshavchik at gmail.com
Mon Feb 8 22:51:02 UTC 2021
On Mon, Feb 8, 2021 at 2:38 PM Paul Smith <psmith at gnu.org> wrote:
> On Mon, 2021-02-08 at 10:43 +0000, Edward Welbourne wrote:
> > Sounds to me like that's a bug: when the descriptors are closed, the
> > part of MAKEFLAGS that claims they're make's jobserver file
> > descriptors should be removed, since that's when the claim stops
> > being true.
> I believe there have been other similar issues reported recently.
> Certainly fixing MAKEFLAGS when we run without jobserver available is
> something that could be done.
> There is a loss of debugging information if we make this change: today
> make can detect if it was invoked in a way that _should_ expect to
> receive a jobserver context, but _didn't_ receive that context. That
> is, if make sees that jobserver-auth is set but it can't open the
> jobserver pipes it can warn the user that most likely there's a problem
> in their environment or with the setup of their makefiles.
> Without this warning there's no way to know when this situation occurs.
> It's easy to create a situation where every sub-make will create its
> own completely unique jobserver domain. So you start the top make with
> -j4 and run 4 sub-makes; if you do it wrong then each of 4 sub-makes
> could create a new jobserver domain, and now you're running 16 jobs in
> parallel instead of 4... there's no way for make to warn you about this
One thought occurred to me. Specifically: when make executes what it
believes to be something other than a recursive invocation of $(MAKE),
and it closes the job server pipe file descriptors for that, it can
1) Add an additional parameter to MAKEFLAGS, let's call it
"--no-jobserver", and perhaps remove the --jobserver-auth parameter
completely. It might be easier just to append something there, instead
of surgically removing this.
2) Make checks for a --no-jobserver in MAKEFLAGS when it starts. If
it's there it does NOT attempt to validate the file descriptors that
are given in --jobserver-auth (if this parameter is preserved). It's a
given that they're not there:
if (!FD_OK (job_fds) || !FD_OK (job_fds) || make_job_rfd () < 0)
Don't even do that. What happens right now a warning message gets
printed and make runs without a job server. This change should have
the same result, print the warning but skip the FD_OK tests.
This will result in the same warning, but it should avoid triggering
the bug that I found.
However that might cause a minor regression in LTO linking. I think
that this prevents the LTO linker's internal invocation of make from
finding that it can attach to the original make process's job server.
>From sifting through strace dumps, I see that a linker-invoked make
gets its own -j flag. It appears that the linker is courteous enough
to count how many CPUs it has and use it to construct its own -j flag.
How about this, safe approach: once --no-jobserver is there it stays
there, and gets propagated to all recursively invoked makes. If an
invoke make finds that it has both a --no-jobserver and a -j flag,
it'll warn and refuse to create its own job server, and then proceed
executing one command at a time.
This prevents an arithmetic proliferation of job worker processes if
the original job server's file descriptors get lost. Currently
recursively-invoked makes will find, and attach themselves to, an
existing job server. This is nice; but this is vulnerable to an edge
case that I think I'm hitting: a false positive involving a leaked
file descriptor. This change encourages fixing whatever's causing make
to fail to detect a recursive invocation.
More information about the ccache