[clug] Out of Memory: Kill process 2689 (mysqld) score 33827 and children.

David Schoen dave at lyte.id.au
Tue Aug 17 01:59:45 MDT 2010


I've seen something similar *a lot*...

On 17 August 2010 12:17, Daniel Pittman <daniel at rimspace.net> wrote:
>
> Carlo Hamalainen <carlo.hamalainen at gmail.com> writes:
>
> > I have a 512Mb Linode server running Ubuntu 8.10 which has been slowly
> > apt-get upgraded towards 9.04. The server runs a few Wordpress blogs
> > and a satchmo/django shop. Every week or so cpu usage goes to 100%,
> > everything becomes unresponsive, and the console shows this:
> >
> > Out of Memory: Kill process 2961 (apache2) score 44591 and children.
> > Out of memory: Killed process 3019 (apache2).
> > Out of Memory: Kill process 2689 (mysqld) score 33827 and children.
> > Out of memory: Killed process 2689 (mysqld).
> > Out of Memory: Kill process 2699 (mysqld) score 33827 and children.
> > Out of memory: Killed process 2699 (mysqld).
> > Out of Memory: Kill process 2703 (mysqld) score 33827 and children.
> > Out of memory: Killed process 2703 (mysqld).
> > Out of Memory: Kill process 2961 (apache2) score 21690 and children.
> > Out of memory: Killed process 3022 (apache2).
> > Out of Memory: Kill process 15914 (mysqld) score 34048 and children.
> > Out of memory: Killed process 15914 (mysqld).
> >
> > I have to hard reset the server as ssh is completely unresponsive.
> >
> > So, is this really a memory leak in mysqld?

I doubt it. The failure case I normally see is that *something* is
causing a deadlock on a DB resource and Apache doesn't have a low
enough MaxClients configured. So each request that comes in spawns a
new worker that just sits there waiting for a lock it will never get,
and each one takes up a process slot, a DB connection and a bit more
memory.
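
If you want a quick check of whether that's what is happening, a rough
first pass (assuming the Debian/Ubuntu process name "apache2" and that
you have MySQL credentials handy) is just to count the workers and look
at what the DB connections are doing:

  # how many Apache workers are alive right now?
  ps -C apache2 --no-headers | wc -l

  # what are the MySQL connections doing? Lots of long-running
  # "Locked" entries is the smoking gun.
  mysqladmin -u root -p processlist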

>
> Maybe.  It could also be that it happens to be one of the biggest processes,
> so gets picked on first by the OOM killer.
>
> > Other people [1] had a similar issue but the fix suggested was to change
> > apache2's maxclients from 150 to 20. Does that sound ok?
>
> It does sound like your memory tuning is completely, utterly wrong, yeah.

If you're running out of memory then yes, it's perfectly reasonable to
lower that value; at the very least it will stop you having to hard
reset the server while you search for the real solution. There's no
point letting Apache try to serve 120 clients when you consistently run
out of memory at 50... or, put another way, it's better to let Apache
fail before your kernel does.
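
For what it's worth, that's only a couple of directives in the prefork
MPM config; the numbers here are purely illustrative, work out your own
from the per-process figures discussed below:

  # e.g. in /etc/apache2/apache2.conf on Debian/Ubuntu
  <IfModule mpm_prefork_module>
      StartServers          5
      MinSpareServers       5
      MaxSpareServers      10
      MaxClients           20
      MaxRequestsPerChild 500
  </IfModule>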

>
> What you want to do is make sure that your system isn't using more memory than
> it actually has.  So, work out how much memory the Apache, PHP, MySQL, and
> other bits take up for each client ... then tune so you don't go beyond that.
>
> http://www.selenic.com/smem/ is very useful to identify the actual non-shared
> memory for each process, which is what the real cost comes from.
>
> Also, http://mysqltuner.pl/ will give advice on tuning memory and other MySQL
> variables based on performance and the system.

Tuning is always a good thing, but if this is a deadlock problem it
will not solve the immediate issue.
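
Still, as a back-of-the-envelope sketch (all numbers invented): if each
Apache+PHP child costs about 25MB of unshared memory and MySQL plus the
base system want about 200MB, then on a 512MB box you can only really
afford a MaxClients of roughly (512 - 200) / 25, i.e. about 12. smem
makes getting the real per-process figure easy:

  # per-process USS/PSS, sorted largest first, human-readable sizes
  smem -k -s uss -r -P apache2
  smem -k -s uss -r -P mysqld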

>
> > I'd rather collect some debug info and make an evidence-based decision to
> > change the maxclients option.
>
> Good plan.  Otherwise you will have the same problem because you are still
> tuned well above the real memory limit.

I'd suggest using Apache's built-in server status:
http://httpd.apache.org/docs/current/mod/mod_status.html

Turn on ExtendedStatus *on a URL only you can see* (because it can
give away info you probably don't want to give away). Look for
processes in the "W" state whose "SS" value (seconds since the request
started) is unreasonably high. "Unreasonably high" will depend on your
app, but I normally start with 30 seconds. Once you're finding
processes that are stuck, see if there's a pattern to the URLs they
are serving. strace the processes to see what they are doing: are they
waiting on a DB file descriptor to respond? Does top report mysqld or
httpd actually using CPU time, or is it kswapd? And keep on with those
sorts of checks...
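
A minimal config for that would be something like the following (the
/server-status path and the allowed address are just examples, adjust
to taste):

  ExtendedStatus On
  <Location /server-status>
      SetHandler server-status
      Order deny,allow
      Deny from all
      Allow from 127.0.0.1
  </Location>

Then to poke at a stuck worker:

  # a worker blocked on the DB will usually be sitting in
  # read()/poll()/select() on the socket to mysqld
  strace -tt -p <pid>

  # map that file descriptor back to the mysqld socket
  lsof -p <pid> | grep -i mysql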

> Other things to consider include running PHP as a FastCGI process, and using a
> lighter-weight front-end server than Apache.  Collect evidence about where the
> problem lies first, though, because memory tuning is hard.

Once again, this could be very useful if you're not having a deadlock
problem. If you are, your system will most likely still fall over *at
some point* no matter how much tuning you do to make everything else
run faster or cache better elsewhere.

For reference, some of the issues I've resolved like this were down to
combinations of flaky components. I've seen bugs in both eAccelerator
and APC terminate threads in bad ways so that resources on the FS could
never be released, causing all new threads to queue up. I've seen
complex PHP pages terminate early because of a low PHP memory_limit,
leaving their DB connection dangling and causing new threads to
deadlock (that one only happened with eAccelerator enabled, though).
I've seen maintenance scripts that lock FS objects completely
separately from either httpd or postgres (I don't use mysql) and cause
all new threads to queue up on those locks. All of these were resolved
with server-status, strace, lsof and DB debugging options in the logs.
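
To give a concrete idea of the "DB debugging options" part, these are
the sorts of knobs I mean (names from PostgreSQL, since that's what I
use; the values are only examples):

  # postgresql.conf: log statements stuck behind locks, and slow queries
  log_lock_waits = on
  deadlock_timeout = 1s
  log_min_duration_statement = 1000    # in milliseconds

  # the rough MySQL equivalents, run from the client while it's wedged
  SHOW FULL PROCESSLIST;
  SHOW ENGINE INNODB STATUS\G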


Cheers,
Dave

