[clug] Out of Memory: Kill process 2689 (mysqld) score 33827 and children.

Wed Aug 18 02:26:05 MDT 2010

On 18 August 2010 12:23, Daniel Pittman <daniel at rimspace.net> wrote:
>> Tuning is always a good thing, but if it's a deadlock issue it will
>> not solve your current issue.
>
> Yeah, I just have trouble believing that there is anything particularly likely
> to involve a deadlock in this scenario.
>
>
> Actually, I am surprised you see so many deadlocks in MySQL, frankly.  Pretty
> much none of the PHP applications I am familiar with do any locking at all,
> just random updates without transactions.  Which packages usually trigger this
> sort of lock problem?
>

I'm much more familiar with one specific application - MySource
Matrix. It runs on PostgreSQL and Oracle, not MySQL. Specific problems
vary drastically but one great way to deadlock PHP is to make use of
the native session handler and find any old URL that crashes that
thread, then just keep sending in requests until Apache runs out of
slots (or in this case gets killed off by OOM).

I've seen others that involved the bad accelerators mentioned below.

>
>> For reference some of the issues I've resolved like this were related to
>> combinations of flakey components. I've seen bugs in both eAccelerator and
>> APC terminate threads in bad ways so that resources on the FS could never be
>> released, causing all new threads to queue up.
>
> Interesting.  Were those resources flocks on their cache files, or something
> else?  Which versions were they found / resolved in?  (Bonus points for links
> to your upstream bug reports, since they will have all those details without
> you having to repeat them. ;)

Unfortunately most of these issues have only been reproducible in
production environments (turn off eAccelerator, server starts working,
turn it on and *sometime* later it stops again) and I've never managed
to get enough data to isolate something specific enough to lodge a bug
about. The solution to date has been "disable eAccelerator". Most of
this was with CentOS 5.3 and eAccelerator custom compiled by clients
with share root to servers. A few of our issues were resolved by
upgrading APC to later versions, so as far as I'm aware we haven't
lodged upstream bugs in most of these cases.

We've also built a version of APC that we've been using in a lot of
production systems since Feb and it *seems* to be stable:
http://packages.squiz.net/redhat/5Server/x86_64/php-pecl-apc-3.0.19-1-squiz.x86_64.rpm
note: use at your own risk :)

>> I've seen complex php pages terminate early due to a low php memory_limit
>> that would leave their DB connection dangling and cause new threads to
>> deadlock (that one only happened with eAccelerator enabled though). I've
>> seen maintenance scripts that lock FS objects completely separately to
>> either httpd or postgres (I don't use mysql) and cause all new threads to
>> queue up on those locks. All of these were resolved with server status,
>> strace, lsof and DB debugging options in the logs.
>
> *nod*  Very strange.  It sounds like a whole lot of the problems come from the
> range of crappy PHP "accelerators" that are intended to work around the nasty,
> non-persistent nature of PHP.  Is that a fair summary?

A lot of issues are related to that. We've had a lot of interaction
problems that "no one else sees" with all sorts of PHP accelerators. I
also think the fact that PHP session access locks the session until
session_write_close() is called probably means that a lot of PHP
applications are fairly vulnerable to the "something failed and I
pressed reload 100 times and now the whole server is dead" form of
accidental (or even purposeful) low volume DoS attack.
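
A rough illustration of that session lock, as a sketch only. PHP's
default file-based session handler takes an exclusive flock on the
session file, so the Python below does the same thing to a throwaway
temp file. try_lock() and the "sess_" file are made up for the demo,
and closing the file stands in for session_write_close():

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and attempt an exclusive, non-blocking flock.

    Returns the open file on success, or None if the lock is already held.
    (Hypothetical helper; a real PHP request would block here, not give up.)
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


# Stand-in for a PHP session file; the name is an assumption for the demo.
fd, session_file = tempfile.mkstemp(prefix="sess_")
os.close(fd)

first = try_lock(session_file)    # first request gets the session
second = try_lock(session_file)   # second request on the same session is shut out
first.close()                     # releases the lock, like session_write_close()
third = try_lock(session_file)    # now the next request can proceed
third.close()
os.remove(session_file)
```

In PHP the second request doesn't fail, it just sits there holding an
Apache slot until the lock is released, which is the whole problem.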

It doesn't actually have to be a deadlock though. If you've got a page
that takes 5 seconds to load (not _that_ high for PHP) and you don't
call session_write_close() until the end of a page load (also not
unusual for PHP) you can max out a server with a MaxClients of 100 in
about 120s by making one request every second (not high enough to get
noticed in most log files I look at). PHP's max_execution_time doesn't
exit the thread when PHP is waiting on IO, so there's no cleanup
mechanism in a default install. If you've got MaxClients set to a
value where your server runs out of memory and something like this is
happening, the OOM killer steps in and kills off processes for you,
usually well after you've been thrashing your swap partition for a while.
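
To sanity check the "about 120s" figure, here is a back-of-envelope
simulation. It assumes every request uses the same session, so the
session lock forces them to run strictly one at a time while each
waiting request still occupies an Apache slot. seconds_until_full() is
a made-up name and the model ignores timeouts and client disconnects:

```python
def seconds_until_full(max_clients=100, page_seconds=5, request_interval=1):
    """Return the time (in seconds) at which all worker slots are occupied.

    Model assumption: all requests share one locked session, so each one
    waits for the previous request to finish before it can start.
    """
    if request_interval >= page_seconds:
        raise ValueError("requests finish as fast as they arrive; slots never fill")

    lock_free_at = 0    # when the session lock is next released
    finish_times = []   # finish time of every request still holding a slot
    t = 0
    while True:
        # Requests that have finished by now give their slot back.
        finish_times = [f for f in finish_times if f > t]
        # A new request arrives, waits for the lock, then runs for page_seconds.
        start = max(t, lock_free_at)
        lock_free_at = start + page_seconds
        finish_times.append(lock_free_at)
        if len(finish_times) >= max_clients:
            return t
        t += request_interval


print(seconds_until_full())
```

With the numbers above (MaxClients 100, 5 second page, one request per
second) this comes out at 123 seconds, which is in line with the rough
estimate.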

The point is still to strace/lsof the procs and see what's going on.
To do that the kernel needs to remain responsive while Apache crashes,
which usually means lowering MaxClients.

Cheers,
Dave