[Samba] Samba Hung Process ("D state") and hung system
servergremlyn at gmail.com
Fri Nov 2 15:50:26 GMT 2007
I had an interesting problem with a Samba server recently, and I think I
know what happened. But I want to post here to see if those wiser than I
can confirm this or give me a better explanation of what went awry.
I first noticed around 8:35 AM that Samba wasn't working at all, and that
my server had huge load averages according to "uptime", even though "top"
clearly showed no burden on the CPU. A "ps -e u" showed me a large number
of processes in the dreaded "D" state. I grepped the output of the ps
command with "ps -e u | grep ' D '" to get a list of all the processes in
the D state, and nearly all of them were smbd. So I filtered out the smbd
lines with "ps -e u | grep ' D ' | grep -v smbd". This left just one
process. The full line reads:
root 16909 0.0 0.0 2428 508 ? D 04:02 0:00 quotaoff
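
Grepping for " D " works here, but it matches against the whole line rather than the state column. A minimal sketch of a stricter filter follows, assuming a Linux procps "ps" and input where the second field is STAT; the name filter_d_state is a hypothetical helper, not part of any tool.

```shell
# Keep only lines whose STAT field (field 2) starts with D,
# i.e. processes in uninterruptible sleep. The header line is
# dropped because "STAT" does not start with D.
filter_d_state() {
    awk '$2 ~ /^D/ { print }'
}

# Assumed usage on a live Linux system (procps ps):
#   ps -eo pid,stat,wchan:32,comm | filter_d_state
# The wchan column names the kernel function the process is
# sleeping in, which can hint at what it is waiting on.
```

Matching on the field also catches variants such as "D+" or "Ds", which a grep for " D " would miss.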
I tried to shut down the system, but even THAT failed. Looking in ps
revealed that the shutdown command was also in the D state. So I held the
power button on the machine until it died and then brought it back up.
Everything seemed fine at this point.
So what happened?
There is a cron job on the server scheduled for 4:02 AM every night that
does nothing more than run quotaoff, quotacheck, and then quotaon. This
time, for some reason, quotaoff failed miserably. It went into the D state
permanently and, as far as I can tell, locked up the whole hard disk,
keeping anything else from using it. Of course, no one uses the file
server at 4 AM, so no problems were apparent yet. But at about 8:35 AM,
when people started using the server, smbd processes started showing up.
They all tried to access the hard disk, because that's where the files
are, but they couldn't because of the hung quotaoff process. So they all
started hanging (going into the D state), waiting indefinitely on the disk.
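
One way to test this ordering on a live system is to sort the blocked processes by age: if the theory holds, quotaoff should be the oldest D-state process and the smbd processes should all be younger. This is only a sketch, assuming a procps-ng "ps" that supports the "etimes" field (elapsed seconds since start); oldest_blocked_first is a hypothetical helper name.

```shell
# Input: lines of "ELAPSED PID STAT COMMAND".
# Keep only D-state lines (field 3 starts with D), then sort by
# elapsed seconds, largest first, so the longest-blocked process
# appears at the top.
oldest_blocked_first() {
    awk '$3 ~ /^D/' | sort -k1,1nr
}

# Assumed usage on a live Linux system (procps-ng ps):
#   ps -eo etimes,pid,stat,comm | oldest_blocked_first
```

Age alone does not prove that the oldest process caused the hang, but it shows which process got stuck first.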
When I ran shutdown, it tried to unmount the filesystem, because that is
part of the procedure. So it also entered the D state forever, because it
was also waiting on the hard disk which quotaoff had somehow locked up.
Thus, the only fix was a nasty reboot using the power button on the box,
because a process in the D state does not respond to any signal, including
SIGKILL (9), for as long as it stays in that state.
So how much of that did I get right...? Any remarks from helpful gurus?