[Samba] smbd hung processes - Samba 3.0.7
john.nelson at teradyne.com
Fri Dec 3 21:05:16 GMT 2004
We've seen Samba crash and burn twice in the last 48 hours - it just
started happening, and we have no idea what might be causing it. I'm
hoping that someone will recognize this problem.
Platform: we are running RedHat Enterprise Server, with Samba 3.0.7.
We're using security=domain in an old-style NT4 domain environment.
The symptom that we're seeing is that the number of smbd processes
suddenly begins to increase. We normally run with between 100 and 150 smbd
processes, but when Samba fails, the number starts to increase quickly,
and users start to have problems accessing files.
smbstatus reports approximately the right number of clients (133), but ps
shows a much larger number of active smbd processes (680). smbstatus
reports a list of active smbd processes - this list includes the oldest
processes and the newest processes, but there is a block of smbd processes
in the middle that are not in the smbstatus report. What we THINK is
happening is that the smbd processes begin to hang, the clients time out,
and they initiate a new session with the Samba server, which spawns another
smbd server process (leaving the old, hung process running). This keeps
happening over and over until we kill Samba. The hung processes need to
be kill -9'ed.
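For anyone wanting to reproduce the ps-versus-smbstatus comparison described above, here is a minimal sketch in Python. It assumes you have captured two text listings in which the PID is the first field of each line (for example from "ps -C smbd -o pid=" and "smbstatus -p"; the exact smbstatus output layout is an assumption and may differ between versions). The function names are hypothetical helpers, not part of Samba.

```python
def first_field_pids(text):
    """Collect PIDs that appear as the first whitespace-separated field of a line."""
    pids = set()
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines whose first field is not a number.
        if fields and fields[0].isdigit():
            pids.add(int(fields[0]))
    return pids


def unreported_smbd(ps_text, smbstatus_text):
    """Return PIDs that ps lists but smbstatus does not, in ascending order."""
    return sorted(first_field_pids(ps_text) - first_field_pids(smbstatus_text))
```

The PIDs it returns are the candidates for the "block in the middle": smbd processes that still exist but that smbstatus no longer lists.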
If you do a "strace" on these apparently hung processes, you see this:
# strace -p 20403
Process 20403 attached - interrupt to quit
fcntl64(13, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=280, len=1} <unfinished ...>
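The trace shows the process blocked in F_SETLKW64, which waits until the requested lock is granted; here it is a read lock on one byte at offset 280 of file descriptor 13. The trace does not say which file that descriptor refers to. A minimal sketch for looking it up on Linux through /proc follows; fd_target is a hypothetical helper, and it needs to run as root or as the user that owns the process.

```python
import os


def fd_target(pid, fd):
    """Return the path that descriptor `fd` of process `pid` points to (Linux /proc only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

With the values from the trace above, the call would be fd_target(20403, 13). Knowing the file narrows down which lock the hung processes are all waiting on.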
I'm not sure if it's relevant, but netstat -a reports a large number of
sockets in the CLOSE_WAIT state (I've included a small sample):
Proto Recv-Q Send-Q Local Address         Foreign Address      State
tcp        1      0 valhalla:microsoft-ds army39:1455          CLOSE_WAIT
tcp        1      0 valhalla:microsoft-ds 131.101.40.174:2531  CLOSE_WAIT
tcp       54      0 valhalla:microsoft-ds army39:1435          CLOSE_WAIT
tcp       54      0 valhalla:microsoft-ds 131.101.40.174:2512  CLOSE_WAIT
In this log, valhalla is the Samba server, and microsoft-ds is port 445
(the CIFS port).
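A socket sits in CLOSE_WAIT when the remote end has closed the connection but the local process has not yet closed its side, which would fit clients giving up on a hung smbd. To see which clients account for these sockets, something like the following sketch could tally them per foreign host. It assumes each netstat entry is on a single line with the six columns shown above; close_wait_by_host is a hypothetical helper name.

```python
from collections import Counter


def close_wait_by_host(netstat_text):
    """Count CLOSE_WAIT TCP sockets per foreign host in netstat-style output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state.
        if len(fields) == 6 and fields[0].startswith("tcp") and fields[5] == "CLOSE_WAIT":
            # Strip the trailing ":port" from the foreign address.
            host = fields[4].rsplit(":", 1)[0]
            counts[host] += 1
    return counts
```

Feeding it the output of netstat -a taken during a failure and sorting by count would show whether a handful of clients are responsible for most of the stale connections.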
There doesn't seem to be anything relevant in the smbd log files (we were
using log level 1). We've increased the log level to 3 in the hope that
we'll get more information the next time Samba goes wild.
Our smb.conf file isn't complicated - the global section looks like this:
[global]
workgroup = ICD
netbios name = VALHALLA
security = domain
password server = *
wins server = nn.nn.nn.nn mm.mm.mm.mm
server string = Linux ClearCase Server %v %h
log file = /var/log/samba/%m.log
log level = 3
max log size = 4000
username map = /etc/samba/smbusers
read raw = no
oplocks = no
kernel oplocks = no
level2 oplocks = no
create mask = 0774
directory mask = 0775
map archive = No
preserve case = yes
deadtime = 0