[Samba] smbd hung processes - Samba 3.0.7

john.nelson at teradyne.com john.nelson at teradyne.com
Fri Dec 3 21:05:16 GMT 2004


We've seen Samba crash and burn twice in the last 48 hours - it just 
started happening, and we have no idea what might be causing it.  I'm 
hoping that someone will recognize this problem.

Platform:  we are running RedHat Enterprise Server, with Samba 3.0.7. 
We're using security=domain in an old-style NT4 domain environment.

The symptom that we're seeing is that the number of smbd processes 
suddenly begins to increase.  We normally run with between 100 and 150 smbd 
processes, but when Samba fails, the number starts to increase quickly, 
and users start to have problems accessing files.

smbstatus reports approximately the right number of clients (133), but ps 
shows a much larger number of smbd processes active (680).  smbstatus 
reports a list of active smbd processes - this list includes the oldest 
processes and the newest processes, but there is a block of smbd processes 
in the middle that are not in the smbstatus report.  What we THINK is 
happening is that the smbd processes begin to hang, the clients time out, 
and they initiate a new session with the Samba server, which spawns another 
smbd process (leaving the old, hung process running).  This keeps 
happening over and over until we kill Samba.  The hung processes need to 
be kill -9'ed.
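
One way to pick out the block of unreported processes is to diff the two
PID lists.  The following is only a rough sketch: it assumes the output of
something like "ps -C smbd -o pid=" and "smbstatus -p" has been captured as
text, and that each relevant line of both begins with a PID (the smbstatus
format varies between versions, so that is an assumption).  The function
names are made up for illustration.

```python
import re


def parse_pids(text):
    """Return the set of integers that appear at the start of a line."""
    pids = set()
    for line in text.splitlines():
        match = re.match(r"\s*(\d+)\b", line)
        if match:
            pids.add(int(match.group(1)))
    return pids


def unreported_smbd(ps_output, smbstatus_output):
    """PIDs that ps lists but that are missing from the smbstatus list."""
    return sorted(parse_pids(ps_output) - parse_pids(smbstatus_output))
```

Note that the parent smbd daemon will probably show up in the result as
well, since it is not serving a client.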

If you do a "strace" on these apparently hung processes, you see this:

    # strace -p 20403
    Process 20403 attached - interrupt to quit
    fcntl64(13, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=280, len=1} <unfinished ...>
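
For reference, F_SETLKW is the blocking form of the fcntl lock request and
F_RDLCK asks for a shared (read) lock, so the process is waiting for 1 byte
at offset 280 of whatever file is open on descriptor 13, presumably because
some other process holds a write lock there.  The sketch below pulls those
fields out of the strace line and looks up the file behind the descriptor
via /proc.  It is Linux-only, it needs permission to read the /proc entry,
and the function names are hypothetical.

```python
import os
import re

# Matches the fcntl64 call as strace prints it; whitespace after the commas
# is optional so a line-wrapped copy of the output still parses.
LOCK_RE = re.compile(
    r"fcntl64\((?P<fd>\d+),\s*(?P<cmd>F_\w+),\s*"
    r"\{type=(?P<type>F_\w+),\s*whence=(?P<whence>\w+),\s*"
    r"start=(?P<start>\d+),\s*len=(?P<len>\d+)\}"
)


def parse_lock_call(line):
    """Return the lock request fields as a dict, or None if no match."""
    match = LOCK_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("fd", "start", "len"):
        info[key] = int(info[key])
    return info


def locked_file(pid, fd):
    """Resolve an open file descriptor of a process to a path via /proc."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

For the process above, locked_file(20403, 13) would name the file being
locked while the process is still running.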

I'm not sure if it's relevant, but netstat -a reports a large number of 
sockets in the CLOSE_WAIT state (I've included a small sample):

    Proto Recv-Q Send-Q Local Address           Foreign Address       State
    tcp        1      0 valhalla:microsoft-ds   army39:1455           CLOSE_WAIT
    tcp        1      0 valhalla:microsoft-ds   131.101.40.174:2531   CLOSE_WAIT
    tcp       54      0 valhalla:microsoft-ds   army39:1435           CLOSE_WAIT
    tcp       54      0 valhalla:microsoft-ds   131.101.40.174:2512   CLOSE_WAIT

In this log, valhalla is the Samba server, and microsoft-ds is port 445 
(the CIFS port).
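
A socket sits in CLOSE_WAIT when the remote end has closed the connection
but the local process has not yet closed its side, which would fit with
smbd processes that are stuck and never get around to closing.  To see
which clients account for the CLOSE_WAIT sockets, the netstat output can be
tallied per foreign host.  This is a minimal sketch that assumes each
socket occupies a single six-column line as in the sample; the function
name is made up.

```python
from collections import Counter


def close_wait_by_host(netstat_output):
    """Count tcp sockets in CLOSE_WAIT, keyed by foreign host."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) == 6 and fields[0] == "tcp" and fields[5] == "CLOSE_WAIT":
            host = fields[4].rsplit(":", 1)[0]  # drop the port number
            counts[host] += 1
    return counts
```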

There doesn't seem to be anything relevant in the smbd log files (we were 
using log level 1).  We've increased the log level to 3 in the hope that 
we'll get more information the next time Samba goes wild.

Our smb.conf file isn't complicated - the global section looks like this:

[global]
   workgroup = ICD
   netbios name = VALHALLA
   security = domain
   password server = *
   wins server = nn.nn.nn.nn mm.mm.mm.mm
   server string = Linux ClearCase Server %v %h
   log file = /var/log/samba/%m.log
   log level = 3
   max log size = 4000
   username map = /etc/samba/smbusers
   read raw = no
   oplocks = no
   kernel oplocks = no
   level2 oplocks = no
   create mask = 0774
   directory mask = 0775
   map archive = No
   preserve case = yes
   deadtime = 0
