[Samba] smbd hung processes - Samba 3.0.7

Christian Merrill cmerrill at redhat.com
Fri Dec 3 21:35:11 GMT 2004

john.nelson at teradyne.com wrote:

>We've seen Samba crash and burn twice in the last 48 hours - it just 
>started happening, and we have no idea what might be causing it.  I'm 
>hoping that someone will recognize this problem.
>Platform:  we are running RedHat Enterprise Server, with Samba 3.0.7. 
>We're using security=domain in an old-style NT4 domain environment.
>The symptom that we're seeing is that the number of smbd processes 
>suddenly begins to increase.  We normally run with between 100 and 150 smbd 
>processes, but when Samba fails, the number starts to increase quickly, 
>and users start to have problems accessing files.
>smbstatus reports approximately the right number of clients (133), but ps 
>shows a much larger number of smbd processes active (680).  Smbstatus 
>reports a list of active smbd processes - this list includes the oldest 
>processes and the newest processes, but there is a block of smbd processes 
>in the middle that are not in the smbstatus report.  What we THINK is 
>happening is that the smbd processes begin to hang, the clients time out, 
>they initiate a new session with the Samba server, which spawns another smbd 
>server process (leaving the old, hung process running).  This keeps 
>happening over and over until we kill samba.  The hung processes need to 
>be kill -9'ed.
>If you do a "strace" on these apparently hung processes, you see this:
>    # strace -p 20403
>    Process 20403 attached - interrupt to quit
>    fcntl64(13, F_SETLKW64, {type=F_RDLCK, whence=SEEK_SET, start=280, 
>     <unfinished ...>
>I'm not sure if it's relevant, but netstat -a reports a large number of 
>sockets in the CLOSE_WAIT state (I've included a small sample):
>    Proto Recv-Q Send-Q Local Address           Foreign Address State
>    tcp        1      0 valhalla:microsoft-ds   army39:1455 CLOSE_WAIT
>    tcp        1      0 valhalla:microsoft-ds 
>    tcp       54      0 valhalla:microsoft-ds   army39:1435 CLOSE_WAIT
>    tcp       54      0 valhalla:microsoft-ds 
>In this log, valhalla is the Samba server, and microsoft-ds is port 445 
>(the CIFS port).
>There doesn't seem to be anything relevant in the smbd log files (we were 
>using log level 1).  We've increased the log level to 3 in the hope that 
>we'll get more information the next time Samba goes wild.
>Our smb.conf file isn't complicated - the global section looks like this:
>   workgroup = ICD
>   netbios name = VALHALLA
>   security = domain
>   password server = *
>   wins server = nn.nn.nn.nn mm.mm.mm.mm
>   server string = Linux ClearCase Server %v %h
>   log file = /var/log/samba/%m.log
>   log level = 3
>   max log size = 4000
>   username map = /etc/samba/smbusers
>   read raw = no
>   oplocks = no
>   kernel oplocks = no
>   level2 oplocks = no
>   create mask = 0774
>   directory mask = 0775
>   map archive = No
>   preserve case = yes
>   deadtime = 0

Is this by any chance with the 3.0.7-1.3E.1 RH Samba update that was 
just recently released, or one of the previous 3.0.7 RH packages?
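
The comparison described in the quoted report (smbd PIDs that ps shows but
smbstatus does not) can be sketched in a few lines of Python. This is only a
rough sketch and not something from the original thread: the function names
are hypothetical, the sample text is made up for illustration (only PID 20403
comes from the strace output above), and it assumes each tool prints the PID
as the first column of a line, as with "ps -C smbd -o pid=" and "smbstatus -p".

```python
def parse_pids(text):
    """Collect integer PIDs that appear as the first token on a line.

    Header lines, separators and blank lines are skipped because their
    first token is not purely numeric.
    """
    pids = set()
    for line in text.splitlines():
        tokens = line.split()
        if tokens and tokens[0].isdigit():
            pids.add(int(tokens[0]))
    return pids


def unlisted_smbd(ps_output, smbstatus_output):
    """Return the sorted PIDs present in the ps output but absent from smbstatus."""
    return sorted(parse_pids(ps_output) - parse_pids(smbstatus_output))


# Illustrative sample data (assumed, not captured from a real server).
ps_sample = "  101\n20403\n20410\n20555\n"
smbstatus_sample = (
    "Samba version 3.0.7\n"
    "PID     Username      Group         Machine\n"
    "-------------------------------------------\n"
    "20410   jdoe          users         army39\n"
)

# PIDs that ps reports but smbstatus does not: candidates for a closer look.
candidates = unlisted_smbd(ps_sample, smbstatus_sample)
print(candidates)
```

Not every PID in the result is necessarily hung: the parent smbd daemon has no
client session, so it will normally be missing from smbstatus as well. The
list is only a starting point for checks such as the strace shown above.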


