[Samba] wayward smbd processes

Nathan Vidican nvidican at wmptl.com
Tue Oct 11 13:19:50 GMT 2005


For several months now, we've been having smbd processes which 'lock'
and escalate to 99% CPU utilization, effectively locking the end user
out entirely and hanging their client machine. This happens almost
exclusively while the user is saving an MS Word or Excel file, and only
to a couple of specific users.

We've tried various patches offered by members of the samba team here
through the list, which over the past few versions of samba have helped
greatly (thanks, guys), but the problem has never gone away entirely.
Admittedly, the state of our network was rather poor, which made
debugging difficult.

Recently, we got the chance to change that when a nice thunderstorm took
out three of our existing switches. We have since replaced the network
hardware in both the main server room and the network branch where all
the users encountering this problem are located. The network now
consists of NetGear Layer 2 managed switches: one 12-port SFP switch in
the server room operating at 1000Mbit full duplex, with two independent
fiber links to two 24-port 10/100 switches with 1000Mbit fiber uplinks
via GBICs. We figured that perhaps the issue was indeed our network
disconnecting users, leaving a stale smbd process locking the file they
were using and escalating to 99% CPU in some wayward loop of code
somewhere...

Now, things are running a lot faster, but the problem seems to be
getting trickier. Users are encountering a problem similar to before,
except now the first smbd process belonging to a specific client becomes
locked without escalating to 100% CPU utilization. Essentially I get
something similar to this:

   (wmpoff25 is the machine/client in question in this case, user usually
    calls to say 'my machine is locked up'):

wmptwo# /server/bin/samba-3.0.13/bin/net status sessions | grep wmpoff25
10135   cboakes       shop          wmpoff25     (10.0.0.27)
10015   cboakes       shop          wmpoff25     (10.0.0.27)
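
Spotting a client that holds more than one smbd session, as wmpoff25
does above, could be scripted rather than grepped for by machine name.
What follows is only a minimal sketch: the function name dup_sessions is
made up for illustration, and it assumes the column layout shown above
(PID, user, group, machine, address).

```shell
#!/bin/sh
# Hypothetical helper: list client machines that hold more than one
# smbd session. Reads 'net status sessions' output on stdin and assumes
# the column layout PID, user, group, machine, address.
dup_sessions() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 4 {
             # Skip header lines; count sessions per machine name.
             n[$4]++
             pids[$4] = pids[$4] " " $1
         }
         END {
             for (m in n)
                 if (n[m] > 1)
                     print m ":" pids[m]
         }'
}
```

It would be fed with something like
'/server/bin/samba-3.0.13/bin/net status sessions | dup_sessions'. More
than one session from the same machine is not necessarily a fault, so
the output is only a list of candidates to look at.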

A simple 'kill 10015' does nothing, repeat... nothing. Finally, a
'kill -9 10015', and poof - the end user's system comes back to them and
all runs well until the next time they call us.
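
The manual sequence of a plain kill followed by 'kill -9' can be wrapped
in a small helper. This is a sketch under assumptions: the function name
term_then_kill and the 10 second default grace period are invented for
illustration and are not anything samba provides. Bear in mind that
SIGKILL cannot be caught, so the smbd process gets no chance to clean up
after itself.

```shell
#!/bin/sh
# Hypothetical helper: ask a process to exit with SIGTERM, wait up to a
# grace period, then fall back to SIGKILL.
# Usage: term_then_kill PID [GRACE_SECONDS]
term_then_kill() {
    pid=$1
    grace=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0   # process already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0  # exited after SIGTERM
        sleep 1
        grace=$((grace - 1))
    done
    echo "pid $pid still present after SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
}
```

For the case above this would be 'term_then_kill 10015'. It only
automates the workaround; it does nothing to explain why the process
stops responding in the first place.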

The problem is therefore the same as before, and our resolution much the
same, except that now the process does not climb to high CPU utilization.

In my despair I started to think perhaps the issue is with the LDAP
tree, noting that the slapd process cannot exit cleanly on our systems
(this seems to be a bug in openldap/freebsd-amd64/threads). So I've
since re-compiled ldap without thread support and re-created the tree
from a 'slapcat' backup. This cripples our setup a little, as slurpd
will not compile/run without threading support - to say nothing of the
obvious performance issues in not using a threaded version of slapd. But
for now, at least slapd starts, runs, and exits cleanly. We depend on
ldap not only for our samba user database, but also for our unix user
base via pam_ldap and nss_ldap on multiple servers and even a few *nix
workstations.

So here I am again, at a loss. I tried compiling samba-3.0.20, and all
compiles well, smbd starts, but nobody's home for some reason.
Admittedly I have not had the time or the means to properly debug or
roll out 3.0.20, because these servers are in a production environment
now, running a slightly hacked copy of 3.0.13. I cannot take our systems
down to 'try' them with 3.0.20, and have no test machine capable of
running freebsd/amd64 which is not already in use. Our servers are all
dual AMD Opteron based boxes with dual-homed gigabit ethernet
connections (one link to the main network, and one amongst each other).

Aside from 'try 3.0.20', does anyone have any suggestions? I will be
setting up a test server shortly and trying to get 3.0.20 to run cleanly
on it, but I figured it may be worth posting now to see if anyone has
some other ideas. Any and all constructive feedback would be greatly
appreciated.

We're running FreeBSD 5.3-RELEASE/AMD64, with OpenLDAP 2.2.26 (no thread
support), and samba-3.0.13 (with one server running 3.0.7 as a print
server, with no errors thus far).




-- 
Nathan Vidican
nvidican at wmptl.com
Windsor Match Plate & Tool Ltd.
http://www.wmptl.com/

