[Samba] wayward smbd processes
Nathan Vidican
nvidican at wmptl.com
Tue Oct 11 13:19:50 GMT 2005
For several months now, we've been having smbd processes which 'lock'
and escalate to 99% CPU utilization, effectively locking the end user out
entirely and hanging their client machine. This happens almost exclusively
while the user is saving an MS Word or Excel file, and only for a
couple of specific users.
We've tried various patches offered by members of the Samba team here
on the list, which over the past few versions of Samba have helped
greatly (thanks, guys), but the problem has never gone away entirely.
Admittedly, the state of our network was rather poor and
ineffective for debugging purposes.
Recently, we moved to change that when a nice thunderstorm took out
three of our existing switches. We have since replaced the network
hardware in both the main server room and the network branch where
all the users encountering this problem are located. The network now
consists of NetGear Layer 2 managed switches: one 12-port SFP switch in
the server room operating at 1000 Mbit full duplex, with two independent
fiber links to two 24-port 10/100 switches that have 1000 Mbit fiber
uplinks via GBICs. We figured that perhaps the issue was indeed our
network disconnecting users, leaving stale smbd processes that locked
the files they were using and escalated to 99% CPU in some wayward loop
of code somewhere...
Now, things are running a lot faster, but the problem seems to be
getting trickier. Users are encountering a problem similar to before,
except that now the first smbd process belonging to a specific client
locks up without escalating to 100% CPU utilization. Essentially, I
get something like this:
(wmpoff25 is the machine/client in question in this case; the user usually
calls to say 'my machine is locked up'):
wmptwo# /server/bin/samba-3.0.13/bin/net status sessions | grep wmpoff25
10135 cboakes shop wmpoff25 (10.0.0.27)
10015 cboakes shop wmpoff25 (10.0.0.27)
A simple 'kill 10015' does nothing; repeating it does nothing. Finally,
'kill -9 10015', and poof: the end user's system comes back to them, and
all runs well until the next time they call us.
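In the listing above, the stuck client shows up holding two smbd sessions at once. A small script could flag such clients from the `net status sessions` output before anyone calls. What follows is a minimal Python sketch, assuming the listing uses the PID / Username / Group / Machine / (IP) column layout shown in the lines above; `parse_sessions` and `suspect_clients` are hypothetical helper names, not part of Samba. It only identifies candidates and does not decide which process is the stuck one.

```python
from collections import defaultdict


def parse_sessions(text):
    """Return {machine: [pid, ...]} from `net status sessions` output.

    Assumes session lines look like: PID Username Group Machine (IP).
    Header and separator lines are skipped because their first field
    is not a number.
    """
    sessions = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].isdigit():
            pid, machine = int(fields[0]), fields[3]
            sessions[machine].append(pid)
    return dict(sessions)


def suspect_clients(text):
    """Machines holding more than one smbd session.

    PIDs are sorted for display only; PIDs wrap, so a lower PID is not
    guaranteed to be the older process.
    """
    return {
        machine: sorted(pids)
        for machine, pids in parse_sessions(text).items()
        if len(pids) > 1
    }


if __name__ == "__main__":
    sample = (
        "10135 cboakes shop wmpoff25 (10.0.0.27)\n"
        "10015 cboakes shop wmpoff25 (10.0.0.27)\n"
    )
    print(suspect_clients(sample))  # {'wmpoff25': [10015, 10135]}
```

In practice the text would come from piping the real `net status sessions` output into the script. Checking process start times with `ps` before killing anything would still be prudent.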
The problem is therefore the same as before, and so is our resolution,
except that now the process does not climb to high CPU utilization.
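The manual resolution (a plain `kill`, then `kill -9` once that is ignored) can be scripted so it behaves the same way every time. Below is a minimal Python sketch of that SIGTERM-then-SIGKILL escalation. The helper names, the 5-second grace period, and the `is_alive` hook are illustrative assumptions, not anything Samba provides. It automates the workaround only and does nothing about the underlying hang.

```python
import os
import signal
import time


def pid_exists(pid):
    """Probe a PID with signal 0, which checks existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_then_kill(pid, grace=5.0, poll=0.1, is_alive=pid_exists):
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL.

    Returns the name of the signal that ended the process.
    Caveat: a process that has exited but not yet been reaped by its
    parent (a zombie) still passes the signal-0 probe, so a caller that
    owns the process should pass its own `is_alive` check.
    """
    os.kill(pid, signal.SIGTERM)        # equivalent of a plain `kill PID`
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)    # equivalent of `kill -9 PID`
    except ProcessLookupError:
        return "SIGTERM"                # it exited just after the deadline
    return "SIGKILL"
```

For example, `terminate_then_kill(10015)` would mirror the sequence described above. It needs enough privilege to signal smbd processes, which in practice usually means running it as root. The default `pid_exists` check assumes that the parent smbd reaps its exited children promptly.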
In my despair, I started to think perhaps the issue was with the LDAP
tree, noting that the slapd process cannot exit cleanly on our systems
(this seems to be a bug in OpenLDAP/FreeBSD-amd64/threads). So I've since
recompiled OpenLDAP and re-created the tree from a 'slapcat' backup using
a build of OpenLDAP without thread support. This cripples our setup a
little, as slurpd will not compile/run without threading support, to say
nothing of the obvious performance cost of not using a threaded slapd.
But for now, at least slapd starts, runs, and exits cleanly. We depend on
LDAP not only for our Samba user database, but also for our Unix user
base via pam_ldap and nss_ldap across multiple servers and even a few
*nix workstations.
So here I am again, at a loss. I tried compiling samba-3.0.20; it
compiles cleanly and smbd starts, but nobody's home for some reason.
Admittedly, I have had neither the time nor the capability to properly
debug or roll out 3.0.20, because these servers are now in production,
running a slightly hacked copy of 3.0.13. I cannot take our systems down
to 'try' them with 3.0.20, and I don't have a test machine capable of
running FreeBSD/amd64 that is not already in use. Our servers are all
dual AMD Opteron boxes with dual-homed gigabit Ethernet connections (one
link to the main network, and one between the servers themselves).
Aside from 'try 3.0.20', does anyone have any suggestions? I will be
setting up a test server shortly and trying to get 3.0.20 to run cleanly
on it, but I figured it may be worth posting now to see if anyone had
other ideas. Any and all constructive feedback would be greatly
appreciated.
We're running FreeBSD 5.3-RELEASE/AMD64, with OpenLDAP 2.2.26 (no thread
support), and samba-3.0.13 (with one server running 3.0.7 as a print
server, with no errors thus far).
--
Nathan Vidican
nvidican at wmptl.com
Windsor Match Plate & Tool Ltd.
http://www.wmptl.com/