100% cpu utilization
David Collier-Brown
davecb at canada.sun.com
Thu Nov 1 07:55:04 GMT 2001
Scott Moomaw wrote:
> We continue to experience problems with the latest Samba CVS on Solaris 8
> consuming 100% of cpu utilization. The problem seems to occur when the
> system is experiencing heavy load.
You've captured some good symptoms: they may
suffice....
I can volunteer some help on the system side if
what you've captured isn't sufficient.
I've attached a program called pry, which simply
spits out stats from /proc, but it's actually
best to start with sar.
The command is
sar -o name.raw 10 30 && sar -A -f name.raw >name.txt
which captures system samples every 10 seconds for 5
minutes (30 samples) while using hardly any cpu, and only
then runs the result through the formatter, which does use
enough cpu to affect the stats.
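As a rough sketch of the same idea, the capture and the formatting step can be wrapped in two small shell functions, so the interval and count are easy to vary. The function names (capture_seconds, run_capture) and the output basename are made up for illustration; only the sar options shown in the command above are used.

```shell
# Hypothetical helpers around the sar command above; the function
# names and the output basename are illustrative only.

# Length of the capture window in seconds: interval * count.
capture_seconds() {
    expr "$1" \* "$2"
}

# $1 = interval in seconds, $2 = number of samples, $3 = output basename.
# Records raw samples first; formats them only after the capture has
# finished, so the formatter's cpu use does not land in the samples.
run_capture() {
    sar -o "$3.raw" "$1" "$2" >/dev/null &&
        sar -A -f "$3.raw" >"$3.txt"
}
```

For example, `capture_seconds 10 30` reports the 300 seconds (5 minutes) that `run_capture 10 30 name` would spend sampling.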
> Using top, we note 100% CPU utilization. There are approx 200 smb
> processes with a handful of the processes in a runable state using up to
> 2% of CPU each. It's hard to get details on one of the problematic
> processes because as quickly as we can identify them, they disappear.
> Using truss, I find most processes in an expected poll state, but when I
> can catch one of the problem processes I see bunches of fcntl with calls
> like kill(20759, SIG#0) interspersed.
Try to capture one of these with pry <pid>, which
spits out the resources used by the program
from the time it started to the point where the
sample was taken.
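Since the busy processes vanish quickly, one option is to let a loop do the catching. The following is only a sketch under some assumptions: that ps accepts the POSIX -o format list with a state column named s, that the states of interest are O (on a processor) and R (runnable), and that pry writes its report to standard output. The function names and the pry.<pid>.txt file names are invented for the example.

```shell
# Print the PIDs of smbd processes that are on a cpu (O) or runnable (R).
# Reads "pid state command" lines on stdin, such as those produced by
#   ps -e -o pid= -o s= -o comm=
busy_smbd_pids() {
    awk '($2 == "O" || $2 == "R") && $3 ~ /smbd$/ { print $1 }'
}

# One pass: run pry on every busy smbd and keep one report per PID.
# Assumes pry is on the PATH and prints to standard output.
catch_busy() {
    ps -e -o pid= -o s= -o comm= | busy_smbd_pids |
    while read pid; do
        pry "$pid" >"pry.$pid.txt" 2>&1
    done
}
```

Running `while :; do catch_busy; sleep 1; done` during a busy period should leave a pry report behind for each process that was caught, even if it has exited by the time you look.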
> I did manage to grab a core of one
> of these processes and have included a stack backtrace below.
This should be seriously useful... I'm
not a crashdump person, but there's a book
on the subject (:-)) http://www.sun.com/books/catalog/Drake/
> Here's a snippet from log.smbd in the time period leading up to the
> problem in case it is useful
>
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:get_socket_addr(1038)
> getpeername failed. Error was Transport endpoint is not connected
Hmmn: that looks like a spin on a broken connection.
Have you set keepalive and dead time?
I mildly recommend
# Close after 10 minutes inactivity
dead time = 10
# Check if client is dead after
# 30 seconds of inactivity
keepalive = 30
--dave
--
David Collier-Brown, | Always do right. This will gratify
Americas Customer Engineering, | some people and astonish the rest.
SunPS Integration Services. | -- Mark Twain
(905) 415-2849 | davecb at canada.sun.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pry.tar
Type: application/x-tar
Size: 59392 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20011101/0c60e35e/pry.tar