100% cpu utilization

David Collier-Brown davecb at canada.sun.com
Thu Nov 1 07:55:04 GMT 2001


Scott Moomaw wrote: 
> We continue to experience problems with the latest Samba CVS on Solaris 8
> consuming 100% of cpu utilization.  The problem seems to occur when the
> system is experiencing heavy load.

	You've captured some good symptoms: they may
	suffice....

	I can volunteer some help on the system side if
	what you've captured isn't sufficient.
	I've attached a program called pry, which simply
	spits out stats from /proc, but it's actually
	best to start with sar.

	The command is 
		sar -o name.raw 10 30 && sar -A -f name.raw >name.txt
	which captures system samples every 10 seconds for 5 
	minutes while using hardly any cpu, then runs the 
	result through the formatter. The formatter does use
	enough cpu to affect the stats, which is why it runs
	only after the capture has finished.
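
	If you end up repeating the capture, the same two
	steps can be wrapped in a small script. This is only
	a sketch: the function names and the default file stem
	are made up for illustration, and it assumes a
	sar that accepts -o, -A and -f as used above.

```shell
# Sketch only: function names and the default file stem are illustrative.

# Seconds of wall-clock time covered by COUNT samples taken INTERVAL apart.
capture_seconds() {
    expr "$1" \* "$2"
}

# Record raw samples first, then format them once sampling has finished,
# so the formatter's own cpu use does not show up in the samples.
capture_sar() {
    interval=$1 count=$2 stem=${3:-name}
    # sar -o also prints each sample to stdout; discard that copy.
    sar -o "$stem.raw" "$interval" "$count" >/dev/null &&
        sar -A -f "$stem.raw" >"$stem.txt"
}

# Example (not run here): capture_sar 10 30 smbd-load
# covers 10 s x 30 samples = 300 s, i.e. the 5 minutes mentioned above.
```

	Calling capture_seconds 10 30 is just a quick check
	that the interval and count add up to the window you
	intended before you start a long run.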


> Using top, we note 100% CPU utilization.  There are approx 200 smb
> processes with a handful of the processes in a runnable state using up to
> 2% of CPU each.  It's hard to get details on one of the problematic
> processes because as quickly as we can identify them, they disappear.
> Using truss, I find most processes in an expected poll state, but when I
> can catch one of the problem processes I see bunches of fcntl with calls
> like kill(20759, SIG#0) interspersed.  

	Try to capture one of these with pry <pid>, which
	spits out the resources used by the program
	from the time it started to the point where the
	sample was taken.
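
	Since the busy processes vanish quickly, it may help
	to pick them out automatically. A rough sketch,
	assuming ps accepts "-o pid,s,comm", that pry
	sits in the current directory and writes its report
	to stdout, and that the daemon's command name ends in
	"smbd"; the function names are hypothetical:

```shell
# Sketch only: helper names, the ./pry path and output file names are
# assumptions, not part of pry itself.

# Read "pid state command" lines on stdin and print the PIDs of smbd
# processes that are on a cpu (O on Solaris) or runnable (R).
busy_smbd_pids() {
    awk '$3 ~ /smbd$/ && ($2 == "O" || $2 == "R") { print $1 }'
}

# Take one pry sample of every busy smbd found in a single ps snapshot.
sample_busy_smbd() {
    ps -e -o pid,s,comm | busy_smbd_pids | while read pid; do
        ./pry "$pid" >"pry.$pid.txt" 2>&1
    done
}

# Example (not run here): call sample_busy_smbd in a loop with a short
# sleep while the machine is under load.
```

	A process can still exit between the ps snapshot and
	the pry call, so expect some empty or error-only
	output files.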

> I did manage to grab a core of one
> of these processes and have included a stack backtrace below.

	This should be seriously useful...  I'm
	not a crashdump person, but there's a book	
	on the subject (:-)) http://www.sun.com/books/catalog/Drake/
 
> Here's a snippet from log.smbd in the time period leading up to the
> problem in case it is useful
> 
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:get_socket_addr(1038)
>   getpeername failed. Error was Transport endpoint is not connected

	Hmm: that looks like a spin on a broken connection.
	Have you set keepalive and dead time?

	I mildly recommend 
		# Close after 10 minutes inactivity
		dead time = 10
		# Check if client is dead after
		# 30 seconds of inactivity
		keepalive = 30
	

--dave
-- 
David Collier-Brown,           | Always do right. This will gratify 
Americas Customer Engineering, | some people and astonish the rest.
SunPS Integration Services.    |                      -- Mark Twain
(905) 415-2849                 | davecb at canada.sun.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pry.tar
Type: application/x-tar
Size: 59392 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20011101/0c60e35e/pry.tar

