100% CPU utilization

Thu Nov 1 15:59:03 GMT 2001

On Wed, Oct 31, 2001 at 04:57:22PM -0500, Scott Moomaw wrote:
> We continue to see the latest Samba CVS on Solaris 8 consuming 100% of
> the CPU. The problem seems to occur when the system is under heavy
> load.
> 
> Here's what we see.
> 
> Using top, we see 100% CPU utilization. There are approximately 200
> smbd processes, a handful of which are runnable and using up to 2% of
> the CPU each. It's hard to get details on any one of the problem
> processes because they disappear as quickly as we can identify them.
> Using truss, I find most processes in the expected poll state, but when
> I catch one of the problem processes I see many fcntl calls with calls
> like kill(20759, SIG#0) interspersed. I did manage to grab a core of one
> of these processes; a stack backtrace is included below.
> 
> __fcntl(0xd,0x23,0x80474c4) + c
> fcntl(0xd,0x23,0x80474c4,0x2c13c) + 1f
> tdb_brlock(0x81ec8f8,0xa8,0x2,0x23,0x0,0xdfa65b2b,0xd,0x23) + 68
> tdb_lock(0x81ec8f8,0x0,0x2,0x16,0x1,0x81e9088) + a2
> tdb_chainlock(0x81ec8f8,0x81e8f70,0xc,0x810d0e8,0x81e8f70,0xc) + 2a
> delete_fn(0x81ec8f8,0x81e8f70,0xc,0x81e8f7c,0x8a,0x804764c,0x2,0x83) + 3d
> tdb_traverse(0x81ec8f8,0x810d0d0,0x804764c,0x806e467) + 9b
> locking_end(0xdfa83000,0x80476c0,0x0,0x0,0x0,0x804768c,0x8047690,0x806da8b) + 47
> exit_server(0x8144c80,0x0,0x0,0x0,0x0,0x8047abc) + 160
> dflt_sig(0xf,0x0,0x80476c0) + 13
> sigacthandler() + 25
> dbg_mask(0x15,0x8047a3c,0x0,0x0,0x8047a34,0x5) + 2044f887
> sys_select(0xd,0x8047a3c,0x8047a34,0x80a12f9) + c7
> receive_message_or_smb(0x8210991,0x10040,0xea60,0x80a25e0) + 169
> smbd_process(0xdfbed1e8,0x8047b10,0x8047bf8,0x210,0xdfa0d67f,0xdfa0d6a3,0xdfbe137f,0x8047b10,0x8b,0x1,0x8047b48,0x806d94f,0x2,0x8047b54,0x8047b60,0x8144c30) + 11e
> main(0x2,0x8047b54,0x8047b60) + 6d9
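Reading the backtrace from the bottom up: smbd was sitting in sys_select() when a signal arrived (dflt_sig is called with 0xf, i.e. 15, SIGTERM), and the handler went into exit_server() -> locking_end() -> tdb_traverse(), which calls delete_fn() on each record; delete_fn() takes a chain lock via tdb_chainlock() -> tdb_lock() -> tdb_brlock(), and the process is sitting in fcntl() there. If I read the Solaris headers right, the tdb_brlock() arguments are offset 0xa8, lock type 2 (F_WRLCK) and command 0x23 (35, F_SETLKW64), i.e. a blocking write lock on a single byte. The fcntl pattern involved looks roughly like this sketch; `brlock_byte` is a hypothetical helper, not the real tdb code:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not tdb's real tdb_brlock(): take or release a
 * POSIX record lock on the single byte at `offset` of `fd`.
 *   type: F_RDLCK, F_WRLCK or F_UNLCK
 *   cmd:  F_SETLK  (fail at once if another process holds the byte) or
 *         F_SETLKW (sleep until the lock can be granted)
 * Returns 0 on success, -1 with errno set on failure. */
static int brlock_byte(int fd, off_t offset, short type, int cmd)
{
    struct flock fl = {0};   /* zero any platform-specific fields too */
    int ret;

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = 1;            /* lock exactly one byte */

    /* A blocking wait can be cut short by a signal; this sketch retries. */
    do {
        ret = fcntl(fd, cmd, &fl);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

With F_SETLK instead of F_SETLKW, the same call returns -1 immediately (errno EACCES or EAGAIN) when another process already holds a conflicting lock on that byte.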
> 
> Here's a snippet from log.smbd covering the period leading up to the
> problem, in case it is useful:
> 
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:get_socket_addr(1038)
>   getpeername failed. Error was Transport endpoint is not connected
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:get_socket_addr(1038)
>   getpeername failed. Error was Transport endpoint is not connected
> [2001/10/31 11:49:20, 0, pid=414] lib/access.c:check_access(322)
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:get_socket_addr(1038)
>   getpeername failed. Error was Transport endpoint is not connected
>   Denied connection from  (0.0.0.0)
> [2001/10/31 11:49:20, 1, pid=414] smbd/process.c:process_smb(850)
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:get_socket_addr(1038)
>   getpeername failed. Error was Transport endpoint is not connected
>   Connection denied from 0.0.0.0
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:write_socket_data(542)
>   write_socket_data: write failure. Error = Broken pipe
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:write_socket(566)
>   write_socket: Error writing 5 bytes to socket 5: ERRNO = Broken pipe
> [2001/10/31 11:49:20, 0, pid=414] lib/util_sock.c:send_smb(730)
>   Error writing 5 bytes to client. -1. (Broken pipe)
> [2001/10/31 11:49:29, 0, pid=423] lib/util_sock.c:get_socket_addr(1038)
>   getpeername failed. Error was Transport endpoint is not connected
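The repeated "getpeername failed. Error was Transport endpoint is not connected" lines are ENOTCONN from getpeername(): when smbd asked for the client's address, the socket no longer had a connected peer (typically because the client had already reset the connection), which is presumably why the address is then logged as 0.0.0.0. A minimal sketch that reproduces the error on a socket with no peer, assuming POSIX sockets; `peer_addr_errno` is a hypothetical name:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: ask the kernel for the peer address of socket fd.
 * Returns 0 on success, or the errno value on failure. A socket with no
 * connected peer yields ENOTCONN ("Transport endpoint is not connected"). */
static int peer_addr_errno(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof ss;

    if (getpeername(fd, (struct sockaddr *)&ss, &len) == 0)
        return 0;
    return errno;
}
```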
> 
> Any insight as to this problem?

Can you get me a debug level 10 log, from start to finish, of an smbd
suffering from this?

It seems to be Solaris-specific...

Jeremy.