Oplock breaks on high loads

Fri Mar 15 05:48:05 GMT 2002

Steve Cleveland wrote:
> I'm running Samba 2.2.2 on Solaris 8 in a college environment with
> approximately 200 PC workstations.  The Solaris server is a Sunfire 280R
> 2x750MHz, and 4GB RAM.

	Hmmn, you can probably transfer something on 
	the order of 159 MB/S, which is reasonable
	as long as the students are doing editing
	and the like, not running mass compiles.
 
> The end of the term is here, and the labs are near capacity for most of
> the day.  As the usage has gone up, we've started to experience a lot of
> problems.  I've been tracking the number of smbd processes and as they
> reach 300+, it suddenly jumps to ~400, the load on the server jumps to
> around 70 (from 2 or 3) and all of the processes disconnect from the
> parent.

	Yoiks, something's building up: you should
	average around 200-odd processes, one per
	logged-on user.

	In a lab/office environment, you should
	have 
		keepalive = 3600
		dead time = 10
	set, to cause Samba to detect dead
	or inactive or disconnected clients,
	and clean them up.

	That it isn't indicates there is an
	underlying problem, that "hits the wall"
	when the load gets high.

	As we know there is a problem, start with
		keepalive = 30
	and if the behavior improves, set it back
	to 3600.

 
> Mar 14 14:00:47 stak smbd[23433]: [ID 702911 local7.error] reply_lockingX:
> Error : oplock break from client for fnum = 7937 and no oplock granted on
> this file (stella/mcsm1 fit/Mass Transfer Limitiation/BUtane
> only/BUTONLY.TXT).

	One of the clients has failed to get an oplock,	
	which can be harmless **if** more than one person
	is supposed to be editing/reading the same file.

> Mar 14 14:00:35 stak smbd[24115]: [ID 702911 local7.error] locking :
> delete_fn. LOGIC ERROR ! Entry for pid 23930 and it no longer exists !

	An smbd has exited unexpectedly, and another
	noticed and logged it.
> 
> Mar 14 14:00:35 stak smbd[24105]: [ID 702911 local7.error] PANIC:
> request_oplock_break: no fsp found for our own oplock

	looks like an internal error

> Mar 14 15:00:03 stak smbd[29962]: [ID 702911 local7.error]
> request_oplock_break: PANIC : breaking our own oplock requested for dev =
> 30c0432, inode = 570228,

	internal error, of the "can't happen" sort.
> 
> Mar 14 15:00:40 stak smbd[2305]:  [ID 702911 local7.error]
> yield_connection: tdb_delete for name failed with error Record does not
> exist.
	internal error

	First, try the options above (keepalive and dead time).

	Second, try
		netstat -s | egrep -i 'coll|defer|err|drop|reset'
	You should get something like
elsbeth> netstat -s | egrep -i 'coll|defer|err'  
        rawipInDatagrams    = 11504     rawipInErrors       =     0
        rawipInCksumErrs    =     0     rawipOutDatagrams   =     3
        rawipOutErrors      =     0
        udpInDatagrams      =2719515    udpInErrors         =     0
        udpOutDatagrams     =2653659    udpOutErrors        =     0
        ipInReceives        =4064815    ipInHdrErrors       =     0
        ipInAddrErrors      =     0     ipInCksumErrs       =     0
        tcpInErrs           =     0     udpNoPorts          =584671
        udpInCksumErrs      =     0     udpInOverflows      =     0
        ipv6InReceives      =     0     ipv6InHdrErrors     =     0
        ipv6InTooBigErrors  =     0     ipv6InNoRoutes      =     0
        ipv6InAddrErrors    =     0     ipv6InUnknownProtos =     0
        udpInCksumErrs      =     0     udpInOverflows      =     0
ICMPv4  icmpInMsgs          = 11504     icmpInErrors        =     0
        icmpInCksumErrs     =     0     icmpInUnknowns      =     0
        icmpOutDrops        =584660     icmpOutErrors       =     0
ICMPv6  icmp6InMsgs         =     0     icmp6InErrors       =     0
        icmp6OutMsgs        =     0     icmp6OutErrors      =     0
 
	Any errors are bad,
	all resets are bad,
	TCP drops are bad, other drops don't matter,
	defers are suspicious (as a % of packets sent),
	collisions ditto.
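
	If you'd rather not eyeball the counters, the rules above
	can be roughed out in a few lines of Python. This is only
	a sketch: the function name and the regexes are mine, not
	part of netstat or Samba, and it assumes the Solaris
	"name = value" layout shown in the sample output. It
	reports raw non-zero counts only; judging defers and
	collisions as a percentage of packets sent is still up
	to you.

```python
import re

# Solaris prints "name = value" pairs, two per line, sometimes
# with no space after the '=' (e.g. "udpNoPorts =584671").
PAIR = re.compile(r"([A-Za-z][A-Za-z0-9]*)\s*=\s*(\d+)")

# Same keywords as the egrep pattern in this message.
SUSPECT = re.compile(r"coll|defer|err|drop|reset", re.IGNORECASE)


def nonzero_suspects(netstat_text):
    """Return {counter: value} for suspect counters that are not zero.

    Hypothetical helper: drop counters are only reported when the
    counter name starts with "tcp", since other drops don't matter.
    """
    found = {}
    for name, value in PAIR.findall(netstat_text):
        count = int(value)
        if count == 0 or not SUSPECT.search(name):
            continue
        lowered = name.lower()
        if "drop" in lowered and not lowered.startswith("tcp"):
            continue  # non-TCP drop: ignore
        found[name] = count
    return found


# Three lines taken from the sample output in this message.
sample = """\
        tcpInErrs           =     0     udpNoPorts          =584671
ICMPv4  icmpInMsgs          = 11504     icmpInErrors        =     0
        icmpOutDrops        =584660     icmpOutErrors       =     0
"""

print(nonzero_suspects(sample))
```

	On those three sample lines nothing is flagged: the only
	non-zero matches are udpNoPorts, which isn't an error
	counter, and icmpOutDrops, which is a non-TCP drop. To run
	it against a live box, feed it the text of "netstat -s".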

	One well-known cause is a flaky hub/router, as well
	as bad or mis-set ethernet cards in the PCs.

--dave


-- 
David Collier-Brown,           | Always do right. This will gratify 
Performance & Engineering      | some people and astonish the rest.
Americas Customer Engineering, |                      -- Mark Twain
(905) 415-2849                 | davecb at canada.sun.com