Oplock breaks on high loads
David Collier-Brown
davecb at canada.sun.com
Fri Mar 15 05:48:05 GMT 2002
Steve Cleveland wrote:
> I'm running Samba 2.2.2 on Solaris 8 in a college environment with
> approximately 200 PC workstations. The Solaris server is a Sunfire 280R
> 2x750MHz, and 4GB RAM.
Hmmn, you can probably transfer something on
the order of 159 MB/S, which is reasonable
as long as the students are doing editing
and the like, not running mass compiles.
> The end of the term is here, and the labs are near capacity for most of
> the day. As the usage has gone up, we've started to experience a lot of
> problems. I've been tracking the number of smbd processes and as they
> reach 300+, it suddenly jumps to ~400, the load on the server jumps to
> around 70 (from 2 or 3) and all of the processes disconnect from the
> parent.
Yoiks, something's building up: you should
average around 200-odd processes, one per
logged-on user.
In a lab/office environment, you should
have
keepalive = 3600
dead time = 10
set, to cause Samba to detect dead
or inactive or disconnected clients,
and clean them up.
That the cleanup isn't happening indicates
an underlying problem that "hits the wall"
when the load gets high.
As we know there is a problem, start with
keepalive = 30
and if the behavior improves, set it back
to 3600.
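To confirm which values smbd will actually pick up, a small check along these lines may help. It is only a sketch: the function name global_settings and the sample text are made up for illustration, and it ignores line continuations and include directives. It relies on Samba ignoring case and whitespace in parameter names, so "dead time" and "deadtime" are treated as the same setting.

```python
def global_settings(conf_text):
    """Return {normalized_name: value} for the [global] section
    of smb.conf-style text (hypothetical helper, not part of Samba)."""
    settings = {}
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (smb.conf allows both # and ;)
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "global" and "=" in line:
            name, value = line.split("=", 1)
            # Normalize the name: drop whitespace, lowercase
            key = "".join(name.split()).lower()
            settings[key] = value.strip()
    return settings


# Sample text for illustration only, not the poster's real smb.conf
sample = """
[global]
    keepalive = 30
    dead time = 10
[homes]
    read only = no
"""

found = global_settings(sample)
print(found.get("keepalive"), found.get("deadtime"))
```

On a real server you would pass in the contents of your own smb.conf instead of the sample string.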
> Mar 14 14:00:47 stak smbd[23433]: [ID 702911 local7.error] reply_lockingX:
> Error : oplock break from client for fnum = 7937 and no oplock granted on
> this file (stella/mcsm1 fit/Mass Transfer Limitiation/BUtane
> only/BUTONLY.TXT).
One of the clients has failed to get an oplock,
which can be harmless **if** more than one person
is supposed to be editing/reading the same file.
> Mar 14 14:00:35 stak smbd[24115]: [ID 702911 local7.error] locking :
> delete_fn. LOGIC ERROR ! Entry for pid 23930 and it no longer exists !
A smbd has exited unexpectedly, and another
noticed and logged it.
>
> Mar 14 14:00:35 stak smbd[24105]: [ID 702911 local7.error] PANIC:
> request_oplock_break: no fsp found for our own oplock
looks like an internal error
> Mar 14 15:00:03 stak smbd[29962]: [ID 702911 local7.error]
> request_oplock_break: PANIC : breaking our own oplock requested for dev =
> 30c0432, inode = 570228,
internal error, of the "can't happen" sort.
>
> Mar 14 15:00:40 stak smbd[2305]: [ID 702911 local7.error]
> yield_connection: tdb_delete for name failed with error Record does not
> exist.
internal error
First, try the options.
Second, try
netstat -s | egrep -i 'coll|defer|err|drop|reset'
You should get something like
elsbeth> netstat -s | egrep -i 'coll|defer|err'
rawipInDatagrams = 11504 rawipInErrors = 0
rawipInCksumErrs = 0 rawipOutDatagrams = 3
rawipOutErrors = 0
udpInDatagrams =2719515 udpInErrors = 0
udpOutDatagrams =2653659 udpOutErrors = 0
ipInReceives =4064815 ipInHdrErrors = 0
ipInAddrErrors = 0 ipInCksumErrs = 0
tcpInErrs = 0 udpNoPorts =584671
udpInCksumErrs = 0 udpInOverflows = 0
ipv6InReceives = 0 ipv6InHdrErrors = 0
ipv6InTooBigErrors = 0 ipv6InNoRoutes = 0
ipv6InAddrErrors = 0 ipv6InUnknownProtos = 0
udpInCksumErrs = 0 udpInOverflows = 0
ICMPv4 icmpInMsgs = 11504 icmpInErrors = 0
icmpInCksumErrs = 0 icmpInUnknowns = 0
icmpOutDrops =584660 icmpOutErrors = 0
ICMPv6 icmp6InMsgs = 0 icmp6InErrors = 0
icmp6OutMsgs = 0 icmp6OutErrors = 0
Any errors are bad,
all resets are bad
tcp drops are bad, other drops don't matter
defers are suspicious (as a % of packets sent)
collisions ditto
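The rules of thumb above can be sketched as a small filter over the netstat -s output. This is an illustration under assumptions: it expects the Solaris-style "name = value" layout shown above, the function name triage is made up, and the nonzero tcpInErrs value in the sample is invented to show a hit. It only flags nonzero counters; it does not work out defers or collisions as a percentage of packets sent.

```python
import re

# Matches Solaris-style "counterName = 123" pairs, several per line
PAIR = re.compile(r"(\w+)\s*=\s*(\d+)")


def triage(netstat_text):
    """Split nonzero counters into (bad, suspicious) lists of (name, count)."""
    bad, suspicious = [], []
    for name, value in PAIR.findall(netstat_text):
        count = int(value)
        if count == 0:
            continue
        low = name.lower()
        if "err" in low or "reset" in low:
            # Any errors and all resets are bad
            bad.append((name, count))
        elif "drop" in low and low.startswith("tcp"):
            # Only TCP drops count; other drops are ignored
            bad.append((name, count))
        elif "defer" in low or "coll" in low:
            # Worth a look, ideally as a share of packets sent
            suspicious.append((name, count))
    return bad, suspicious


# Sample input; tcpInErrs = 3 is a made-up value for illustration
sample = """\
udpInDatagrams =2719515 udpInErrors = 0
tcpInErrs = 3 udpNoPorts =584671
icmpOutDrops =584660 icmpOutErrors = 0
"""

bad, suspicious = triage(sample)
print("bad:", bad)
print("suspicious:", suspicious)
```

Feeding it the saved output of netstat -s from the server gives a short list to start from, instead of reading every counter by eye.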
Well-known causes are a flaky hub/router and
bad or mis-set ethernet cards in the PCs.
--dave
--
David Collier-Brown, | Always do right. This will gratify
Performance & Engineering | some people and astonish the rest.
Americas Customer Engineering, | -- Mark Twain
(905) 415-2849 | davecb at canada.sun.com