Samba 2.2.2 oplock problem
Lee Liolios
liolios at buffalo.edu
Fri Jan 4 11:13:05 GMT 2002
Hi Jeremy,
We have been using Samba 1.9.18p10 in production here at UB for over 2
1/2 years with no problem. We use it to give Windows users access to
DFS space via DCE username/password. We have four Suns running Samba
that we recently upgraded to Solaris 8 (from 2.6) and DCE/DFS 3.1.
We took the opportunity before the semester starts to also upgrade to
Samba 2.2.0. We ran into the 100% CPU utilization problem related to the
oplock race condition, so I upgraded to 2.2.2 and applied your patch.
However, we are still seeing this problem. Perhaps there was another
patch or fix for this that I missed on the list?
Any help at all on this would be greatly appreciated.
Thanks
--
Lee Liolios
Programmer/Analyst
SUNY Buffalo
>Hi all,
>
> I think I've finally worked out what is causing the panic
>error messages. These are the
>
>open_mode_check: Existant process XXXX left active oplock.
>
>messages that people have mainly been seeing on Solaris (but
>occasionally on Linux also).
>
>I think it's a race condition caused by the cleanup code in
>locking/locking.c that ensures the share mode database contains
>no entries from a terminating smbd, and the code in open.c that
>ensures an open file has no exclusive oplock entries left.
>
>It would normally occur with a heavily contended file, the
>scenario looks something like this...
>
>smbd (a) sends client an oplock break message due to open
>requests from smbd's (b) and (c).... (z) - all of which can
>happen concurrently.
>
>The client of smbd (a) fails to respond to the break request
>(happens sometimes, bad cabling, client dead, whatever..).
>
>smbd (a) then decides it's time to exit. In doing so it
>goes through the share mode/open file database, deleting
>records for open files it has. It then does a second traverse
>of the share mode db looking for any records it may have
>missed (that would be a logic error). This second traversal
>is very expensive and unnecessary (it's been removed in
>the current 2.2 and HEAD CVS code). The whole point is that
>this termination could take a relatively long time, depending
>on the contention on the share mode db (this is the variable
>part which is why it's been impossible to get a reproducible
>test case).
>
>In the meantime, smbd's (b)....(z) are scanning the share
>mode db, waiting for the record that caused them to send the
>oplock break to be removed. Eventually they give up and decide
>to remove the record themselves. Before doing that, currently
>in the 2.2.2 and CVS (2.2 and HEAD) code, they check if the
>process owning that record still exists. If it does, they
>consider it a logic error and terminate themselves. THIS ASSUMPTION
>IS THE FLAW. As noted above, the cleanup process may take a
>relatively long time, and as such it's not an error if the
>process still exists, it's (hopefully) doing its best to
>clean up and die.
>
>The following simple patch (already applied to 2.2 and HEAD CVS)
>should apply cleanly to a 2.2.2 source tree, and if this
>assumption is correct, should fix the problem. If the
>case described above occurs, all that should happen now
>is log messages stating
>
>"open_mode_check: Existant process XXXX left active oplock"
>
>which can be treated as a warning rather than a fatal error.
>
>If people who have been suffering from this problem could
>either try the 2.2 CVS or apply this patch to their 2.2.2
>code and test to see if the problems reported are fixed,
>I'd greatly appreciate it.
>
>Thanks,
>
> Jeremy Allison,
> Samba Team.
[...actual patch removed...]
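The patch itself is not reproduced here, but the change in behaviour
Jeremy describes can be sketched roughly as below. This is a minimal
illustration, not the actual Samba patch or Samba code: pid_is_alive(),
handle_stale_oplock() and the enum values are hypothetical names made up
for this sketch. It only assumes the POSIX convention that kill(pid, 0)
tests for a process without signalling it. The point is that an owner
which is still alive is reported with a warning instead of being treated
as a fatal logic error.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome codes for this sketch; not Samba identifiers. */
enum stale_oplock_action {
    STALE_OWNER_GONE,       /* owner has exited: remove the record quietly */
    STALE_OWNER_ALIVE_WARN  /* owner still exists: log a warning, carry on */
};

/* Return 1 if a process with this pid exists, 0 otherwise. */
static int pid_is_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;           /* 0 and negative pids address process groups */

    /* Signal 0 does the existence/permission check but sends nothing. */
    if (kill(pid, 0) == 0)
        return 1;

    /* EPERM means the process exists but belongs to another user. */
    return errno == EPERM;
}

/*
 * Called (in this sketch) when a waiting smbd gives up on an oplock
 * break and is about to remove the share mode record itself.
 * An owner that is still alive may simply be slow to clean up and
 * exit, so that case is a warning here, not a reason to terminate.
 */
static enum stale_oplock_action handle_stale_oplock(pid_t owner)
{
    if (pid_is_alive(owner)) {
        fprintf(stderr,
                "open_mode_check: Existant process %d left active oplock.\n",
                (int)owner);
        return STALE_OWNER_ALIVE_WARN;
    }
    return STALE_OWNER_GONE;
}
```

Calling handle_stale_oplock(getpid()) takes the warning path, since the
calling process obviously still exists.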