OPLOCK and locking problems: (Resource deadlock avoided)

Thu Mar 27 20:34:13 GMT 2003

On Sun, Mar 23, 2003 at 02:23:45PM +1100, Andrew Bartlett wrote:
> Earlier this week, I had a serious meltdown of Samba HEAD at my site
> (fewer than 100 concurrent users, with domain logons and home directories).
> 
> All the users share a single mandatory profile, which they think they
> can write to, but can't (due to file permissions).  They think they
> can because of 'vfs_fake_perms.so'.  In any case, no matter what
> the client thinks, I'm told this should not happen:
> 
> I've attached the first 6 minutes of the log, but I'm told that by 11 AM
> the system had become impossible to use.  As smbds got caught up
> in waiting for oplocks, I think the clients decided to reconnect.  This
> created even more load, and by 12PM when I got onto the system, there
> were way more smbd processes than machines to account for them.
> 
> The load at 12PM was 20, and just logging into the system with SSH took
> *ages*.
> 
> Unfortunately I was unable to get an strace or gdb the culprit, as I had
> to get the system back up and going again.
> 
> There is a slight possibility of tdb corruption (I should have removed
> the locking tdb after the last crash), but no segfaulting processes. 
> (This has occurred before, but I had blamed that).
> 
> By the end of the logfile, we have multiple smbds all sending oplock
> replies to processes that don't expect them, connections being reset and
> all hell breaking loose...
> 
> Personally, I suspect a tdb bug as the root cause, but our UDP based
> oplock handling can't get off the hook either.

Are you running the Solaris kernel scalable-fcntl patch? If not,
that was your problem, not the Samba code.

Jeremy.