FCNTL on Solaris

David Collier-Brown davecb at canada.sun.com
Mon Apr 22 10:05:01 GMT 2002


>  Tridge found the (already noted) related bug on our system and conceded it
> was a design flaw. Apparently each new smbd process that starts, does a
> quick traversal of the tdb databases to clean out any stale entries, and on
> Solaris, these are taking too long. 

	I've found a bunch of fixed bugs on fcntl performance,
	implying it's been even slower in the past (:-))

>Ok - discussed this with Andrew last night. It seems that this is only
>a problem on Solaris. Solaris seems to have *serious* issues with fcntl
>locks with multiple processes contending for locks. No other system we
>run on seems to have this problem (they have their own problems :-).

	At the expense of not addressing the Sun side of the
	problem, might I suggest that validation operations
	shouldn't lock?  
	
	Throwing my mind into a past life with safety-critical	
	real-time, I opine that the check without locks will
	1) succeed in bounded time dependent on the number 
		of structures traversed & checked
	2) fail because the structures are invalid (in this case
		stale) in bounded time, at which point one
		chooses to take a lock and remove them.
	3) fail in bounded time because the structures were
		changed by a program using locking, and the 
		non-locked program is seeing changing data. 
		In this case we elect to try to take a lock,
		fail because it's already held, wait interminably
		for it to complete, get the lock, and
		a) find it's done and exit
		b) find it still needs to be done and do it.
	The third is interesting because the other threads or
	processes are delaying us some amount before we get to
	do any work.  This, you might imagine, is a problem when
	you try to demonstrate correctness within time limits (;-))
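
	The three cases above can be sketched roughly as follows. This
	is only an illustration, not the tdb code: validate_then_lock(),
	the check_result enum and the toy callbacks are hypothetical
	names, and "stale" is modelled simply as "the file is non-empty".
	The point is that the unlocked check comes first, and the lock
	is only taken (and the check repeated under it) when that first
	pass does not come back clean.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result of a validation pass over the database. */
enum check_result { CHECK_OK, CHECK_STALE, CHECK_INCONSISTENT };

typedef enum check_result (*check_fn)(int fd);
typedef int (*repair_fn)(int fd);

/* Apply a whole-file fcntl lock of the given type, waiting if needed. */
static int lock_whole_file(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;        /* F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/*
 * Returns 0 if nothing needed doing, 1 if a repair was made, -1 on error.
 * Case 1: the unlocked check succeeds and no lock is ever taken.
 * Cases 2 and 3: take the lock (possibly waiting), check again, and
 * repair only if the problem is still there once we hold the lock.
 */
static int validate_then_lock(int fd, check_fn check, repair_fn repair)
{
    int rc = 0;

    if (check(fd) == CHECK_OK)
        return 0;

    if (lock_whole_file(fd, F_WRLCK) != 0)
        return -1;

    if (check(fd) != CHECK_OK)
        rc = (repair(fd) == 0) ? 1 : -1;

    lock_whole_file(fd, F_UNLCK);
    return rc;
}

/* Toy stand-ins for illustration: a non-empty file counts as stale. */
static enum check_result toy_check(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return CHECK_INCONSISTENT;
    return st.st_size > 0 ? CHECK_STALE : CHECK_OK;
}

static int toy_repair(int fd)
{
    return ftruncate(fd, 0);
}
```

	A caller would run validate_then_lock(fd, toy_check, toy_repair)
	at startup; in the common case where nothing is stale, no fcntl
	call is made at all.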

	I haven't looked at the code, but if it uses F_SETLKW
	you might want to do a trylock first, implemented via
	F_GETLK or F_SETLK, as this would allow subsequent
	processes to continue, knowing that someone's fixing
	the tdb, and that they can access it later using the
	normal locking regime.
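
	A minimal sketch of such a trylock, assuming a whole-file write
	lock is what's wanted (I still haven't checked what tdb actually
	locks). try_write_lock() and unlock_file() are hypothetical
	helper names; the only real interfaces used are fcntl() with
	F_SETLK and struct flock. F_SETLK returns immediately with
	EACCES or EAGAIN when another process holds a conflicting lock,
	where F_SETLKW would sleep.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take a write lock on the whole file without blocking.
 * Returns 1 if the lock was acquired, 0 if another process holds a
 * conflicting lock, and -1 on any other error.
 */
static int try_write_lock(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file" */

    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 1;
    if (errno == EACCES || errno == EAGAIN)
        return 0;            /* someone else is already working on it */
    return -1;
}

/* Release the whole-file lock taken above. Returns 0 on success. */
static int unlock_file(int fd)
{
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

	A process that gets 0 back can skip the cleanup pass and carry
	on; one that gets 1 does the cleanup and then calls
	unlock_file().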

> >
> >Dave CB - can you investigate this within Sun please. This is a *critical*
> >part of Samba, we may have to look into a solaris-specific workaround and
> >this would be bad.

	Bad is an understatement...
 
--dave
-- 
David Collier-Brown,           | Always do right. This will gratify 
Performance & Engineering      | some people and astonish the rest.
Americas Customer Engineering, |                      -- Mark Twain
(905) 415-2849                 | davecb at canada.sun.com



