fcntl F_SETLKW64 failing on Solaris

Tristan Ball tristanb at vsl.com.au
Wed Jan 9 16:22:02 GMT 2002


> I am very interested in these results. When we first implemented the
> spinlock code I never managed to get a high rate of contention (8 way
> server, smbtorture) and so we didn't look too closely at backoff
> strategies.
> It would be interesting to do some statistics on how bad the contention
> is (and also what part of samba is responsible for it).

I'm more than happy to help, and I'll give any stats I can. I'll get some
lockstat traces next time it's misbehaving.  As for which part of Samba: in a
general sense, it's the fcntl lock operations on the TDBs.
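For reference, this is roughly the shape of the blocking byte-range lock
involved -- a minimal sketch, not Samba's actual tdb code, and the function
names are mine:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Take an exclusive fcntl lock on one byte of the file, blocking until
 * it is granted (F_SETLKW).  tdb locks individual bytes like this so
 * that different hash chains can be locked independently. */
int lock_byte(int fd, off_t off)
{
    struct flock fl;
    fl.l_type = F_WRLCK;             /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = 1;                    /* just one byte */
    return fcntl(fd, F_SETLKW, &fl); /* this is the call that blocks
                                      * under heavy contention */
}

int unlock_byte(int fd, off_t off)
{
    struct flock fl;
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = off;
    fl.l_len = 1;
    return fcntl(fd, F_SETLK, &fl);
}
```

With hundreds of smbd processes all issuing F_SETLKW on the same few bytes,
the kernel's lock wait queues are where the pile-up happens.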

I have a theory on sched_yield versus nanosleep too, namely that nanosleep
on Solaris actually takes the process off the CPU and sets an alarm. I think
when nanosleep returns, the process is given an increased priority and/or
gets put at the front of the run queue, whereas sched_yield puts you at the
end. If a given process obtains a lock and then finishes its timeslice, it
goes to the back of the queue. If processes ahead of it also need the lock,
and they use nanosleep, they may well bounce on and off the CPU, in front of
the process which is still holding the lock, delaying that process getting
back to the CPU.

I also don't think you would be able to reproduce the problem with fewer
than 50 client connections; we don't see problems until 200+. The other
point, which kinda goes against the general theory on this, is that more
CPUs may actually make the problem better. My thought is that the
contention only gets bad when processes have to wait for the CPU. The
Solaris scheduler is a madly dynamic thing, which adjusts priorities all
over the place. One of the times it will do that is when a process receives
a packet from the network. Again, if we have 700 connections, there is going
to be a constant stream of packets. If one process gets a lock and then
another receives a packet, the first process will probably be taken off the
CPU, still holding the lock, in favour of the new process with the increased
priority. If the new process also tries for the lock, there's going to be
trouble, probably exacerbated by the fact that now there are 50 processes
which have received packets and want the CPU, possibly the lock too, and
they will all have a higher priority (temporarily) than the process with
the lock. Adding CPUs would give the machine a better chance of
rescheduling the process holding the lock. While theoretically there is a
higher chance of contention with more CPUs, in practice I think the amount
of time a process spends holding a lock, but not actually running, would be
greatly reduced.

That's today's theory anyway. :-)

> If you have an SMP machine its also going to make sense to busy loop for a
> bit in case the lock gets dropped on another cpu.

We're running two 450MHz UltraSPARC CPUs. I agree, but I would say that it
should be a genuine spin, not a nanosleep, for the reasons above.
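Something like the following is what I mean: spin for a while in case the
holder is running on another CPU, then fall back to sched_yield rather than
nanosleep, so the waiter goes to the back of the run queue instead of coming
back with a priority boost. (A sketch only -- the GCC __sync builtins stand
in for whatever atomic op the real spinlock code uses, and the spin count is
arbitrary.)

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical spin-then-yield backoff for an SMP box: spin briefly in
 * case the lock is dropped on another CPU, then yield the processor,
 * going to the back of the run queue with no priority boost. */
void backoff_wait(volatile int *lock)
{
    for (;;) {
        int spins;
        for (spins = 0; spins < 1000; spins++) {
            if (__sync_bool_compare_and_swap(lock, 0, 1))
                return;          /* lock acquired */
        }
        sched_yield();           /* politely give up the CPU */
    }
}

void backoff_release(volatile int *lock)
{
    __sync_lock_release(lock);   /* sets *lock back to 0 with a barrier */
}
```

On a uniprocessor the inner spin is pure waste, so in practice you would
gate it on the CPU count.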

>
> Do you know if solaris have kernel support for userspace synchronisation?
> (like post/wait or whatever SGI call it)

Yes, via the mutex_* and pthread_mutex_* families of calls. See mutex(3thr)
on a Solaris 8 machine for the details.

T.

More information about the samba-technical mailing list