fcntl F_SETLKW64 failing on Solaris

Wed Jan 9 16:03:12 GMT 2002

On Thu, Jan 10, 2002 at 10:06:11AM +1100, Tristan Ball wrote:
> On a vaguely related note:
> 
> We're now running 2.2.3pre from CVS, on Solaris 8. The one samba instance
> provides 3 different servers, via netbios aliases and included config files.
> That means we regularly get between 400 and 700 active samba processes.
> 
> As some of you are aware, we have had quite a few troubles in the past with
> fcntl locks on the TDBs, which seem to manifest as a very high contention
> rate on one or more kernel mutexes, followed by a gradual descent into
> madness - corrupt tdbs, and recently the semaphore timeout problem Romeril
> was reporting.
> 
> On our previous release, 2.2.1a+patches, when mutex contention was high, the
> load average skyrocketed, and the box crawled. This was
> compiled --with-spinlocks, and using nanosleep rather than sched_yield.
> On our current release, 2.2.3pre, again I've used --with-spinlocks, but this
> time I tried sched_yield. We had a short period yesterday when we were
> getting 400-500 blocks on kernel mutexes per second (normal is about 100 for
> us). This time the load average was stable at 3 (a little higher than normal
> for us, but not much), and the box stayed responsive. Interestingly, the
> spike occurred while I had debug level = 3. Reducing that to 1, and sending
> samba a SIGHUP, returned the machine to normal.

Just a warning - using --with-spinlocks can be very dangerous. If an smbd
process abends while holding a spinlock, nothing releases that lock: all the
smbds will have to be restarted and the tdb cleared to remove it.

The same won't happen with fcntl-locked tdbs, because the kernel drops a
process's fcntl locks when the process exits.
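
To illustrate the difference, here is a rough sketch, not taken from the
tdb code. It assumes a one-word flag in shared memory as a stand-in for a
spinlock, and a throwaway temp file in place of a tdb; the function name
crash_holding_locks and the /tmp path are made up for the example. A child
process takes both locks and is then killed without releasing either.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and mkstemp on glibc */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Non-blocking write lock on byte 0 of fd. Returns 0 if the lock was taken. */
static int try_write_lock(int fd)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Hypothetical demo: a child takes an fcntl lock and sets a shared-memory
 * flag (the stand-in spinlock), then dies from SIGKILL holding both.
 * On success returns 0 and reports:
 *   *spin_still_held = 1 if the flag is still set after the child died
 *   *fcntl_free      = 1 if the parent could then take the fcntl lock
 */
int crash_holding_locks(int *spin_still_held, int *fcntl_free)
{
    char path[] = "/tmp/lockdemoXXXXXX"; /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file lives only as long as fd stays open */

    void *mem = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile int *spin = mem;
    *spin = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(int));
        close(fd);
        return -1;
    }
    if (pid == 0) {
        if (try_write_lock(fd) != 0)
            _exit(2);            /* could not lock: report via exit code */
        *spin = 1;               /* "acquire" the stand-in spinlock */
        raise(SIGKILL);          /* abend with both locks held */
        _exit(3);                /* not reached */
    }

    int status = 0;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status)) {
        /* Nobody is left to clear the flag, so it stays set. */
        *spin_still_held = (*spin == 1);
        /* The kernel released the dead child's fcntl lock. */
        *fcntl_free = (try_write_lock(fd) == 0);
        rc = 0;
    }
    munmap(mem, sizeof(int));
    close(fd);
    return rc;
}
```

Calling crash_holding_locks(&spin, &fl) from a small main() should leave
both outputs set to 1 on a POSIX system: the flag is stuck until someone
clears it by hand, while the fcntl lock is free again as soon as the child
is gone.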

Just a little touch of paranoia.... :-).

Jeremy.