fcntl F_SETLKW64 failing on Solaris

Wed Jan 9 15:10:03 GMT 2002

On a vaguely related note:

We're now running 2.2.3pre from CVS, on Solaris 8. The one Samba instance
provides 3 different servers, via NetBIOS aliases and included config files.
That means we regularly have between 400 and 700 active Samba processes.

As some of you are aware, we have had quite a few troubles in the past with
fcntl locks on the TDBs. These seem to manifest as a very high contention
rate on one or more kernel mutexes, followed by a gradual descent into
madness: corrupt TDBs and, recently, the semaphore timeout problem Romeril
was reporting.

On our previous release, 2.2.1a+patches, when mutex contention was high, the
load average skyrocketed and the box crawled. That build was
compiled --with-spinlocks, using nanosleep rather than sched_yield.
On our current release, 2.2.3pre, I've again used --with-spinlocks, but this
time I tried sched_yield. We had a short period yesterday when we were
getting 400-500 blocks on kernel mutexes per second (normal is about 100 for
us). This time the load average was stable at 3 (a little higher than normal
for us, but not much), and the box stayed responsive. Interestingly, the
spike occurred while I had debug level = 3. Reducing that to 1 and sending
Samba a SIGHUP returned the machine to normal.
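
For anyone who wants to compare the two back-off strategies outside Samba,
here is a minimal sketch of a spinlock that either yields or sleeps while it
waits. It is not the tdb spinlock code: every demo_* name, the enum, and the
1 ms sleep interval are made up for illustration, and it assumes a modern
compiler with C11 atomics. For cross-process use the lock would also have to
live in shared memory, which this sketch does not set up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical names throughout; this is not Samba's implementation. */
enum demo_backoff { DEMO_BACKOFF_YIELD, DEMO_BACKOFF_NANOSLEEP };

typedef struct {
    atomic_flag held;
} demo_spinlock;

static void demo_spin_init(demo_spinlock *lock)
{
    assert(lock != NULL);
    atomic_flag_clear(&lock->held);
}

/* Returns 1 if the lock was acquired, 0 if someone else holds it. */
static int demo_spin_trylock(demo_spinlock *lock)
{
    assert(lock != NULL);
    return !atomic_flag_test_and_set_explicit(&lock->held,
                                              memory_order_acquire);
}

/* Spins until the lock is acquired; returns how many times it backed off. */
static unsigned long demo_spin_lock(demo_spinlock *lock,
                                    enum demo_backoff how)
{
    unsigned long waits = 0;

    while (!demo_spin_trylock(lock)) {
        waits++;
        if (how == DEMO_BACKOFF_YIELD) {
            /* Give up the CPU but stay runnable. */
            sched_yield();
        } else {
            /* Sleep for an assumed 1 ms before retrying. */
            struct timespec ts = { 0, 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return waits;
}

static void demo_spin_unlock(demo_spinlock *lock)
{
    assert(lock != NULL);
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

The back-off count returned by demo_spin_lock is only there so the two
strategies can be compared under load; it says nothing about kernel mutex
contention itself.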

I don't think this is directly related to the Solaris fcntl problem, but
I've found that the more I can reduce the contention on the mutexes, the
better Samba behaves. Going to 2.2.3pre and moving some moderately
CPU-intensive processes off the CPU have improved things immensely for us.

T.

----- Original Message -----
From: "Jeremy Allison" <jra at samba.org>
To: <David.Collier-Brown at Sun.COM>
Cc: "Romeril, Alan" <a.romeril at ic.ac.uk>; <samba-technical at samba.org>
Sent: Wednesday, January 09, 2002 7:38 AM
Subject: Re: fcntl F_SETLKW64 failing on Solaris

> On Tue, Jan 08, 2002 at 02:42:52PM -0500, David Collier-Brown wrote:
> >
> > I suspect we're seeing a Solaris-specific bug, but without
> > the errno I'm puzzled as to what we should do about it.
> > ENOLCK would be easier to deal with than EOVERFLOW, and
> > harder than EIO or EINTR...
>
> Yeah I'm concerned about that, as people do use Samba on large
> Solaris servers mainly. Solaris fcntl lock code has had historical
> problems with mmapped files (doesn't work over NFS as I recall)
> and we really need this to work right for tdb.
>
> Any more info would help, even if it just gets us a workaround to
> a Solaris bug (not that I'm implying Solaris has a bug here, it's
> just a possibility :-) :-).
>
> Jeremy.
>
>
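
On the missing errno discussed in the quoted message: a minimal sketch of a
blocking byte-range lock that retries on EINTR and logs errno on any other
failure, which is roughly the information that would be needed to tell
ENOLCK from EOVERFLOW or EIO. The function name demo_lock_byte and the log
format are hypothetical, and this is not the tdb locking code. It uses plain
F_SETLKW; my understanding is that a large-file build on Solaris would show
this as F_SETLKW64, but treat that as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: lock or unlock one byte at `offset`, blocking until
 * the lock is granted. `type` is F_RDLCK, F_WRLCK or F_UNLCK.
 * Returns 0 on success, -1 on failure with errno preserved.
 */
static int demo_lock_byte(int fd, off_t offset, short type)
{
    struct flock fl;
    int ret;

    assert(type == F_RDLCK || type == F_WRLCK || type == F_UNLCK);

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = 1;

    /* A signal can interrupt a blocking lock request; just retry. */
    do {
        ret = fcntl(fd, F_SETLKW, &fl);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1) {
        int saved = errno; /* fprintf may overwrite errno */
        fprintf(stderr,
                "fcntl(F_SETLKW) type=%d offset=%lld failed: %s (errno %d)\n",
                (int)type, (long long)offset, strerror(saved), saved);
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller would take a write lock with demo_lock_byte(fd, offset, F_WRLCK)
and release it with F_UNLCK on the same offset.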