Tue Dec 2 04:10:01 GMT 2003
fcntl counts below are <waitlocks>/<nonwait>, before and after [processing SMBs has started]:
connections.tdb - before 7761/5 after 8/0
brlock.tdb - before 751/1 after 1268/0
locking.tdb - before 3/1 after 5126/1268
The connections fcntls appear to be mostly in the claim_connection call on
line 756 of smbd/server.c - these could be moved elsewhere (see later). More
ironically, we don't make use of the connection counting, so apart from
keeping the size of the tdbs down in the event of SEGVing smbds, this call
is useless to us :o)
The brlock.tdb fcntls appear to happen in two chunks, the first in
locking_init on line 771 of smbd/server.c (which eventually results in a
traversal of brlock.tdb to clear out dead PID locks on startup - these
could probably also be moved elsewhere, see later).
The subsequent brlock.tdb fcntls and the majority of the locking.tdb fcntls
happen whilst processing SMBs (which we think is probably a Win2K profile
load) and are very likely SMB locks translated to real locks - these are of
course "normal" activity, and unlike the TDB traverse activity, you get 8-10
fcntls on the locking/brlock tdbs, then a read() for an SMB, and so on. I
don't attribute the performance problem to these fcntls, since they're fast
and widely spaced, so unlikely to contend.
It seems likely to me that what's happening is that above certain loads,
there are sufficient number of smbd startups that there *is* contention of
the fcntl locks, and the F_SETLKW *are* waiting. This is clearly an
exponential process, like queue filling on network hardware. Moreover, we're
a heavily Win2K-based site - Win2K profile loading has a much higher locking
throughput, and it's possible that the contention is with a currently
running smbd that is loading a profile.
Put another way, two smbds loading different profiles won't contend with
each other, but an smbd loading a profile will contend with an smbd starting
up (and therefore doing a full tdb traverse). However, it seems more likely
to us that smbds starting simultaneously (or close together) cause the bulk
of the problem - thinking about the cross section, a traversing smbd and a
profile-loading smbd have a much lower chance of "collision" than two
traversing smbds do.
If a starting smbd happens to get scheduled ahead of another one, where the
other smbd is further ahead in the tdb traversal, the former will "hit" the
latter's locks and the kernel will run the scheduler. If the kernel happens
to schedule a *third* smbd which is behind the second in the traversal, you
get catastrophic failure. As more smbds pile up, the tdb traverse time gets
longer, and the problem gets *worse*.
Under the failure mode, we see tdb_traverse times of 5 seconds or more with
~500 processes, which is a large window in which to hit the problem...
Someone more familiar with Solaris' scheduler, particularly the behaviour
under lock contention, could probably confirm or deny this...
The only solution I can think of is to move the tdb_traverse of the various
locking and connection TDBs into a periodic scan made from either a watching
process or the listening smbd (in daemon mode). This should reduce the load
somewhat, and IIRC was already suggested a month or so back. Sadly, I have
neither the time to make nor the ability to test such large changes...
(there's always one, isn't there :o)
Why is this only happening on Solaris? Maybe it's not - maybe it's just that
Solaris installations tend to be big enough to trigger it. Alternatively,
the relative cost of syscalls and scheduler timeslice lengths might just be
right (or wrong) to trigger the problem on this combination of hardware and
software... who knows (Dave?)
| Phil Mayers |
| Network & Infrastructure Group |
| Information & Communication Technologies |
| Imperial College |
From: jra at samba.org [mailto:jra at samba.org]
Sent: 08 January 2002 20:39
To: David.Collier-Brown at Sun.COM
Cc: Romeril, Alan; samba-technical at samba.org
Subject: Re: fcntl F_SETLKW64 failing on Solaris
On Tue, Jan 08, 2002 at 02:42:52PM -0500, David Collier-Brown wrote:
> I suspect we're seeing a Solaris-specific bug, but without
> the errno I'm puzzled as to what we should do about it.
> ENOLCK would be easier to deal with than EOVERFLOW, and
> harder than EIO or EINTR...
Yeah, I'm concerned about that, as people do mainly use Samba on large
Solaris servers. Solaris fcntl lock code has had historical
problems with mmapped files (doesn't work over NFS as I recall)
and we really need this to work right for tdb.
Any more info would help, even if it just gets us a workaround to
a Solaris bug (not that I'm implying Solaris has a bug here, it's
just a possibility :-) :-).