PANIC: open_mode_check:/CPU hog: still an issue in 2.2.5 on solaris?
Jeff Mandel
jeff.mandel at probes.com
Thu Aug 29 13:16:00 GMT 2002
On Solaris 8, smbd processes are growing until they max out the CPU.
The PANIC in my earlier note was the result of the kill -9 on the
offending smbd; it did not happen before the kill.
truss on the spinning process showed this sequence repeating continuously:
sigprocmask(SIG_SETMASK, 0xFED2AD70, 0xFFBEDEC0) = 0
lwp_kill(1, SIGUSR1) = 0
sigprocmask(SIG_SETMASK, 0xFFBEDEC0, 0x00000000) = 0
Received signal #16, SIGUSR1 [caught]
siginfo: SIGUSR1 pid=5978 uid=0 code=-1
setcontext(0xFFBED978)
sigprocmask(SIG_SETMASK, 0xFED2AD70, 0xFFBEDE60) = 0
lwp_kill(1, SIGUSR1) = 0
sigprocmask(SIG_SETMASK, 0xFFBEDE60, 0x00000000) = 0
Received signal #16, SIGUSR1 [caught]
siginfo: SIGUSR1 pid=5978 uid=0 code=-1
setcontext(0xFFBED8F8)
In source/smbd/server.c I noticed:
/* POSIX demands that signals are inherited. If the invoking process has
 * these signals masked, we will have problems, as we won't receive them. */
BlockSignals(False, SIGHUP);
BlockSignals(False, SIGUSR1);
Could there be a problem with the way Samba is started (for example,
with these signals already masked in the invoking process)?
In 2.2.5 there is this code and comment in oplock.c:
<snip>
if (timeout <= 1) {
        smb_read_error = READ_TIMEOUT;
        return False;
}
/* Not a kernel interrupt - could be a SIGUSR1 message. We must restart. */
/* We need to decrement the timeout here. */
timeout -= ((time(NULL) - starttime)*1000);
if (timeout < 0)
        timeout = 1;
DEBUG(5,("receive_local_message: EINTR : new timeout %d ms\n", timeout));
continue;
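The timeout arithmetic in that excerpt can be isolated as a small function to see how it behaves when EINTR arrives repeatedly. This is only a sketch of the quoted lines: remaining_timeout_ms() is a hypothetical name, and "now" is passed in as a parameter instead of calling time(NULL) so the values can be checked by hand.

```c
#include <time.h>

/* Hypothetical helper mirroring the arithmetic in the quoted excerpt:
 * subtract the elapsed whole seconds (converted to milliseconds) from the
 * timeout, and clamp a negative result to 1 ms. */
int remaining_timeout_ms(int timeout_ms, time_t starttime, time_t now)
{
    timeout_ms -= (int)((now - starttime) * 1000);
    if (timeout_ms < 0)
        timeout_ms = 1;
    return timeout_ms;
}
```

Because time() has one-second resolution, interrupts that arrive within the same second leave the timeout unchanged, so a steady stream of SIGUSR1 would restart the wait many times before the timeout shrinks.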
Following the previous postings on signal blocking, I upgraded our nss_ldap
on Solaris 8 to v201. We missed the v181 issues because our previous
version was v172, and according to Luke this was fixed in v200, so I'm
not so sure it's related to nss_ldap.
It looks to me like smbd is getting the kill (SIGUSR1) but ignoring it.
I have the lsof output for that process as well. What should I look for in it?
Any ideas?
Jeff