HELP: Connections dropping whilst processes increasing.

Mon Nov 22 19:38:32 GMT 1999

On Sat, 20 Nov 1999, Cliff Green (green at UMDNJ.EDU) wrote:

> Did anyone ever give an answer for this problem? 

	I'm getting EXACTLY the same problem here, with Red Hat 6.0 and a custom
Linux 2.2.13 kernel (basically stock plus the e2comp patch).

> We've been experiencing something very similar to Martin Rootes' problem, 
> on an HP9000 K-series server, with anywhere from hundreds to thousands of 
> extra, unkillable smbd processes. The odd thing is, the system load goes 
> *very* high, but it doesn't seem to affect anything other than further 
> smbd services, including preventing successful logons. 

	Same thing here - I can do whatever I want except use Samba services.
Telnet, httpd, LDAP, etc.: everything but Samba works OK. Moreover, the
problem is easily reproducible on one station: it's only necessary to open
a Word document, make one or two modifications and save it twice. The
first save works OK; the second locks the machine and makes the user's
original smbd go haywire, causing the server to spawn two or three more
processes for the same user.
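
	To watch this happen while repeating the Word test, something like the
following sketch could help. It is only an illustration in Python: the
function names are mine, and it assumes a procps-style ps that accepts
"-eo stat,comm" (an HP-UX ps will probably need different options).

```python
import subprocess
from collections import Counter


def count_smbd_states(ps_output):
    """Tally smbd processes by the first letter of their ps state.

    Expects text shaped like `ps -eo stat,comm` output: a header line,
    then one "STATE COMMAND" pair per line.
    """
    states = Counter()
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "smbd":
            states[fields[0][0]] += 1  # "Ss" -> "S", "D<" -> "D"
    return states


def read_ps():
    """Run ps and return its output (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

	Calling count_smbd_states(read_ps()) before and after the second save
should show whether the extra processes pile up in state D
(uninterruptible sleep). That would fit with them ignoring kill, but
this interpretation is a guess on my part.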

> It's odd - it only happens on that one server (we run Samba on five 
> production servers), and there are few differences between that host and 
> the others. As you can imagine, I really need to determine if our problem 
> is that there's something wrong with Samba, or if this is due to either 
> the other processes on that server or something different about the 
> clients that predominantly use that server.

	How many users use this specific server, and how do they use it? What
applications are involved on the client side (Word, Excel, xBASE apps...)
and on the server side (daemons?)?

> Very unfortunately, the only way to get rid of those hundreds to thousands 
> of extra processes is to restart the server. An increasingly unacceptable 
> solution. 

	I repeat, it's EXACTLY the same way here. Unacceptability considerations
included. =:-0

> My management and the support staffer on that campus believe that Samba is 
> the problem, because it displays this behavior (difficulty logging in, and 
> enormous numbers of unkillable smbd processes). I believe it's something 
> else, but need to prove it.

	Hmmm... How is this server connected to the stations that show the
problem? Here I suspect that our problem may be our switch (a 3Com
SuperStack 1000 with OLD firmware and low-capacity buffers, since it's a
workgroup switch and not a backbone switch). I say this because I'm
observing *collisions* on ports that are dedicated to the *server* and to
individual *workstations* (no hubs involved).
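
	The collisions I mention are the ones the switch reports per port. As a
cross-check from the server side, the collision counter of each network
interface can be read from /proc/net/dev on Linux. A minimal sketch in
Python follows; the function name is mine, and it assumes the usual
layout of 16 counters per interface with "colls" as the 14th.

```python
def parse_collisions(proc_net_dev_text):
    """Return {interface: collision count} from /proc/net/dev text.

    Assumes two header lines, then "name: 8 receive counters followed by
    8 transmit counters", where the 6th transmit counter is "colls".
    """
    collisions = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip both headers
        if ":" not in line:
            continue
        # The name may be glued to the first counter ("eth0:1234"),
        # so split on the colon rather than on whitespace.
        name, stats = line.split(":", 1)
        fields = stats.split()
        if len(fields) < 16:
            continue  # unexpected layout, skip rather than guess
        collisions[name.strip()] = int(fields[13])
    return collisions
```

	Feeding it open("/proc/net/dev").read() every few minutes and comparing
the numbers would show whether the server's own interface sees the
collisions too, or whether only the switch counts them.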

> Let's see, the only configuration options were --prefix, --with-quotas, 
> and --with-mmap (which I guess we'll stop using Real Soon). 

	If I'm not mistaken, mmap support is disabled by default in the current
(and not-so-current) versions of Samba, so I think it's not an issue
(unless you have enabled it explicitly).

> The logon script mounts the user's home directory, a shared directory, 
> sets the time, and some antiviral housekeeping. 

	I don't have logon scripts here; I map the drives using Network
Neighborhood. Hmmm, we also have antivirus software running (McAfee
VirusScan). What's yours?

> Help! 
> 
> Anything anyone's found or any insights will be helpful. 

	I've found "window frozen" problems and acknowledgement-time problems
("acks too long") between station and server. The first is a sign of
buffer exhaustion, so I'll be setting up a separate switchless network on
a second interface on the server, alongside the "usual" interface, which
will remain connected to the switch. I'll put one smbd listening on each
interface. That way, if the smbd bound to the switch-connected interface
locks up while the other keeps working, I'll know the switch is the
problem.

> log.smb on that campus shows *no* entries for "connect to 
> service netlogon", but many "closed connection to service netlogon", which 
> should not be happening. 

	I'm not sure, but it *seemed* to be happening here too - I'll check.
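
	One way to check is to count the two kinds of entries in log.smb. The
sketch below (Python, with a function name of my own) simply looks for the
two phrases you quoted; it assumes each entry's text sits on a single log
line, which may not hold for every log format.

```python
def count_netlogon_events(log_text, service="netlogon"):
    """Count "connect to service" and "closed connection to service"
    entries for one service in the text of a Samba log file."""
    opened = "connect to service " + service
    closed = "closed connection to service " + service
    counts = {"opened": 0, "closed": 0}
    for line in log_text.splitlines():
        # The two phrases never overlap: "connection to" does not
        # contain "connect to".
        if closed in line:
            counts["closed"] += 1
        elif opened in line:
            counts["opened"] += 1
    return counts
```

	If "closed" comes out far larger than "opened", that would confirm the
mismatch you describe; it would not by itself explain it.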

> On the other hand, that server began running both Oracle and OpenView for 
> network monitoring and management a few months before these problems 
> started to appear. 

	I don't think this is the problem, as I don't run either of these here
and I have the problem too. BUT...

	...hmm, I'm running snmpd here (and actually it's basically useless to
us), and I suspect you are too, since I think this server of yours is
SNMP-manageable. Or not?

> I didn't want to shower you all with log details, my smb.conf file, or the 
> logonscript (of course, I'll provide info if it'll help) - but can 
> *anyone* provide some advice, insight, or <gasp> solutions? 

	I'm looking for solutions too. Unfortunately, I still don't have any
concrete answers. But if you could check these points, it would be
interesting to see whether there are other similarities (besides the
problem itself).

	P.

-- 
"The one that doesn't run the risk doesn't snap"

(Millôr, "Lições de Inglês Audiovisual", Pasquim nº117)