[Samba] Tall tale of woe....

Ladner, Eric (Eric.Ladner) Eric.Ladner at chevrontexaco.com
Tue Dec 9 15:44:08 GMT 2003

Next time it happens, running strace on the offending process ("strace
-p <process_id>") can provide some insight into what it is spinning on,
especially if it's system related.  That might help pinpoint the spot
in the code where it's having problems.
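
To find which process ID to hand to strace, something like the following
Python sketch could help. It assumes a Linux /proc filesystem, and the
function names (parse_proc_stat, busiest, find_busiest_smbd) are made up
for illustration. It ranks smbd processes by accumulated CPU ticks, which
is only a rough proxy for what top shows as the current rate.

```python
import os


def parse_proc_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, utime, stime).

    The command name is wrapped in parentheses and may itself contain
    spaces, so split on the first '(' and the last ')'.
    """
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    rest = text[rparen + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return pid, comm, int(rest[11]), int(rest[12])


def busiest(stat_texts, name="smbd"):
    """Return (pid, utime, stime) for the named process with the most
    accumulated CPU ticks, or None if no process matches."""
    best = None
    for text in stat_texts:
        pid, comm, utime, stime = parse_proc_stat(text)
        if comm != name:
            continue
        if best is None or utime + stime > best[1] + best[2]:
            best = (pid, utime, stime)
    return best


def find_busiest_smbd(proc_root="/proc"):
    """Scan proc_root for smbd processes (hypothetical helper, Linux only)."""
    texts = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                texts.append(handle.read())
        except OSError:
            continue  # the process exited while we were scanning
    return busiest(texts)
```

Calling find_busiest_smbd() returns a tuple whose first element is the PID
to pass to "strace -p". If stime (kernel ticks) is large compared with
utime (user ticks), the process is spending most of its time in system
calls, which is the case where strace tends to be most informative.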


For the last year or so I have been having problems in general with
Samba (various versions) on the same box: a Dell 2500, Xeon 1.8 with
2GB of RAM, running Red Hat 8.

What will happen from time to time (although it has now happened 3 times
in the last 5 days, hence this email) is that people will be slow to log
in, if they can log in at all. Several things appear to happen.

The main one is that an smbd process belonging to a user who is logging
in will appear in top (a CPU monitor program) using massive amounts of
CPU. Although the system says it still has about 10-15% idle, this
generally stops everyone from logging in.

Top on RH (it doesn't look the same on BSD) has a "system" entry showing
the percentage of CPU given over to that. "System" basically means
anything I/O or kernel related; since the kernel governs resources, some
system time isn't uncommon. During a period of 4 hours I monitored this
"system" figure and it never went above 10%, and even then only for a
matter of seconds. When this problem occurs it pushes system up to
50-80%! I look at the server and the disks are pretty much idle, so it's
not disk related. I am at a loss to find out what it is actually doing
to cause this.
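
For logging that "system" figure over time without watching top, a small
Python sketch along these lines could work. It assumes a Linux
/proc/stat whose first line starts with "cpu" followed by cumulative tick
counters in the order user, nice, system, idle (newer kernels append
more, such as iowait). The function names are hypothetical.

```python
import time


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a list of ints."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def cpu_percentages(before, after):
    """Return user/nice/system/idle percentages between two samples.

    Only the first eight counters (user through steal) are summed, because
    on newer kernels the later guest counters are already included in user.
    Returns None if no ticks elapsed between the samples.
    """
    deltas = [b - a for a, b in zip(before, after)][:8]
    total = sum(deltas)
    if total <= 0:
        return None
    names = ["user", "nice", "system", "idle"]
    return {name: 100.0 * deltas[i] / total for i, name in enumerate(names)}


def sample_percentages(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval seconds apart (Linux only)."""
    with open(path) as handle:
        before = parse_cpu_line(handle.readline())
    time.sleep(interval)
    with open(path) as handle:
        after = parse_cpu_line(handle.readline())
    return cpu_percentages(before, after)
```

Calling sample_percentages(5.0) from a loop and writing the "system"
value to a log with a timestamp would show exactly when it jumps from
the usual few percent to the 50-80% range, which can then be lined up
against the Samba logs.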

however once i kill off this process it seems to slowly get back to 

Now I have read other people's emails and gone through the archives
about this, and read about "failure for 4. Error = No route to host",
"lib/util_sock.c:read_data(436)" and "oplocking" problems, as they all
appear to be more pronounced around the time of this high CPU/rogue
smbd process.

However, it would seem that a lot of the oplocking problems are
hardware related. I use decent 3Com kit here, with a 4950 as a core and
4400s at the edge (i.e. not cheap and cheerful Netgear/D-Link/etc.
stuff), so I am wondering if anyone else has had these problems with
this kit. Or, if it's not the kit, what can it actually be?

I've tried turning oplocks on and off to no avail. It still has this

Any ideas on the "read_data(436)" and "failure for 4. Error = No route
to host"?

Any help offered very gratefully received.

With thanks

Ross McInnes

