[Samba] Samba server, works fine for several days, then load increases indefinitely till server unavailable

Tue Apr 22 11:54:01 GMT 2008

Volker Lendecke wrote:
> On Mon, Apr 21, 2008 at 09:13:28AM -0500, James A. Dinkel wrote:
>   
>> Anyway, the server will be fine and snappy for a week or so, then out of
>> the blue, nobody can connect.  Top shows a few smbd processes maxing out
>> the cpu and the load (which is usually < 1.0) gradually climbs up to 10,
>>     
>
> I've seen this only when something like connections.tdb
> became corrupt. With CentOS this is not likely, but reiserfs
> did that to me fairly often. What filesystem are your tdbs
> residing on? Maybe some other kernel-level problem like a
> problematic driver in the path to the hard disk?
>
> Volker
>   
I have seen this once on a CentOS-4.5-x86_64 box; IIRC, there was an 
issue with the Intel e1000 kernel module that caused a high number of 
connection resets, but the RSTs never made it back, so the connections 
would just time out while the client started a new connection.  Then 
again, this box was using reiserfs to hold the tdbs, and it might have 
just been the fsck on reboot (after I applied the kernel module update) 
that fixed it.  Anyway, what I was seeing was a consistently high 
number (several hundred) of queued packets in the Send-Q across a dozen 
or so connections, and groups of connections all being reset at the 
same time.  The load went up slowly for about a day, and then rocketed 
to well over 100 when a client was reset with a stuck locked file.
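
A rough sketch of how one might look for the same symptom, assuming 
Linux net-tools `netstat -tn` output (columns Proto, Recv-Q, Send-Q, 
Local Address, Foreign Address, State).  The function name and the 
threshold of 100 are made up for illustration and are not from the 
original diagnosis; note that netstat reports Send-Q in bytes, so pick 
a threshold that makes sense for your traffic.

```python
def stuck_send_queues(netstat_output, threshold=100):
    """Return (send_q, local, remote, state) tuples for TCP rows whose
    Send-Q exceeds `threshold`, largest queue first.

    `netstat_output` is the text printed by `netstat -tn`; the threshold
    is an arbitrary, hypothetical default.
    """
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the banner and the column header
        try:
            send_q = int(fields[2])
        except ValueError:
            continue  # not a numeric Send-Q, ignore the row
        if send_q > threshold:
            hits.append((send_q, fields[3], fields[4], fields[5]))
    return sorted(hits, reverse=True)
```

Feed it the captured text of `netstat -tn` every minute or so; a dozen 
connections that stay in the result across several runs would match 
what is described above.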

FWIW, this was an SMP Xeon box w/ integrated Intel E1000s and the 
(mostly) stock 2.6.9-12(?) RHEL kernel.  I found that Intel had a patch 
for an issue very similar to what I was seeing, and after applying it, 
everything was happy again.