[Samba] smbd mortality

Jeremy Allison jra at samba.org
Thu Nov 13 20:23:21 GMT 2008


On Thu, Nov 13, 2008 at 03:12:35PM -0500, kvkamin at aim.com wrote:
> 
> I go to the subdirectory, via linux console, where the suspect file is located and ls the directory. 9 files. ls -al gets Killed. After ls -al filename for each of the 9 files, I determine that 5 of these files are badly corrupt. I perform an experiment. Tell everyone to leave these files alone, reboot the server and it runs happily for an hour. Load is .05 average. I ask one user to attempt to open one of the corrupt files, and instantly all 50 smbd daemons go to uninterruptible sleep and every WinXP client instantly re-establishes its smbd session with the server and these (all 50) smbd sessions also die and go to heaven. This cycle continues rapidly sending the load sky high with no cpu utilization to speak of.

Uninterruptible sleep == kernel problem.

> Questions that remain:
> 1. Why do all client smbd daemons have to die if only one of them ran into trouble?

Once you have processes going into an uninterruptible state the system
is dead. It might not have stopped moving yet, but it's dead. You
have a kernel/filesystem issue you need to resolve. My guess is a
bad disk.
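
To confirm which processes are stuck, something like the following may help. It is only a minimal sketch in Python, assuming a Linux /proc filesystem where each /proc/<pid>/stat line carries the process state as the first field after the parenthesised command name. The names parse_stat and find_uninterruptible are made up for illustration; they are not part of Samba or the kernel.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, command, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return sorted (pid, command) pairs for processes in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line is unreadable
        if state == "D":  # D = uninterruptible sleep
            stuck.append((pid, comm))
    return sorted(stuck)
```

Calling find_uninterruptible() on the affected server should list the smbd processes while they are wedged. It only reports the symptom; the cause still has to be found in the kernel logs or the disk itself.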

> 2. How do files get in a state that they can't be viewed or managed? virus, lack of sunspots?

Bad disk, probably.

> 3. Why did the fsck say that the filesystem was fine, when obviously it isn't?

Kernel bug?

> 4. How to delete these poison files?

Back up the filesystem without them, reformat, restore. Did
you have hard disk hardware error reporting turned on?

It's not reasonable to expect smbd to survive errors of
this magnitude, I'm afraid. Once processes start going into
an uninterruptible state there's no way for user space code
to recover.

Jeremy.

