[Samba] sudden intermittent (but predictable) logon & connection failures

Ben Walton bwalton at artsci.utoronto.ca
Mon Mar 27 20:59:39 GMT 2006

Hi List,

This one has me completely stumped.  I hope that someone out there can help.

Setup: 1 PDC (Samba 3.0.10-1.4E.2 from Red Hat AS 4), also doing WINS.  [I
know there's an update, but it doesn't help.  I'm trying to keep as many
variables static for troubleshooting as possible.]

On March 8, my server rebooted mysteriously overnight (the only info I
have is from the BMC, which indicates CPU shutdowns and then later power
loss in both power supplies).  Up until this point, Samba was working
flawlessly as a PDC (non-production at this point).  I'm running Red Hat
AS 4 with mostly current patches.  I run an unattended
(unattended.sourceforge.net) install from this machine (initiated via
PXE, hosted on the same machine) to image workstations.

Since this reboot, the server has performed perfectly except that Samba
now 'fails' intermittently.  The failures that I see manifest in two ways:
1.  During the unattended install, I get 'File or resource not found'
errors from my XP SP2 clients.  Using unattended's "retry" feature allows
a reconnection that then works until the end of the install.  This process
was completely hands-free and working without error before the server
rebooted.

When a connection 'dies', the smbd is still running on the server, but
falls back to root credentials.  Subsequent connections spawn a new pid.
I can see in the logs that credentials are supplied automatically from
the client's cached values.

2.  After a fresh imaging, I cannot perform a domain logon until I've
logged in locally.  (I do nothing more than log in and then back out.)
Subsequent reboots of the machine will allow a working domain logon if I
wait for ~30 seconds before attempting.  If I try before that, I either
get a <domain> not found message (first logon for this user) or a cached
session/profile if the user had logged in previously.  (My policies
trigger messages about \Desktop being unavailable, etc.).  When working
with a cached logon, I can simply hit F5 to get my desktop icons back
from the samba server (logs show cached credentials being supplied)...

The event logs on the XP box show a NETLOGON:5719 error that indicates
the RPC server cannot be found when logons fail (or allow a cached
logon).

Google hasn't turned up anything helpful (lots of interesting things) so
far.  All of the RPC Server searches I've done lead me down roads that
haven't helped at all.  I don't think it's a WINS/DNS issue as the setup
does still (mostly) work.
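
One way to put a number on the ~30 second window described above would be
to poll the PDC's SMB ports from another machine right after a client
boots.  This is only a minimal sketch, not something taken from my setup:
the function names are made up for illustration, the host name is passed
on the command line, and ports 139 and 445 are the usual NetBIOS session
and SMB-over-TCP ports.  It only shows whether a TCP connection can be
opened, not whether NETLOGON itself is answering.

```python
import socket
import sys
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def seconds_until_open(host, port, max_wait=60.0, interval=1.0):
    """Poll host:port; return seconds elapsed at first success, else None."""
    start = time.monotonic()
    while True:
        if port_open(host, port):
            return time.monotonic() - start
        if time.monotonic() - start >= max_wait:
            return None
        time.sleep(interval)


# Only runs when a host name is given, e.g.: python probe.py <pdc-host>
if __name__ == "__main__" and len(sys.argv) > 1:
    for port in (139, 445):  # assumed: standard SMB ports
        print(port, seconds_until_open(sys.argv[1], port))
```

If the ports answer immediately while domain logons still fail, that would
point away from basic connectivity and toward something further up the
stack.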

I thought I had a bad NIC in my box, so I switched to the alternate
(moved all IP settings, etc) and things seemed to work well for the
better part of a week.  After the weekend (this is now the 20th), the
problem recurred.  Logons fail after a boot, imaging fails sometimes.
I still feel like I'm fighting bad hardware, but can find no indication
of this.  All other services on the box are fine, the machine itself has
run properly since the incident, etc.

Since then I've been poring through logs, sniffing packets (I even have
the machines on a hub right now for easier sniffing), poking various
settings, etc.  Nothing seems to resolve this.  I've now wiped out
(after grabbing a backup) my passdb.tdb, secrets.tdb and all files
under /var/cache/samba.  No luck.
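
For anyone wanting to repeat the "grab a backup first" step, something
along these lines would do it.  It is a minimal sketch under assumptions:
`backup_files` is a hypothetical helper, the destination directory is
whatever you choose, and the locations of passdb.tdb and secrets.tdb vary
by distribution, so pass in the real paths.  smbd should be stopped
first, since copying a .tdb file that is in use may give an inconsistent
copy.

```python
import shutil
import time
from pathlib import Path


def backup_files(paths, dest_root):
    """Copy each existing file or directory into a timestamped folder.

    Returns (destination_dir, list_of_copied_paths).  Missing paths are
    skipped rather than treated as errors.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"samba-backup-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for p in map(Path, paths):
        target = dest / p.name
        if p.is_file():
            shutil.copy2(p, target)      # copy2 preserves timestamps/modes
            copied.append(target)
        elif p.is_dir():
            shutil.copytree(p, target)   # recursive copy of e.g. the cache dir
            copied.append(target)
    return dest, copied
```

Only after the returned list shows everything expected would I remove the
originals.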

I can post smb.conf if anyone thinks it might help, but it hasn't
changed (with the exception of logging values).  I can post any other
info that may help too (sniffing logs?)...Anything that might help
eliminate samba from the problem scenario and point me in a better
(hardware?) direction would be of benefit too. 

Ben Walton
Systems Programmer
Office of Planning & IT
Faculty of Arts & Science
University of Toronto
Cell: 416.407.5610
PGP Key Id: 8E89F6D2
