[Samba] winbind causes Linux to lockup when connectivity to AD is lost (subject line edited for clarity)
jra at samba.org
Thu Oct 22 17:17:20 MDT 2009
On Fri, Oct 23, 2009 at 12:13:22PM +1300, Jason Haar wrote:
> On 10/23/2009 11:45 AM, Robert LeBlanc wrote:
> > I'm using 3.4.2 right now and I'm seeing a similar problem. We are
> > using winbind to authenticate our users on our Linux cluster. The
> > worker and interactive nodes are on a private subnet that is NATed to
> > the local LAN. Two head nodes provide failover for the NATing. When
> > failover is happening, winbind whacks out. The system is not unusable,
> > but no authentication happens for about 30 minutes after the failover.
> > I'm going to see if I can get iptables to share state between machines
> > to help prevent this, but there needs to be a faster reconnection
> > after domain controllers seem to be down.
> What I see (as a winbind-laptop user) is that sometimes winbind thinks
> it has working connections to domain controllers when either the network
> is down or is no longer the corporate network. e.g. I can be logged in
> at work, sleep my laptop and take it home. After coming out of sleep,
> "netstat -t" shows that there are still ESTABLISHED tcp sessions to
> domain controllers - even though my home network has no access to my
> work network. I think winbind then gets into a state where it is
> continually trying to talk to these non-available domain controllers and
> it never gives up - and so the offline mode never kicks in.
> It's got so bad that I now have scripts that run whenever a network
> change occurs, to check if winbind is "stuck" and restart accordingly.
Hmmm. If netstat -t shows an established TCP connection then
that's active in the kernel. winbindd will then use that
connection (as it thinks it's ok).
It should correctly time out (after 20 - 30 seconds) and then
tear down and re-establish the connection if the DC isn't responding.
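
To illustrate the general pattern (a socket the kernel still reports as
ESTABLISHED, a request that gets no answer, a timeout, then tear down and
reconnect), here is a rough Python sketch. It is not winbindd's code:
the function name request_with_reconnect, its parameters and the 25 second
default are all made up for the example, and a real client would speak the
proper protocol rather than send an arbitrary payload.

```python
import socket


def request_with_reconnect(sock, addr, payload, timeout=25.0, bufsize=4096):
    """Send payload on sock and wait up to `timeout` seconds for a reply.

    If the peer does not answer (or the socket errors out), close the
    socket, open a fresh connection to `addr` and retry once.
    Returns (sock, reply); the returned sock may be a new object.
    All names and defaults here are hypothetical, for illustration only.
    """
    for attempt in range(2):
        try:
            sock.settimeout(timeout)
            # On a stale but ESTABLISHED socket this usually succeeds,
            # because the data is only queued in the kernel send buffer.
            sock.sendall(payload)
            # The missing reply is what reveals the dead peer.
            reply = sock.recv(bufsize)
            if reply:
                return sock, reply
        except OSError:
            # Covers timeouts as well as resets and unreachable errors.
            pass
        # Tear down the connection that did not answer.
        sock.close()
        if attempt == 0:
            # Re-establish once; a failure here propagates to the caller.
            sock = socket.create_connection(addr, timeout=timeout)
    raise ConnectionError("peer not responding after reconnect")
```

The point of the sketch is that the send alone proves nothing about the
peer; only the bounded wait for a reply lets the client notice that the
connection is dead and move on.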
Can you post debug level 10 logs from winbindd in this
state to your bug report (apologies if you've already done
so, I've been triaging 3.4.3 blocker bugs this week)?