[Samba] winbind causes Linux to lockup when connectivity to AD is lost (subject line edited for clarity)

Matthew J. Salerno vagabond_king at yahoo.com
Mon Oct 19 11:47:02 MDT 2009






________________________________
From: Clayton Hill <admin at ateamonsite.com>
To: Matthew J. Salerno <Vagabond_king at yahoo.com>
Cc: samba at lists.samba.org
Sent: Mon, October 19, 2009 1:20:00 PM
Subject: Re: [Samba] winbind causes Linux to lockup when connectivity to AD is lost (subject line edited for clarity)

Hi Matthew,


>I don't have the time to set up an environment to match yours, but I did take the time to go back to your initial post and read through your smb.conf.

Understandable, but that is not going to be of much help if you don't have a way to reproduce this issue, and I'll be answering too many basic questions. ;-)


> 1. http://samba.org/samba/docs/man/manpages-3/winbindd.8.html - Did you check your winbind config to make sure you are not running it with a "-n" ?
>

Yes. I am using the default init script to start and stop winbind. Remember I am using suse 11.0 x86_64  
BUT I have tested this without -n which is a totally useless way to run winbind and ironically should be far worse usability-wise than this scenario - but isn't.




> 2. http://samba.org/samba/docs/man/manpages-3/smb.conf.5.html - Have you tried playing with the "winbind cache time", "winbind offline logon", "winbind reconnect delay" and "idmap cache time" settings?
>

I will reread those options in the man page, but... what do you recommend here? This feels like a shot in the dark, and a lengthy way to test at random. That is, this test renders a Samba machine useless every time it is run, so these are very long, slow shots in the dark.
I need some experienced, expert advice on which options are best to modify, and why.




> 3. Have you tried increasing the log level and enabling winbind debug and creating an artificial outage and then review the logs?

Yes. I will give you a snippet at log level 2, taken during a "fake AD outage", in a bit. I doubt it will be useful, but I'll try it.


 
> Again, what kind of troubleshooting have you done and what are the results?

Please try to reproduce this issue. It will become quite obvious to you after that.


Thanks,
-Clayton



Matthew J. Salerno wrote: 
----- Original Message ----
>From: Clayton Hill <admin at ateamonsite.com>
>To: Matthew J. Salerno <Vagabond_king at yahoo.com>
>Cc: samba at lists.samba.org; Jeremy Allison <jra at samba.org>
>Sent: Sun, October 18, 2009 7:49:01 PM
>Subject: Re: [Samba] winbind causes Linux to lockup when connectivity to AD is lost (subject line edited for clarity)
>
>Thanks for confirming my config is good. I already know about the old 
>problem with SSH and reverse DNS lookups. That actually takes about 5 
>minutes or less to log in; with this issue, be prepared to wait almost an 
>hour, if it works at all. Similar, but not the same issue.
>Please, to get an understanding of this problem do the following steps 
>to reproduce this problem.
>
>SUSE 11.0
>Samba 3.2
>Join windows 2003 AD domain (with 40,000 objects) using      net ads join
>Take domain controller offline.
>
>Try to log in LOCALLY as ROOT to your console on your domain member 
>linux box. Do not even bother to log in as any samba user or do ANYTHING 
>samba related.
>Watch as it takes more time than bearable (I am talking MORE THAN 20 
>minutes!) to log in to the LOCAL TERMINAL
>attempt to do the same with ssh
>if you are already logged in before you do this test as root LOCALLY TTY 
>then try and run simple commands such as:  top,ls,ps,man etc etc
>
>After seeing the problem clearly simply do this to become unstuck:
>killall winbindd
>or
>service winbind stop
>
>
>have a lot of fun.
>
>Cheers,
>-Clayton
>
>
>
>
>
>
>Matthew J. Salerno wrote:
>  
>>Your /etc/nsswitch.conf looks correct to me. For services like ssh, you should just disable PTR lookups (VerifyReverseMapping no). Regarding winbind, do you have any services or processes running on the box as a domain user? Perhaps there is a timeout setting for krb and winbind. I don't recall seeing one for winbind, but I would imagine that there is one for Kerberos. Have you bumped up the debugging and purposefully caused an AD failure (ifdown or bad route)? Have you had the console open and watched top to see if a process is consuming too much CPU? What kind of troubleshooting have you done, and what are the results?
>>
>>
>>
>>----- Original Message ----
>>From: "admin at ateamonsite.com" <admin at ateamonsite.com>
>>To: admin at ateamonsite.com
>>Cc: samba at lists.samba.org; Jeremy Allison <jra at samba.org>
>>Sent: Fri, October 16, 2009 3:59:45 PM
>>Subject: Re: [Samba] winbind causes Linux to lockup when connectivity to AD is lost (subject line edited for clarity)
>>
>>
>>Ok, I am not hearing replies back - I don't want this issue to be swept under
>>the rug.
>>
>>
>>It has been an issue for me since SuSE 10.1 + samba-3.0.30-0.1.112, even.
>>I know now that the commands I was telling you about, such as ls or man,
>>all access username/password info, maybe to see if you have permission to
>>run them? I don't know, I am guessing.
>>
>>BUT - if winbind is really caching and the connection is lost, then this
>>should be a non-issue as you say.
>>
>>Well here is my nsswitch.conf:
>>
>>
>>cat /etc/nsswitch.conf
>>
>>
>>passwd: compat winbind
>>group:  compat winbind
>>
>>networks:      files dns
>>
>>services:      files
>>protocols:      files
>>rpc:    files
>>ethers: files
>>netmasks:      files
>>netgroup:      files
>>publickey:      files
>>
>>bootparams:    files
>>automount:      files
>>aliases:        files
>>
>>hosts:  files dns
>>shadow: compat
>>
>>
>>Isn't this set up right? ;-)
>>
>>
>>So, famously when DNS is down, crap like SSH and NFS take unreasonable
>>amounts of time and cause system hangs in linux. This is what I've been
>>told, and I can accept that.
>>Since DNS is hosted on the AD server, when that server goes down, SSH and
>>even local login hang for extremely long amounts of time - I'm talking more
>>than 10 minutes... then fail.
>>
>>In Windows (I'm sorry, I'm about to compare 2 operating systems) this is a non
>>issue and you can use the machine even if the networking is hosed or you
>>can't talk to the AD.
>>
>>So.......
>>
>>BUMP! :-)
>>
>>
>>
>>
>>
>>On Wed, 14 Oct 2009 16:51:10 -0600, <admin at ateamonsite.com> wrote:
>>>Hopefully that isn't a bad thing! haha
>>>Thanks!
>>>
>>>On Wed, 14 Oct 2009 15:44:54 -0700, Jeremy Allison <jra at samba.org> wrote:
>>>>On Wed, Oct 14, 2009 at 04:02:41PM -0600, admin at ateamonsite.com wrote:
>>>>>Hi Jeremy,
>>>>>
>>>>>>Sorry, didn't look too closely at your winbindd issue.
>>>>>>winbindd will cache all information to allow disconnected
>>>>>>operation (we made this work perfectly at SuSE), so there
>>>>>>certainly shouldn't be a problem with a loss of connection to a DC.
>>>>>
>>>>>I am sorry to report that I am in fact using SuSE, and this problem is
>>>>>very easy to reproduce if I power off my AD domain, then wait (I guess) 10
>>>>>minutes - then try and ssh to my Linux box. There is no way to log into
>>>>>the box.
>>>>
>>>>Ok, then I'm going to hand you over to the SuSE Samba Team
>>>>maintainers on this list (sorry :-).
>>>>
>>>>Jeremy.
>I don't have the time to set up an environment to match yours, but I did take the time to go back to your initial post and read through your smb.conf.
>
>1. http://samba.org/samba/docs/man/manpages-3/winbindd.8.html - Did you check your winbind config to make sure you are not running it with a "-n" ?
>2. http://samba.org/samba/docs/man/manpages-3/smb.conf.5.html - Have you tried playing with the "winbind cache time", "winbind offline logon", "winbind reconnect delay" and "idmap cache time" settings?
>3. Have you tried increasing the log level and enabling winbind debug and creating an artificial outage and then review the logs?
> 
>Again, what kind of troubleshooting have you done and what are the results?
>
>
>      
>  
Please understand that I am not a samba dev; I am just an average user who is willing to help others out when I can, because I know how much it sucks to be stuck. I do not have the time to mirror your environment. Regarding the settings I recommended in my last post, I'm not sure what the best values would be, but since they all deal with caching info from AD I figured that they might be useful. Honestly, I would set them all to cache for a very long time, simulate an outage, adjust and repeat.
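
As a starting point for that adjust-and-repeat loop, something like the following in the [global] section might be worth a try. The parameter names are the four from my earlier post; the values are only guesses picked to make the cache long-lived, not tested recommendations, so please check each one against the smb.conf man page for your Samba version.

```ini
[global]
    # All values below are illustrative assumptions, not tested recommendations.

    # Seconds winbindd keeps cached user/group info before asking a DC again.
    winbind cache time = 3600

    # Allow logons through pam_winbind with cached credentials.
    winbind offline logon = yes

    # Seconds winbindd waits between attempts to contact a DC that is down.
    winbind reconnect delay = 30

    # Seconds that positive SID/uid/gid mapping results stay cached.
    idmap cache time = 604800
```

After changing them, restart winbind, confirm lookups work with the DC up, and only then pull the DC for the test.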

Have you checked on any suse forums?  If it is a suse issue, chances are that you are not the only person having this problem.  I'll try the outage out in my Redhat env.
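
For the outage test itself, a small wrapper like the one below might make each round less of a slow shot in the dark: it times a few lookups that go through nsswitch.conf and gives up after a limit instead of hanging the shell. This is only a sketch. `time_cmd` is a name I made up, it assumes GNU coreutils `timeout` is installed (it may not be on older distributions), and the 10 second limit is arbitrary.

```shell
#!/bin/sh
# Run a command with a time limit and report how long it took.
# Usage: time_cmd LIMIT_SECONDS COMMAND [ARGS...]
# Prints: "<seconds>s <status> <command line>"
# (time_cmd is a hypothetical helper; it relies on GNU coreutils timeout.)
time_cmd() {
    limit=$1; shift
    start=$(date +%s)
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    case $rc in
        0)   status=ok ;;          # command finished in time
        124) status=timeout ;;     # GNU timeout's exit code when the limit is hit
        *)   status="rc=$rc" ;;    # command failed or was not found
    esac
    echo "$((end - start))s $status $*"
}

# Lookups that go through nsswitch.conf (passwd/group: compat winbind).
time_cmd 10 getent passwd root
time_cmd 10 getent group root
time_cmd 10 id root
```

Run it once with the DC up and again during the simulated outage. If the lookups for root only stall in the second run, that would suggest the delay comes from the winbind entries in nsswitch.conf rather than from DNS.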


      

