[Samba] winbind causes Linux to lock up when connectivity to AD is lost (subject line edited for clarity)
Robert LeBlanc
robert at leblancnet.us
Fri Oct 23 13:17:55 MDT 2009
I also see this in the syslog sometimes:
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132286] rsync invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132649] Pid: 6516, comm: rsync Not tainted 2.6.26-2-amd64 #1
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132916]
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.132917] Call Trace:
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.133470] [<ffffffff802738c0>] oom_kill_process+0x57/0x1dc
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.133746] [<ffffffff8023b551>] __capable+0x9/0x1c
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.133993] [<ffffffff80273beb>] badness+0x188/0x1c7
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.134245] [<ffffffff80273e1f>] out_of_memory+0x1f5/0x28e
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.140836] [<ffffffff80276b70>] __alloc_pages_internal+0x31d/0x3bf
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141048] [<ffffffff80272d1c>] generic_file_aio_read+0x3b7/0x4ae
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141279] [<ffffffff8029ae47>] do_sync_read+0xc9/0x10c
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141472] [<ffffffff80246221>] autoremove_wake_function+0x0/0x2e
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141682] [<ffffffff8029b638>] vfs_read+0xaa/0x152
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.141864] [<ffffffff8029ba19>] sys_read+0x45/0x6e
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142046] [<ffffffff8020beca>] system_call_after_swapgs+0x8a/0x8f
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142254]
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142376] Mem-info:
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142511] Node 0 DMA per-cpu:
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142662] CPU 0: hi: 0, btch: 1 usd: 0
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142844] Node 0 DMA32 per-cpu:
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.142998] CPU 0: hi: 186, btch: 31 usd: 173
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.143183] Active:189862 inactive:179626 dirty:0 writeback:0 unstable:0
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.143184] free:3011 slab:7697 mapped:76 pagetables:1122 bounce:0
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.143592] Node 0 DMA free:6020kB min:32kB low:40kB high:48kB active:3012kB inactive:2676kB present:10724kB pages_scanned:9007 all_unreclaimable? yes
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.144711] lowmem_reserve[]: 0 1499 1499 1499
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.144894] Node 0 DMA32 free:6024kB min:4936kB low:6168kB high:7404kB active:756436kB inactive:715828kB present:1535136kB pages_scanned:626785 all_unreclaimable? no
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.145479] lowmem_reserve[]: 0 0 0 0
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.145648] Node 0 DMA: 3*4kB 1*8kB 1*16kB 5*32kB 3*64kB 2*128kB 3*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6020kB
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.146045] Node 0 DMA32: 162*4kB 28*8kB 9*16kB 7*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6040kB
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.155603] 364394 total pagecache pages
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.155831] Swap cache: add 0, delete 0, find 0/0
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.156064] Free swap = 0kB
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.156064] Total swap = 0kB
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164049] 393200 pages of RAM
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164049] 6902 reserved pages
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164049] 2124 pages shared
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164247] 0 pages swap cached
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164396] Out of memory: kill process 5842 (winbindd) score 76798 or a child
Oct 23 13:09:35 lsbeast-i2 kernel: [74133.164850] Killed process 5847 (winbindd)
Looks like winbindd is using up the memory? The OOM killer picked it (score 76798) as the process to kill.
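The log above shows the OOM killer selecting winbindd (pid 5842, score 76798) and killing pid 5847, on a machine with 393200 pages of RAM (about 1.5 GiB at 4 KiB per page) and no swap. To see how often winbindd ends up as the victim over time, a minimal Python sketch like the following could tally the OOM picks in syslog. It assumes the 2.6.26-era message wording quoted above; other kernel versions phrase these lines differently.

```python
import re
from collections import Counter

# Patterns for the 2.6.26-era OOM messages quoted above, e.g.
#   "Out of memory: kill process 5842 (winbindd) score 76798 or a child"
#   "Killed process 5847 (winbindd)"
# Other kernels word these lines differently; adjust the patterns as needed.
OOM_PICK = re.compile(r"Out of memory: kill process (\d+) \(([^)]+)\) score (\d+)")
OOM_KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(log_lines):
    """Scan syslog lines and return (picked, killed).

    picked: list of (pid, name, score) tuples the OOM killer selected
    killed: Counter of process names that were actually killed
    """
    picked, killed = [], Counter()
    for line in log_lines:
        m = OOM_PICK.search(line)
        if m:
            picked.append((int(m.group(1)), m.group(2), int(m.group(3))))
            continue
        m = OOM_KILLED.search(line)
        if m:
            killed[m.group(2)] += 1
    return picked, killed
```

For example, `with open("/var/log/syslog", errors="replace") as log: picked, killed = oom_victims(log)`; the `/var/log/syslog` path is the Debian default and is an assumption here.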
Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University
On Fri, Oct 23, 2009 at 9:33 AM, Robert LeBlanc <robert at leblancnet.us> wrote:
> Just out of curiosity, do any of you have mdns4_minimal or mdns4 in your
> /etc/nsswitch.conf file? I don't think mdns4 works very well and I usually
> take it out, but it was alive and well on these machines. Does removing
> those entries help anyone?
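To try that suggestion without hand-editing, here is a minimal Python sketch (not part of any Samba or nss-mdns tooling) that removes the mDNS sources, plus any action such as `[NOTFOUND=return]` directly after them, from the `hosts:` line of nsswitch.conf text. It only returns the new text; review it and back up the real file before replacing anything.

```python
import re

# nsswitch "sources" provided by nss-mdns
MDNS_SOURCES = {"mdns", "mdns4", "mdns6",
                "mdns_minimal", "mdns4_minimal", "mdns6_minimal"}
# A token is either a bracketed action like "[NOTFOUND=return]" or a bare word.
TOKEN = re.compile(r"\[[^\]]*\]|\S+")


def strip_mdns(conf_text):
    """Return nsswitch.conf text with mDNS sources removed from the hosts line.

    An action that directly follows a removed source is dropped with it;
    every other line (including comments) is passed through unchanged.
    """
    out = []
    for line in conf_text.splitlines():
        db, sep, rest = line.partition(":")
        if sep and db.strip() == "hosts":
            kept, dropping = [], False
            for tok in TOKEN.findall(rest):
                if tok.startswith("["):
                    # keep an action only if its source was kept
                    if not dropping:
                        kept.append(tok)
                    continue
                dropping = tok in MDNS_SOURCES
                if not dropping:
                    kept.append(tok)
            line = "hosts: " + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"
```

For instance, `hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4` becomes `hosts: files dns`.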
>
> On Thu, Oct 22, 2009 at 4:45 PM, Robert LeBlanc <robert at leblancnet.us> wrote:
>
>> I'm using 3.4.2 right now and I'm seeing a similar problem. We use
>> winbind to authenticate our users on our Linux cluster. The worker and
>> interactive nodes are on a private subnet that is NATed to the local LAN,
>> and two head nodes provide failover for the NAT. When a failover happens,
>> winbind goes haywire: the system is still usable, but no authentication
>> happens for about 30 minutes afterwards. I'm going to see if I can get
>> iptables to share connection state between the head nodes to help prevent
>> this, but winbind needs to reconnect faster after it decides the domain
>> controllers are down.
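One way to tell whether the domain controllers are really unreachable during a failover, or whether winbind is just slow to notice they are back, is to probe the DCs from a worker node and log every state change, then compare those timestamps with the roughly 30 minutes without authentication. A minimal Python sketch follows; the DC hostnames in the usage line are hypothetical placeholders, and a successful TCP connect to LDAP (port 389) only shows that the port answers, not that winbind could use it.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False


def watch(targets, interval=5.0, rounds=None):
    """Print a timestamped line whenever a (host, port) target changes state.

    Polls forever unless `rounds` limits the number of passes.
    """
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for target in targets:
            up = port_open(*target)
            if last.get(target) != up:
                state = "UP" if up else "DOWN"
                print(f"{time.strftime('%H:%M:%S')} {target[0]}:{target[1]} {state}")
                last[target] = up
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Usage, with made-up hostnames: `watch([("dc1.example.edu", 389), ("dc2.example.edu", 389)])`. If the DCs show UP again within seconds of a failover while logins still fail, the delay is on winbind's side.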
>>
>> On Thu, Oct 22, 2009 at 1:55 AM, Clayton Hill <admin at ateamonsite.com> wrote:
>>
>>> Hi Jason,
>>>
>>> Yup, you've got the same problem, just hitting it from a slightly
>>> different direction. Ouch, that must really hurt, having winbind/the AD
>>> domain own the account you are logged in as. Bummer!
>>> My problem is slightly less serious: I am trying to use my local
>>> accounts (such as root), and I just use Samba as a domain member to host
>>> files with AD ACLs in the filesystem permissions... but we see the same
>>> bug, because winbind (even with caching) kills access to my local accounts.
>>> I hope this is fixed in 3.4 (I just installed it yesterday), but I haven't
>>> had a chance to run the same test on 3.4 yet.
>>>
>>> Possibilities:
>>> - winbind is not caching correctly, so it can't keep things running
>>>   smoothly while the DC is offline, and the system is virtually locked up
>>> - winbind doesn't realize, the moment it can't connect to the DC, that it
>>>   should fall back to its cache or just bail out somehow
>>> - winbind may or may not reconnect to the DC immediately
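The behavior these points ask for (answer from the cache as soon as the DC stops responding, but keep trying the DC on later lookups) can be written as a toy Python model. This illustrates the idea only and is not how winbindd is implemented; `query_dc` is a hypothetical stand-in for whatever actually talks to the domain controller, and the 2-second timeout is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class CachedResolver:
    """Toy model of "ask the DC, but fall back to the cache quickly".

    Not winbindd's implementation. `query_dc` is any callable mapping a
    name to an answer; it may hang or raise OSError when the DC is gone.
    """

    def __init__(self, query_dc, timeout=2.0):
        self.query_dc = query_dc
        self.timeout = timeout
        self.cache = {}
        self.pool = ThreadPoolExecutor(max_workers=4)

    def lookup(self, name):
        future = self.pool.submit(self.query_dc, name)
        try:
            answer = future.result(timeout=self.timeout)
        except (FutureTimeout, OSError):
            # DC slow or unreachable: answer from the cache instead of blocking.
            # The stuck query keeps running in its worker thread.
            if name in self.cache:
                return self.cache[name]
            raise LookupError(f"{name}: DC unreachable and not cached") from None
        self.cache[name] = answer  # DC answered: refresh the cache
        return answer
```

Because every lookup still tries the DC first, the model picks the DC back up as soon as it answers again. Hung queries do tie up worker threads; once all four are stuck, new queries wait in the queue, but they still time out and fall back to the cache.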
>>>
>>> I need to play with the parameters and see what the new winbind options
>>> in 3.4 do. I was on 3.2 until yesterday.
>>>
>>>
>>> Thanks for the info on the bug report.
>>>
>>> Cheers,
>>> -Clayton
>>>
>>> Jason Haar wrote:
>>>
>>>> Just an FYI: this looks an awful lot like the bug I reported months
>>>> ago:
>>>>
>>>> https://bugzilla.samba.org/show_bug.cgi?id=6103
>>>>
>>>> Basically, I'm running Fedora 11 with no local accounts (beyond root),
>>>> relying on winbind. On occasion winbind appears to "hang", and no local
>>>> access works, including root, which shouldn't need winbind to succeed!
>>>> Normally I have to reboot to fix it; however, if I'm lucky enough for it
>>>> to happen before my screensaver kicks in, simply restarting winbind
>>>> fixes the problem.
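When winbind "hangs" like this, one quick check is whether a plain NSS lookup of a local account such as root still returns promptly. Python's `pwd.getpwnam()` resolves users through the sources listed in /etc/nsswitch.conf, so the minimal sketch below runs it in a child process with a timeout. The function name and the 5-second default are assumptions for illustration.

```python
import subprocess
import sys


def nss_lookup_uid(user, timeout=5.0):
    """Resolve `user` through NSS (nsswitch.conf) in a child process.

    Returns the uid, or None if the lookup did not finish within `timeout`
    seconds (the child is then killed). Raises KeyError if the lookup
    fails, e.g. for an unknown user. A child process is used because a
    hung getpwnam() inside this process could not be cancelled.
    """
    code = f"import pwd; print(pwd.getpwnam({user!r}).pw_uid)"
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        raise KeyError(user)
    return int(proc.stdout.strip())
```

On a healthy machine `nss_lookup_uid("root")` returns 0; `None` means the lookup blocked for the whole timeout.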
More information about the samba mailing list