Winbindd using 100% of CPU. Any solution?

Richard Sharpe realrichardsharpe at gmail.com
Sat Jan 4 12:04:41 MST 2014


On Fri, Jan 3, 2014 at 5:28 PM, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:
> On Fri, Jan 3, 2014 at 10:14 AM, Richard Sharpe
> <realrichardsharpe at gmail.com> wrote:
>> On Thu, Dec 19, 2013 at 7:59 AM, Richard Sharpe
>> <realrichardsharpe at gmail.com> wrote:
>>> On Wed, Dec 18, 2013 at 10:57 AM, Richard Sharpe
>>> <realrichardsharpe at gmail.com> wrote:
>>>> On Wed, Dec 18, 2013 at 10:40 AM, Richard Sharpe
>>>
>>> Some time yesterday the customer changed something so that the machine
>>> I was working with could only see three domains:
>>>
>>> 1. The one we were joined to, and
>>> 2. BUILTIN, and
>>> 3. The machine/local domain.
>>>
>>> Now the problem does not occur.
>>>
>>> Prior to that we could see 33 domains.
>>>
>>> We have another customer with 90+ domains who only sees the problem if
>>> they use MMC to modify share permissions.
>>>
>>> We are now trying to repro the problem in-house.
>>>
>>> Is there some way to prevent a joined member server from seeing all the domains?
>>
>> Well, I am back on this. I started investigating this:
>>
>> [2014/01/03 12:50:07.363167, 10]
>> winbindd/winbindd_cache.c:4561(wcache_tdc_add_domain)
>>   wcache_tdc_add_domain: Adding domain OIAA (), SID S-1-0-0, flags = 0x20, attributes = 0x1000000, type = 0x1
>>
>> That is, why are we seeing that SID of S-1-0-0? So I added a panic,
>> and now I have a core file. The S-1-0-0 comes from this code in
>> winbindd_ads.c:
>>
>>                 /* add to the trusted domain cache */
>>
>>                 fstrcpy(d.name, trust->netbios_name);
>>                 fstrcpy(d.alt_name, trust->dns_name);
>>                 if (trust->sid) {
>>                         sid_copy(&d.sid, trust->sid);
>>                 } else {
>>                         sid_copy(&d.sid, &global_sid_NULL);
>>                 }
>>
>> because the list of trusted domains we are getting contains lots of these:
>>
>> $4 = {netbios_name = 0x8033593e0 "OIAA", dns_name = 0x0, trust_flags = 32,
>>   parent_index = 0, trust_type = NETR_TRUST_TYPE_DOWNLEVEL,
>>   trust_attributes = 16777216, sid = 0x0, guid = {time_low = 0, time_mid = 0,
>>     time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}}
>> (gdb) p trusts->array[1]
>> $5 = {netbios_name = 0x803359680 "yyyy", dns_name = 0x0, trust_flags = 32,
>>   parent_index = 0, trust_type = NETR_TRUST_TYPE_DOWNLEVEL,
>>   trust_attributes = 16777216, sid = 0x0, guid = {time_low = 0, time_mid = 0,
>>     time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}}
>> (gdb) p trusts->array[2]
>> $6 = {netbios_name = 0x8033bf310 "xxxxxxxx", dns_name = 0x0, trust_flags = 32,
>>   parent_index = 0, trust_type = NETR_TRUST_TYPE_DOWNLEVEL,
>>   trust_attributes = 16777216, sid = 0x0, guid = {time_low = 0, time_mid = 0,
>>     time_hi_and_version = 0, clock_seq = "\000", node = "\000\000\000\000\000"}}
>> (gdb) p trusts->array[3]
>>
>> where we are not getting a SID or DNS name etc.
>>
>> So, the corruption is not coming in at that point.
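
A minimal sketch of why the log shows S-1-0-0: when a trust entry arrives with sid == NULL, the fallback above copies the NULL SID, whose string form is S-1-0-0. The struct simple_sid type and the helpers trust_sid_or_null() and sid_to_text() are hypothetical simplifications for illustration, not Samba's struct dom_sid or its SID-formatting routines.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Samba's struct dom_sid. */
struct simple_sid {
	uint8_t  revision;
	uint8_t  num_auths;
	uint64_t id_auth;          /* 48-bit identifier authority */
	uint32_t sub_auths[15];
};

/* The NULL SID: revision 1, identifier authority 0, one sub-authority 0. */
static const struct simple_sid null_sid = { 1, 1, 0, { 0 } };

/* Same idea as the fallback in the trust loop: no SID means the NULL SID. */
static struct simple_sid trust_sid_or_null(const struct simple_sid *trust_sid)
{
	return trust_sid != NULL ? *trust_sid : null_sid;
}

/* Format a SID as S-<revision>-<authority>-<sub1>-<sub2>... */
static void sid_to_text(const struct simple_sid *sid, char *buf, size_t len)
{
	size_t used;
	uint8_t i;

	assert(buf != NULL && len > 0);
	used = (size_t)snprintf(buf, len, "S-%u-%llu", (unsigned)sid->revision,
				(unsigned long long)sid->id_auth);
	for (i = 0; i < sid->num_auths && i < 15 && used < len; i++) {
		used += (size_t)snprintf(buf + used, len - used, "-%lu",
					 (unsigned long)sid->sub_auths[i]);
	}
}
```

Formatting trust_sid_or_null(NULL) yields "S-1-0-0", matching the OIAA log line, while a trust that does carry a SID is passed through unchanged.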
>>
>> The next thing to look at is why those two domains are causing
>> problems. I am told that the customer is doing Exchange mailbox
>> migration.
>
> This seems to be the relevant info:
>
> In log.wb-EXCHANGE I find this:
>
> [2014/01/03 19:42:36.705779, 10]
> winbindd/winbindd_cache.c:4561(wcache_tdc_add_domain)
>   wcache_tdc_add_domain: Adding domain XCHANGE (xchange.some.dom), SID S-1-5-21-782252399-1160315966-1364796038, flags = 0x4, attributes = 0x0, type = 0x0
>
> and in the winbindd.log I find this:
>
> [2014/01/03 19:42:36.139768, 10]
> winbindd/winbindd_cache.c:4561(wcache_tdc_add_domain)
>   wcache_tdc_add_domain: Adding domain EXCHANGE (xchange.some.dom), SID S-1-5-21-782252399-1160315966-1364796038, flags = 0x0, attributes = 0x0, type = 0x0
>
> The SIDs are the same. The XCHANGE entry's value of 0x4 seems to be
>
> #define NETR_TRUST_ATTRIBUTE_QUARANTINED_DOMAIN ( 0x00000004 )
>
> or maybe:
>
> #define NETR_TRUST_FLAG_TREEROOT ( 0x00000004 )
>
> So I wonder whether we should process it at all.
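
For reference, a small decoder for the trust flag bits. It assumes the logged "flags" field carries the NETR_TRUST_FLAG_* bits (the names used in Samba's netlogon IDL, matching the DS_DOMAIN_* flags in MS-NRPC); the table and decode_trust_flags() are illustrative helpers, not Samba code. Under that assumption, the XCHANGE entry (flags = 0x4, attributes = 0x0) would decode as NETR_TRUST_FLAG_TREEROOT, and the OIAA entries (flags = 0x20) as NETR_TRUST_FLAG_INBOUND.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trust flag bits, named as NETR_TRUST_FLAG_* in Samba's netlogon IDL. */
static const struct {
	uint32_t bit;
	const char *name;
} trust_flag_names[] = {
	{ 0x00000001, "NETR_TRUST_FLAG_IN_FOREST" },
	{ 0x00000002, "NETR_TRUST_FLAG_OUTBOUND" },
	{ 0x00000004, "NETR_TRUST_FLAG_TREEROOT" },
	{ 0x00000008, "NETR_TRUST_FLAG_PRIMARY" },
	{ 0x00000010, "NETR_TRUST_FLAG_NATIVE" },
	{ 0x00000020, "NETR_TRUST_FLAG_INBOUND" },
};

/* Write the names of all set bits, separated by '|', into buf. */
static void decode_trust_flags(uint32_t flags, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(trust_flag_names) / sizeof(trust_flag_names[0]); i++) {
		if ((flags & trust_flag_names[i].bit) == 0) {
			continue;
		}
		if (buf[0] != '\0') {
			strncat(buf, "|", len - strlen(buf) - 1);
		}
		strncat(buf, trust_flag_names[i].name, len - strlen(buf) - 1);
	}
}
```

Bits not in the table (for example newer flags) are simply ignored by this sketch.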

So many red herrings.

Here is the problem in my case.

For some reason, this customer has a domain called EXCHANGE and one
called XCHANGE, and both seem to have the same DNS name
(xchange.some.dom). One of them also seems to be permanently offline,
but that does not matter here.

When we get the list of trusted domains, we sometimes already have
one of them, EXCHANGE, and then receive an entry for XCHANGE (I think
it happens in that order). We search for XCHANGE in
rescan_forest_trusts, but the search routine doesn't find it. However,
add_trusted_domain does find the existing EXCHANGE entry, because it
also compares the alt_name (the dns_name passed in), and it returns
that entry instead. We then call setup_domain_child on that domain,
which calls setup_child.
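
A simplified sketch of the lookup mismatch, assuming the rescan search is handed only the NetBIOS name while add_trusted_domain also compares the DNS name it is given. struct dom_entry, find_by_name() and find_for_add() are hypothetical stand-ins, not the winbindd functions.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical, simplified domain entry (not struct winbindd_domain). */
struct dom_entry {
	const char *name;          /* NetBIOS name, e.g. "EXCHANGE" */
	const char *alt_name;      /* DNS name, "" if unknown */
	struct dom_entry *next;
};

/* Case-insensitive match against either the NetBIOS or the DNS name. */
static int entry_matches(const struct dom_entry *d, const char *wanted)
{
	return strcasecmp(wanted, d->name) == 0 ||
	       (d->alt_name[0] != '\0' && strcasecmp(wanted, d->alt_name) == 0);
}

/* Search used while rescanning: only the NetBIOS name is available. */
static struct dom_entry *find_by_name(struct dom_entry *list, const char *name)
{
	assert(name != NULL);
	for (; list != NULL; list = list->next) {
		if (entry_matches(list, name)) {
			return list;
		}
	}
	return NULL;
}

/* Search used when adding: the DNS name passed in is compared as well. */
static struct dom_entry *find_for_add(struct dom_entry *list, const char *name,
				      const char *alt_name)
{
	assert(name != NULL);
	for (; list != NULL; list = list->next) {
		if (entry_matches(list, name)) {
			return list;
		}
		if (alt_name != NULL && alt_name[0] != '\0' &&
		    entry_matches(list, alt_name)) {
			return list;
		}
	}
	return NULL;
}
```

With an existing EXCHANGE entry whose alt_name is xchange.some.dom, find_by_name(list, "XCHANGE") returns NULL, while find_for_add(list, "XCHANGE", "xchange.some.dom") returns the EXCHANGE entry.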

In setup_child we do:

        child->sock = -1;
        child->domain = domain;

which makes the existing child look as if it had never been forked. The
next request then causes wb_child_request_trigger to call
fork_domain_child, and bang: we insert the same entry into the list
again and corrupt it.
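
One plausible illustration of how inserting an already-linked entry corrupts a doubly linked list, and why code walking that list could then spin at 100% CPU. list_add_head() is a hypothetical head insert, not Samba's DLIST_ADD macro, and struct child_node is not struct winbindd_child; list_has_cycle() uses Floyd's tortoise-and-hare check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child-list node (not struct winbindd_child). */
struct child_node {
	int id;
	struct child_node *prev, *next;
};

/* Push p onto the head of the list; only safe if p is NOT already linked. */
static void list_add_head(struct child_node **list, struct child_node *p)
{
	assert(list != NULL && p != NULL);
	p->prev = NULL;
	p->next = *list;
	if (*list != NULL) {
		(*list)->prev = p;
	}
	*list = p;
}

/* Floyd's tortoise and hare: returns 1 if following ->next never ends. */
static int list_has_cycle(const struct child_node *head)
{
	const struct child_node *slow = head, *fast = head;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast) {
			return 1;
		}
	}
	return 0;
}
```

Pushing a, b and c gives c -> b -> a. Pushing b a second time sets b->next to c while c->next still points at b, so a walk b -> c -> b never reaches NULL, and a drops out of the list.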

To see whether it prevents this crash, I am going to skip the call to
setup_domain_child when the name passed in does not match the name of
the domain we found.
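
The proposed guard might look roughly like this. should_setup_child(), on_trusted_domain(), fake_setup_domain_child() and struct found_domain are hypothetical names for illustration, and the case-insensitive comparison is an assumption; the real change would live in the winbindd code that calls setup_domain_child.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical, simplified domain entry (not struct winbindd_domain). */
struct found_domain {
	const char *name;       /* NetBIOS name of the entry that was found */
	const char *alt_name;   /* its DNS name */
};

static int setup_calls;  /* counts calls to the stand-in below */

/* Stand-in for setup_domain_child(); it only counts calls. */
static void fake_setup_domain_child(const struct found_domain *d)
{
	(void)d;
	setup_calls++;
}

/* Guard: set up a child only when the found entry is the one requested. */
static int should_setup_child(const struct found_domain *found,
			      const char *requested_name)
{
	assert(requested_name != NULL);
	return found != NULL && strcasecmp(found->name, requested_name) == 0;
}

static void on_trusted_domain(const struct found_domain *found,
			      const char *requested_name)
{
	if (!should_setup_child(found, requested_name)) {
		return;  /* found entry has a different name: leave its child alone */
	}
	fake_setup_domain_child(found);
}
```

With the EXCHANGE entry returned for a request about XCHANGE, the guard skips the setup, so the already-running child is not reset.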

-- 
Regards,
Richard Sharpe
(How can I dispel my sorrow? Only with Du Kang. -- Cao Cao)


More information about the samba-technical mailing list