[Samba] Recurrent DNS issues after DC loss

Ole Traupe ole.traupe at tu-berlin.de
Wed Jun 6 08:54:34 UTC 2018



On 05.06.2018 20:39, lingpanda101 wrote:
> On 6/5/2018 2:11 PM, Ole Traupe via samba wrote:
>> Hi list,
>>
>> I have a domain in production on two sites (subnets, via "Sites and 
>> Services") with originally two DCs. One went down due to HDD (-> old 
>> hardware) error. Now, occasionally, clients can't access/find the file 
>> server (domain member). This does not occur on all clients at the 
>> same time, however, so I am rather sure it is not the file server 
>> itself, but a DNS problem.
>>
>> I couldn't find anything diagnostic in the logs. Default log level 
>> was not informative, I think, while log level 10 I just could not 
>> handle/analyze properly.
>>
>> Can someone recommend a log level? Should I look on the DC or on the 
>> file server?
>>
>> Do I have to remove the offline DC completely from DNS and Sites and 
>> Services for this mess to stop?
>>
>> I appreciate any advice.
>>
>> Cheers,
>> Ole
>>
>>
>>
> Ole,
>
>     If you haven't already removed the dead DC from your network you 
> should do that first.
>
> https://wiki.samba.org/index.php/Demoting_a_Samba_AD_DC
>
> Your clients' DNS may still be pointing to the offline DC, causing
> lookup delays. Also, did you have your DCs pointing to themselves for
> DNS or to each other?
>

Thank you for your help!

I had trouble with fail-safe tests regarding DC redundancy a while ago. 
Some time after discussing it here on the list I finally got it working 
(had something to do with IPv6). So I can say I have tested the absence 
of a DC, and it did not lead to any trouble (except for a very short 
moment due to DNS caching, supposedly). Now it does, which is weird.

When the drive errors on the now-broken DC first appeared, the domain
behaved erratically. When I took that DC completely offline, everything
went back to normal. Now issues are showing up again. So much for the
background.

The current situation is very much like in the fail-safe tests, with two 
exceptions: the remaining DC (FSMO role holder) is the primary DNS 
server on all Windows machines, and I updated the resolv.conf on that DC 
to only point to itself. This DC and several Windows clients got 
restarted after that, but issues persist.
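
To double-check which DNS servers a Linux machine (DC or file server)
is really configured to use, something like the following rough Python
sketch could help. It only parses resolv.conf-style text; the addresses
in the example are made-up documentation addresses, not my real ones,
and the function names are just for illustration.

```python
def nameservers(resolv_conf_text):
    """Return the 'nameserver' addresses from resolv.conf-style text, in order."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def stale_nameservers(resolv_conf_text, dead_addresses):
    """Return the configured nameservers that are known to be offline."""
    dead = set(dead_addresses)
    return [ns for ns in nameservers(resolv_conf_text) if ns in dead]


# Hypothetical example: 192.0.2.11 stands in for the dead DC.
EXAMPLE = """\
# generated by hand
search samdom.example.com
nameserver 192.0.2.10
nameserver 192.0.2.11
"""

if __name__ == "__main__":
    print(stale_nameservers(EXAMPLE, ["192.0.2.11"]))
```

On a real machine one would pass the contents of /etc/resolv.conf
instead of EXAMPLE.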

Actually, the DCs (resolv.conf) were pointing to each other initially,
and I think that was at least one root of the problem. I think this
advice in the Samba wiki is actually rather bad (and, as has been
pointed out before, unnecessary with Samba?).

Regarding demoting the dead DC: my Samba version is rather old (4.2.5).
The problem is that I chose the uid/gid scopes unwisely. I read in some
patch notes that I can no longer update, because newer versions of
Samba require those scopes to be set in a very specific way. So perhaps
demoting via the newly available method is not an option here.

What I can think of is:
- removing the dead DC from the clients' DNS config, of course
- removing it from AD DNS
- removing it from AD Sites and Services
- and removing it from AD Users and Computers
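
For the AD DNS part, my understanding is that clients locate DCs via
SRV records, so leftovers there would be the first thing to look for.
Below is a rough Python sketch of how I would list the record names to
check and spot any that still point at the dead DC. The prefix list is
the set of standard AD locator records as far as I know them and is
probably not exhaustive, and I don't claim it matches what the demote
script cleans up. The domain, site, and host names are made up, and the
lookup results have to be collected separately (the script does no DNS
queries itself).

```python
# Common SRV record prefixes used to locate AD DCs (assumed, not exhaustive).
SRV_PREFIXES = [
    "_ldap._tcp",
    "_kerberos._tcp",
    "_kerberos._udp",
    "_kpasswd._tcp",
    "_kpasswd._udp",
    "_gc._tcp",
    "_ldap._tcp.dc._msdcs",
    "_kerberos._tcp.dc._msdcs",
]


def srv_names(domain, sites=()):
    """Build the SRV record names worth checking for a domain and its sites."""
    names = [f"{prefix}.{domain}" for prefix in SRV_PREFIXES]
    for site in sites:
        # Site-specific locator records (relevant with Sites and Services).
        names.append(f"_ldap._tcp.{site}._sites.{domain}")
        names.append(f"_kerberos._tcp.{site}._sites.{domain}")
    return names


def leftovers(records, dead_host):
    """Return the record names whose targets still include dead_host.

    records maps a record name to the list of target host names found
    for it, gathered by hand or with any DNS lookup tool.
    """
    dead = dead_host.rstrip(".").lower()
    return sorted(
        name
        for name, targets in records.items()
        if any(t.rstrip(".").lower() == dead for t in targets)
    )


if __name__ == "__main__":
    # Hypothetical names; dc2 stands in for the dead DC.
    domain = "samdom.example.com"
    for name in srv_names(domain, sites=["SiteA"]):
        print(name)
    found = {
        "_ldap._tcp.samdom.example.com": [
            "dc1.samdom.example.com.",
            "dc2.samdom.example.com.",
        ],
        "_gc._tcp.samdom.example.com": ["dc1.samdom.example.com."],
    }
    print(leftovers(found, "dc2.samdom.example.com"))
```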

What else does the Samba script for demoting a DC do? Can I do that 
manually, too? I repeat: it was not the FSMO role holder.

Thanks again for any advice!
Ole




