[Samba] second dc not working properly

Jason Keltz jas at eecs.yorku.ca
Tue Dec 8 23:20:01 UTC 2020

On 12/8/2020 4:35 PM, Rowland penny via samba wrote:
> On 08/12/2020 21:09, Jason Keltz via samba wrote:
>> I'm running Samba 4.11.16 on CentOS 7 and not having much luck with 
>> failover to a second domain controller.  I could *really* use some help.
>> I know my Samba config is fine.  I know that adding the second domain 
>> controller was fine.  Replication is working perfectly. No errors.
>> If I stop the DC processes on either server, Windows clients appear 
>> to fail over perfectly fine.
>> The problem seems to affect my Linux clients (CentOS 7) running winbind.
>> Let's say a CentOS 7 client X is connected to dc2, and I stop the DC 
>> processes on dc2....  The odd time, the client will connect to dc1 
>> almost right away, and everything just works the way it should always 
>> work.
>> However, most of the time, I stop the DC processes on dc2, the client 
>> will connect to dc1, I can even do a "wbinfo -u" or "wbinfo -g", but 
>> "whoami" reveals "user doesn't exist". Somewhere between 20-50 
>> minutes later, it just "magically" works.  The timing doesn't seem 
>> consistent.  Even a reboot doesn't fix things when it's in this state.
>> I've tried to follow the Samba logs, but I really can't figure out 
>> what's up.  Andrew? Jeremy? Anyone?
>> I don't think this can be just my system.  I suspect there's a lot of 
>> users out there running multiple DCs with a similar setup to me, 
>> believing that it's all working, and maybe, because there hasn't been 
>> a failure, everything works great, but who knows what will happen 
>> when there's actually a failure.
>> Jason.
> Try adding these lines to the /etc/resolv.conf on the Linux clients:
> options rotate
> options timeout:1
> Rowland

Hi Rowland,

Thanks for your message! Unfortunately, this didn't work.

Here's something that may help jog your memory if you've heard of this 
happening before.....

So my machine was connected to dc2... I stopped DC services on dc2, and 
sure enough, I see the connections host->dc1:microsoft-ds and 
host->dc2:ldap ... perfect! But I still get "user jas does not 
exist".  "wbinfo -u" is giving me nothing now, yet "wbinfo -g" is 
working fine.  I checked back a few minutes later, and now "wbinfo -u" 
is giving me the full user list.  I'm still an "unknown user" though, 
and calls to "getent passwd jas" or "getent passwd <any user>" fail even 
though calls to "getent group <any group>" all work.  There *is* a 
connection.  Eventually, it will realize that I exist without anything 
changing.  I strongly suspect it's some kind of cache that needs to time 
out... some kind of cache that doesn't get reset when winbind is just 
restarted.  You know I've got the right nsswitch.conf, but here it is ...

passwd:     files winbind
shadow:     files
group:      files winbind

... and I know I've got all the proper links as well (or things wouldn't 
magically start working some time later).
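
To double-check the nsswitch.conf claim on several clients, something 
like the following rough sketch could be used. It only parses the text 
of the file and reports which sources each database lists; the function 
name nss_sources is made up for illustration, and the sample is the 
three lines quoted above, not a full file.

```python
import re


def nss_sources(conf_text):
    """Map each nsswitch.conf database to its ordered list of sources."""
    sources = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        database, _, rest = line.partition(":")
        # Remove bracketed action items such as [NOTFOUND=return]
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        sources[database.strip()] = rest.split()
    return sources


# The three lines from the message above (hypothetical stand-in for
# reading /etc/nsswitch.conf on a client).
sample = """\
passwd:     files winbind
shadow:     files
group:      files winbind
"""

parsed = nss_sources(sample)
for database in ("passwd", "group"):
    print(database, "uses winbind:", "winbind" in parsed.get(database, []))
```

This confirms only that winbind is listed for passwd and group; it says 
nothing about whether the winbind NSS library itself is answering.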

This sure has me puzzled.
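
To pin down how long the "user doesn't exist" window really lasts, a 
small probe along these lines might help. It is a minimal sketch, 
assuming Python is available on the client: pwd.getpwnam and 
grp.getgrnam go through NSS much like "getent passwd" and "getent 
group" do. The helper names (nss_lookup_ok, probe, watch), the 30 
second interval and the group name in the usage note are assumptions 
for illustration only.

```python
import grp
import pwd
import time


def nss_lookup_ok(lookup, name):
    """Return True if the NSS lookup finds the name, False on KeyError."""
    try:
        lookup(name)
        return True
    except KeyError:
        return False


def probe(user, group, user_lookup=pwd.getpwnam, group_lookup=grp.getgrnam):
    """Check one user and one group through NSS and timestamp the result."""
    return {
        "time": time.strftime("%H:%M:%S"),
        "user_ok": nss_lookup_ok(user_lookup, user),
        "group_ok": nss_lookup_ok(group_lookup, group),
    }


def watch(user, group, interval=30, rounds=120):
    """Print one probe per interval so the recovery time can be read off."""
    for _ in range(rounds):
        print(probe(user, group))
        time.sleep(interval)
```

Running watch("jas", "domain users") right after stopping the DC 
processes on dc2 (the group name here is a placeholder) should show 
group_ok staying True while user_ok flips from False to True at 
whatever point the lookups recover.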

