[Samba] second dc not working properly
jas at eecs.yorku.ca
Tue Dec 8 23:20:01 UTC 2020
On 12/8/2020 4:35 PM, Rowland penny via samba wrote:
> On 08/12/2020 21:09, Jason Keltz via samba wrote:
>> I'm running Samba 4.11.16 on CentOS 7 and not having much luck with
>> failover to a second domain controller. I could *really* use some help.
>> I know my Samba config is fine. I know that adding the second domain
>> controller was fine. Replication is working perfectly. No errors.
>> If I stop the DC processes on either server, Windows clients appear
>> to fail over perfectly fine.
>> The problem seems to affect my Linux clients (CentOS 7) running winbind.
>> Let's say a CentOS 7 client X is connected to dc2, and I stop the DC
>> processes on dc2.... The odd time, the client will connect to dc1
>> almost right away, and everything just works the way it always should.
>> However, most of the time, I stop the DC processes on dc2, the client
>> will connect to dc1, I can even do a "wbinfo -u" or "wbinfo -g", but
>> "whoami" reveals "user doesn't exist". Somewhere between 20-50
>> minutes later, it just "magically" works. The timing doesn't seem
>> consistent. Even a reboot doesn't fix things when it's in this state.
>> I've tried to follow the Samba logs, but I really can't figure out
>> what's up. Andrew? Jeremy? Anyone?
>> I don't think this can be just my system. I suspect there's a lot of
>> users out there running multiple DCs with a similar setup to me,
>> believing that it's all working, and maybe, because there hasn't been
>> a failure, everything works great, but who knows what will happen
>> when there's actually a failure.
> Try adding these lines to the /etc/resolv.conf on the Linux clients:
> options rotate
> options timeout:1
Thanks for your message! Unfortunately, this didn't work.
Here's something that may help jog your memory if you've heard of this:
So my machine was connected to dc2... I stopped DC services on dc2, and
sure enough, I see the connection host->dc1:microsoft-ds, and
host->dc2:ldap ... perfect! buuuttt I still get "user jas does not
exist". wbinfo -u is giving me nothing now, yet wbinfo -g is working
fine. I checked back in a few mins, and now "wbinfo -u" is giving me
the full user list. I'm still an "unknown user" though, and calls to
"getent passwd jas" or "getent passwd <any user>" fail even though calls
to "getent group <any group>" all work. There *is* a connection.
Eventually, it will realize that I exist without anything changing. I
highly suspect it's some kind of cache that needs to time out... some
kind of cache that doesn't get reset if winbind is just restarted. You
know I've got the right nsswitch.conf, but here it is ...
passwd: files winbind
group: files winbind
... and I know I've got all the proper links as well (or things wouldn't
magically start working some time later).
This sure has me puzzled.
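
To pin down the "20-50 minutes" more precisely, the failing lookup could be polled from a script. What follows is only a rough sketch, not anything from Samba itself: it assumes Python 3 is available on the client, and the helper names (user_resolves, group_resolves, wait_for_user) are made up for illustration. pwd.getpwnam and grp.getgrnam go through NSS, so they should see the same "files winbind" path that "getent passwd" and "getent group" do.

```python
import grp
import pwd
import time


def user_resolves(name):
    """True if NSS can resolve the user (same path as `getent passwd NAME`)."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def group_resolves(name):
    """True if NSS can resolve the group (same path as `getent group NAME`)."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def wait_for_user(name, interval=30.0, timeout=3600.0,
                  resolves=user_resolves, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll until NAME resolves.

    Returns the elapsed seconds at the first successful lookup, or None
    if the timeout passes first. The resolves/sleep/clock parameters are
    only there so the loop can be exercised without a real outage.
    """
    start = clock()
    while True:
        if resolves(name):
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)


# Hypothetical usage, started right after stopping the DC processes on dc2:
#   elapsed = wait_for_user("jas", interval=30, timeout=2 * 3600)
#   print("user lookup recovered after", elapsed, "seconds")
```

Running it once with the interval set to a few seconds and logging the result next to the winbind log timestamps may show whether the recovery lines up with a cache expiry; that is a guess about the cause, not a confirmed diagnosis.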