[Samba] samba AD problem after re-join domain
jas at eecs.yorku.ca
Mon Oct 12 14:36:55 UTC 2020
On 10/12/2020 4:06 AM, Rowland penny via samba wrote:
> On 12/10/2020 02:54, Jason Keltz via samba wrote:
>> I've been working on a Samba AD setup with a bunch of test machines -
>> the one DC, and a bunch of clients. Last night, I ended up switching
>> the name of the test machines temporarily (except the DC), and
>> re-joining the domain (that's for another e-mail later). When things
>> didn't work the way I had planned, I switched the hostnames back,
>> and re-joined the domain today on all the test machines. I was
>> shocked to find that I am only able to login to the domain on one of
>> my hosts. It fails on all the other ones. I ensured that I deleted
>> the machine entries from AD. I haven't changed my Samba config in
>> months which Rowland had last verified was fine. I haven't changed
>> my /etc/krb5.conf Kerberos config in months. I even did a complete
>> rebuild of one of the machines since I automated the installation
>> process, and that rebuild was working perfectly many many times, but
>> now it is failed. In winbind log every time I try to login I'm
>> mostly seeing:
> Did you leave the domain before you changed the hostname ?
> Why did you change the hostnames ? In a case like this, I would have
> set up a new computer, joined this to the domain and then removed the
> old computer from the domain.
I did not leave the domain, but I did delete the entry using either the
Windows AD tool or "samba-tool computer delete". I can't remember which
one at this point. I think that clears up all the bits. Is that
correct? On the local host, I also deleted /etc/krb5.keytab and all the
Samba bits so that the join was fresh.
Things are better today. I discovered one issue that seemed (to me)
unrelated to the errors, but which appears to have caused a lot of the
trouble. I was chasing errors in the winbind log, but several of the
test servers are NFS servers, and when I rejoined them to the domain, I
didn't replace the nfs/X entries in their keytabs. As a result, the
clients couldn't mount, and that definitely caused some trouble whose
signs I didn't recognize. I'm still watching, though. However, I can
log in to all the hosts now.
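A minimal sketch of how a missing nfs/X keytab entry could be spotted after a rejoin. It assumes the text comes from a plain "klist -k /etc/krb5.keytab" listing (one KVNO and one principal per entry line, no -e or -t columns); the function name, the sample hostnames and the realm are made up for illustration.

```python
def missing_services(klist_output, required=("host", "nfs")):
    """Return the required service prefixes that have no keytab entry."""
    present = set()
    for line in klist_output.splitlines():
        fields = line.split()
        # Entry lines start with a numeric KVNO and end with the principal.
        # Principals without a "/" (such as the machine account) are skipped.
        if len(fields) >= 2 and fields[0].isdigit() and "/" in fields[-1]:
            present.add(fields[-1].split("/", 1)[0].lower())
    return [svc for svc in required if svc not in present]


# Hypothetical listing for illustration only.
SAMPLE = """Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- ----------------------------------------
   2 host/test.example.com@EXAMPLE.COM
   2 TEST$@EXAMPLE.COM
"""

if __name__ == "__main__":
    print(missing_services(SAMPLE))
```

Running a check like this on each NFS server right after the join would flag the missing service entry before clients start failing to mount.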
By the way, at one point, I rebooted the DC, and I noticed that all the
AD clients showed something like this:
[2020/10/12 09:25:19.183616, 1, pid=36145, effective(0, 0), real(0, 0)]
../../source3/rpc_client/cli_pipe.c:422: Bind NACK received from host
[2020/10/12 09:44:11.598150, 1, pid=36145, effective(0, 0), real(0, 0)]
Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT
(Which is strange, because it means that if you reboot the DC, the
clients start talking to it more slowly when it comes back up? I don't
think the number ever increases again unless you restart winbind
everywhere?)
and since that reboot, I've seen a few of them do this:
[2020/10/12 10:00:19.814381, 1, pid=36145, effective(0, 0), real(0, 0)]
Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
[2020/10/12 10:16:19.557261, 1, pid=36145, effective(0, 0), real(0, 0)]
Reducing LDAP page size from 250 to 125 due to IO_TIMEOUT
Two of them are VirtualBox VMs, so I figured maybe it's some kind of
VirtualBox thing, but one of them is an actual machine and still has the
same error. The DC is very lightly loaded. How would I debug what is
causing these IO timeouts and the resulting page size reductions?
I know that various errors in the Samba logs are not "issues" but this
one seems to be an issue. I don't like seeing IO_TIMEOUTs.
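A rough sketch for tracking these reductions over time, so they can be lined up against DC reboots or network events. It assumes the log layout quoted above (a bracketed timestamp header line followed by the message line); the function name is hypothetical, and the sample text is copied from the excerpts in this message.

```python
import re

# Header lines begin with "[YYYY/MM/DD HH:MM:SS.micro, ...".
STAMP = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
REDUCE = re.compile(
    r"Reducing LDAP page size from (\d+) to (\d+) due to IO_TIMEOUT"
)


def page_size_reductions(log_text):
    """Return (timestamp, old_size, new_size) for each reduction message."""
    events = []
    stamp = None
    for line in log_text.splitlines():
        header = STAMP.match(line)
        if header:
            stamp = header.group(1)  # remember the most recent header
        hit = REDUCE.search(line)
        if hit:
            events.append((stamp, int(hit.group(1)), int(hit.group(2))))
    return events


SAMPLE = """[2020/10/12 09:44:11.598150, 1, pid=36145, effective(0, 0), real(0, 0)]
Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT
[2020/10/12 10:00:19.814381, 1, pid=36145, effective(0, 0), real(0, 0)]
Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
"""

if __name__ == "__main__":
    for when, old, new in page_size_reductions(SAMPLE):
        print(when, old, "->", new)
```

Collecting the same summary from every client would show whether the timeouts cluster around the DC reboot or keep happening afterwards.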
Another distracting error in the log included:
[2020/10/11 22:43:29.843630, 1, pid=969, effective(0, 0), real(0, 0)]
ads_find_dc: name resolution for realm 'AD.EECS.YORKU.CA' (domain
'EECSYORKUCA') failed: NT_STATUS_NO_LOGON_SERVERS
... after boot, which sounds serious, but it turns out that if I try to
authenticate before everything is up and running, that's what I get. The
error makes sense, but there's no "follow up" to say: "Ok ok - I found it
now - Sorry to give you a heart attack." It's all a learning experience.
The real reason I was trying to change the hostnames was to deal with a
scenario particular to our environment. We have many dual-boot machines
running Windows and Linux. I know that I can't join the domain with the
same name on both Linux and Windows systems because joining one would
change the password, then the other would no longer be joined, etc. I
understand that it's possible to generate a machine password manually,
and use that from both sides, but as I understand it, this interferes
with the system's ability to change the machine password regularly,
which seems more secure. I don't know if Samba does that. I also don't
want to have a different IP address for each side because that would be
wasteful. I would prefer if the hostname would be the same on both
sides as well. I was trying to explore how carefully the name in the
AD computer database is tied to the "real" DNS name of the host. What I
was trying to do was to add to /etc/samba/smb.conf: netbios name=<system
hostname>-linux so that when I would join the hosts under Linux, they
would take on a "-linux" name, but only in the AD computer database.
When the host was booted, the host would have an AD name of <system
hostname>-linux, but a real name of just "<system hostname>". On
Windows, both the AD name and hostname would be "<system hostname>".
This would mean that on Windows, you could have a computer called
"test", and under Linux, "test-linux", but both would really be the same
physical PC and both would be host "test" with one IP. It wasn't
working. I am pretty sure I forgot the nfs/X entries on the NFS servers
after rejoining the domain so that may be the issue. However, thinking
back, I also think that "net ads keytab" would not let me add an entry
for "host/test...." because it wanted "host/test-linux....", but I could
be wrong. If the host *had* to take on its real identity "test-linux"
then test-linux could just be an alias for test, I guess, but then the
machine build would be a headache... and when the Linux machines boot
they use DHCP (just like Windows), so the machine wouldn't know if it's
"test" or "test-linux". Lots of "fun".
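One practical limit on the "-linux" naming idea is that NetBIOS computer names allow at most 15 characters, and the suffix uses 6 of them. The sketch below derives the name that would appear in the AD computer database for a given hostname; the function name is hypothetical, and the upper-casing and trailing "$" follow the usual machine account convention rather than anything confirmed in this thread.

```python
NETBIOS_MAX = 15  # usable characters in a NetBIOS computer name


def ad_machine_account(hostname, suffix="-linux"):
    """Return the AD machine account name for hostname plus suffix."""
    short = hostname.split(".", 1)[0]  # drop any DNS domain part
    netbios = (short + suffix).upper()
    if len(netbios) > NETBIOS_MAX:
        raise ValueError(
            f"{netbios!r} is {len(netbios)} characters; "
            f"NetBIOS names allow at most {NETBIOS_MAX}"
        )
    return netbios + "$"  # machine accounts conventionally end in "$"


if __name__ == "__main__":
    print(ad_machine_account("test"))
```

With a 6-character suffix, any short hostname longer than 9 characters would need a shorter suffix or a different scheme.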