[Samba] samba AD problem after re-join domain

Jason Keltz jas at eecs.yorku.ca
Mon Oct 12 14:36:55 UTC 2020


On 10/12/2020 4:06 AM, Rowland penny via samba wrote:
> On 12/10/2020 02:54, Jason Keltz via samba wrote:
>> I've been working on a Samba AD setup with a bunch of test machines - 
>> the one DC, and a bunch of clients. Last night, I ended up switching 
>> the name of the test machines temporarily (except the DC), and 
>> re-joining the domain (that's for another e-mail later). When things 
>> didn't work the way I had planned,  I switched the hostnames back, 
>> and re-joined the domain today on all the test machines.  I was 
>> shocked to find that I am only able to login to the domain on one of 
>> my hosts. It fails on all the other ones.  I ensured that I deleted 
>> the machine entries from AD.  I haven't changed my Samba config in 
>> months which Rowland had last verified was fine.  I haven't changed 
>> my /etc/krb5.conf Kerberos config in months.  I even did a complete 
>> rebuild of one of the machines since I automated the installation 
>> process, and that rebuild was working perfectly many many times, but 
>> now it is failing.  In the winbind log, every time I try to log in, I'm 
>> mostly seeing:
>
> Did you leave the domain before you changed the hostname ?
>
> Why did you change the hostnames ? In a case like this, I would have 
> set up a new computer, joined this to the domain and then removed the 
> old computer from the domain. 

Hi Rowland,

I did not leave the domain, but I did delete the entry using either the 
Windows AD tool or "samba-tool computer delete".  I can't remember which 
one at this point.  I think that clears up all the bits.  Is that 
correct?  On the local host, I also deleted /etc/krb5.keytab and all 
the Samba bits so that the join was fresh.
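
For reference, the leave-then-join order that Rowland's question points at can be written down as a small dry-run plan. This is only a sketch: the admin user name is a placeholder, the keytab path is the one mentioned above, and nothing runs unless dry_run is switched off.

```python
import subprocess


def rejoin_plan(admin_user, keytab="/etc/krb5.keytab"):
    """Return the commands for a leave-then-join cycle, in order."""
    return [
        # Leave first, while the existing machine credentials still work.
        ["net", "ads", "leave", "-U", admin_user],
        # Drop stale keys so the new join starts from a clean keytab.
        ["rm", "-f", keytab],
        ["net", "ads", "join", "-U", admin_user],
        # Verify the new machine account secret against the DC.
        ["net", "ads", "testjoin"],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling run_plan(rejoin_plan("Administrator")) only prints the four commands. Any extra service principals (such as nfs/X) still have to be re-added after the join.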

Things are better today.  I discovered one issue which seemed (to me) 
unrelated to the errors, but which seems to have been the cause of a lot 
of the trouble.  I was chasing errors in the winbind log, but several of 
the test servers are NFS servers, and when I rejoined them to the domain, 
I didn't replace the nfs/X entries in their keytabs.  As a result, the 
clients couldn't mount, and that definitely caused some trouble, for 
which I didn't see the signs.  I'm still watching though.  However, I can 
log in to all the hosts now.
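
A missing nfs/X entry like this can be caught right after a rejoin by checking the keytab listing for the service principals a host is supposed to have. The sketch below assumes the plain "klist -k" output layout (a KVNO column followed by the principal). The host name nfs1 and the sample listing are made up for illustration.

```python
# Hypothetical `klist -k /etc/krb5.keytab` output for an NFS server whose
# nfs/ principal was not re-added after the rejoin.
SAMPLE = """Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- ----------------------------------------------
   2 host/nfs1.ad.eecs.yorku.ca@AD.EECS.YORKU.CA
   2 NFS1$@AD.EECS.YORKU.CA
"""


def keytab_principals(klist_output):
    """Collect every token that looks like a principal (contains '@')."""
    principals = set()
    for line in klist_output.splitlines():
        for token in line.split():
            if "@" in token:
                principals.add(token)
    return principals


def missing_services(klist_output, fqdn, realm, services=("host", "nfs")):
    """Return the services with no <service>/<fqdn>@<realm> key present.

    Comparing case-insensitively is a simplification for this sketch.
    """
    have = {p.lower() for p in keytab_principals(klist_output)}
    return [s for s in services
            if f"{s}/{fqdn}@{realm}".lower() not in have]
```

With the sample listing, missing_services(SAMPLE, "nfs1.ad.eecs.yorku.ca", "AD.EECS.YORKU.CA") reports that only the nfs entry is absent.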

By the way, at one point, I rebooted the DC, and I noticed that all the 
AD clients showed something like this:

[2020/10/12 09:25:19.183616,  1, pid=36145, effective(0, 0), real(0, 0)] 
../../source3/rpc_client/cli_pipe.c:422(cli_pipe_validate_current_pdu)
   ../../source3/rpc_client/cli_pipe.c:422: Bind NACK received from host 
dc1.ad.eecs.yorku.ca!
[2020/10/12 09:44:11.598150,  1, pid=36145, effective(0, 0), real(0, 0)] 
../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
   Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT

(Which is strange, because this means that if you reboot the DC, then the 
clients start talking slower to it when it comes back up?  I don't think 
the number ever increases unless you restart winbind everywhere?)

and since that reboot, I've seen a few of them do this:

[2020/10/12 10:00:19.814381,  1, pid=36145, effective(0, 0), real(0, 0)] 
../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
   Reducing LDAP page size from 500 to 250 due to IO_TIMEOUT
[2020/10/12 10:16:19.557261,  1, pid=36145, effective(0, 0), real(0, 0)] 
../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
   Reducing LDAP page size from 250 to 125 due to IO_TIMEOUT

Two of them are VirtualBox VMs, so I figured maybe it's some kind of 
VirtualBox thing, but one of them is an actual machine and still has the 
same error.  The DC is very lightly loaded.  How would I debug what is 
causing these IO_TIMEOUTs and the resulting page size reductions?

I know that various errors in the Samba logs are not "issues" but this 
one seems to be an issue.  I don't like seeing IO_TIMEOUTs.
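
One way to start on the IO_TIMEOUT question is to pull every page size reduction out of the winbind log together with its timestamp, so that the timing can be compared with whatever else was happening on the DC or the network. The sketch below assumes only the log layout visible in the excerpts above: a bracketed header carrying the timestamp, followed by the message text.

```python
import re
from datetime import datetime

# Header such as "[2020/10/12 09:44:11.598150,  1, pid=36145, ...]"
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+),")
REDUCE = re.compile(
    r"Reducing LDAP page size from (\d+) to (\d+) due to IO_TIMEOUT")


def page_size_reductions(lines):
    """Return (timestamp, old_size, new_size) for each reduction message.

    The timestamp comes from the most recent header line seen.
    """
    events = []
    stamp = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            stamp = datetime.strptime(header.group(1),
                                      "%Y/%m/%d %H:%M:%S.%f")
        reduce_msg = REDUCE.search(line)
        if reduce_msg and stamp is not None:
            events.append((stamp,
                           int(reduce_msg.group(1)),
                           int(reduce_msg.group(2))))
    return events
```

Feeding it the open log file, for example page_size_reductions(open(path)), gives a list whose consecutive timestamps can be subtracted to see how regularly the timeouts recur.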

Another distracting error in the log included:

[2020/10/11 22:43:29.843630,  1, pid=969, effective(0, 0), real(0, 0)] 
../../source3/libads/ldap.c:565(ads_find_dc)
   ads_find_dc: name resolution for realm 'AD.EECS.YORKU.CA' (domain 
'EECSYORKUCA') failed: NT_STATUS_NO_LOGON_SERVERS

... after boot, which sounds serious, but it turns out that if I try to 
authenticate before everything is up and running, that's what I get.  The 
error makes sense, but there's no "follow up" to say: "Ok ok - I found it 
now - Sorry to give you a heart attack.".  It's all a learning experience.
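
Since the error only reflects that winbind could not find a DC yet, anything that authenticates right after boot can wait until a DC answers. The sketch below is one hedged way to do that: it polls "wbinfo --ping-dc" and gives up after a timeout. The timeout and interval values are arbitrary choices for the example.

```python
import subprocess
import time


def dc_reachable():
    """True if `wbinfo --ping-dc` exits 0, i.e. winbind can reach a DC."""
    try:
        result = subprocess.run(["wbinfo", "--ping-dc"],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # wbinfo is not installed on this host.
        return False
    return result.returncode == 0


def wait_until(check, timeout=120.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A boot-time script could call wait_until(dc_reachable) and only then start the logins or mounts that depend on the domain.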

The real reason I was trying to change the hostnames was to deal with a 
scenario particular to our environment.  We have many dual-boot machines 
running Windows and Linux.  I know that I can't join the domain with the 
same name on both Linux and Windows systems because joining one would 
change the password, then the other wouldn't be joined, etc.  I 
understand that it's possible to generate a machine password manually, 
and use that from both sides, but as I understand it, this interferes 
with the system's ability to change the machine password regularly, which 
seems more secure.  I don't know if Samba does that.  I also don't want 
to have a different IP address for both sides because that would be 
wasteful.  I would prefer if the hostname would be the same on both 
sides as well.  I was trying to explore how closely the name in the 
AD computer database is tied to the "real" DNS name of the host.  What I 
was trying to do was to add "netbios name = <system hostname>-linux" to 
/etc/samba/smb.conf so that when I joined the hosts under Linux, they 
would take on a "-linux" name, but only in the AD computer database.  
When the host was booted, it would have an AD name of 
"<system hostname>-linux", but a real name of just "<system hostname>".  On 
Windows, both the AD name and hostname would be "<system hostname>".  
This would mean that on Windows, you could have a computer called 
"test", and under Linux, "test-linux", but both would really be the same 
physical PC and both would be host "test" with one IP.    It wasn't 
working.  I am pretty sure I forgot the nfs/X entries on the NFS servers 
after rejoining the domain so that may be the issue.  However, thinking 
back, I also think that "net ads keytab" would not let me add an entry 
for "host/test...." because it wanted "host/test-linux....", but I could 
be wrong.  If the host *had* to take on its real identity "test-linux" 
then test-linux could just be an alias for test, I guess, but then the 
machine build would be a headache.... and when the Linux machines boot 
they use DHCP (just like Windows) and the machine wouldn't know if it's 
"test" or "test-linux". Lots of "fun".
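
The naming scheme above can be sketched as a small helper that derives the Linux-side AD name from the real hostname. It relies on two general conventions: NetBIOS names are limited to 15 characters, and an AD machine account is named after the NetBIOS name with a trailing "$". The function name and the returned keys are made up for illustration, and the sketch says nothing about whether the keytab tooling will accept host/test principals under such an account.

```python
NETBIOS_MAX = 15  # NetBIOS names may be at most 15 characters long


def linux_identity(hostname, suffix="-linux"):
    """Derive the Linux-side AD name for a dual-boot host.

    hostname may be short or fully qualified; only the first label is used.
    """
    short = hostname.split(".")[0]
    netbios = (short + suffix).upper()
    if len(netbios) > NETBIOS_MAX:
        raise ValueError(
            f"{netbios!r} is {len(netbios)} characters; "
            f"NetBIOS names allow at most {NETBIOS_MAX}")
    return {
        "hostname": short,                           # real name, as on Windows
        "smb_conf_line": f"netbios name = {netbios}",
        "machine_account": netbios + "$",            # Linux-side AD account
    }
```

For the host "test" this yields "netbios name = TEST-LINUX" and the account "TEST-LINUX$". With the "-linux" suffix, any hostname longer than nine characters is rejected by the length check.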

Jason.




