[Samba] Samba internal DNS client, large replies and TC bit

Michael Tokarev mjt at tls.msk.ru
Mon Dec 4 10:21:05 UTC 2023


Hi!

We had a painful debugging session today, with a samba AD member server
not being able to auth users anymore.

The issue seems to be due to a defect in samba's internal DNS resolution,
as done in winbind.

TL;DR: samba's internal DNS client should not rely on UDP-only DNS, but
should retry using TCP if the TC bit is set in the answer.  There's a
real-life issue with this simplistic DNS implementation.
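
To make the expected behaviour concrete, here is a minimal sketch of the
UDP-then-TCP fallback in Python.  This is not samba's code; the function
names (build_query, is_truncated, query_udp, query_tcp, resolve) are made
up for illustration, the query id is fixed, only IPv4 is used for UDP, and
there is no retry or error handling.  It only shows the logic: send over
UDP, and if the TC bit comes back set, repeat the same query over TCP with
the 2-byte length prefix.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags word


def build_query(name, qtype=33, query_id=0x1234, udp_size=4096):
    """Build a one-question DNS query with an EDNS0 OPT record.

    qtype 33 is SRV.  The fixed query_id is a simplification.
    """
    # Header: id, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class IN
    # OPT pseudo-record: root name, type 41, class = UDP payload size,
    # TTL = extended rcode/flags (0), RDLENGTH = 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, 0, 0)
    return header + question + opt


def is_truncated(response):
    """Return True if the TC bit is set in a DNS response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def query_udp(server, query, timeout=3.0):
    """Send the query in one UDP datagram and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        data, _ = sock.recvfrom(65535)
        return data


def _recv_exact(sock, count):
    """Read exactly count bytes from a stream socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def query_tcp(server, query, timeout=3.0):
    """Send the query over TCP; messages carry a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", _recv_exact(sock, 2))
        return _recv_exact(sock, length)


def resolve(server, name, qtype=33):
    """Try UDP first; if the reply has TC set, repeat the query over TCP."""
    query = build_query(name, qtype)
    response = query_udp(server, query)
    if is_truncated(response):
        response = query_tcp(server, query)
    return response
```

Something like resolve("127.0.0.1", "_ldap._tcp.dc._msdcs.rgs.ru.") would
then return the full reply even when the UDP answer comes back truncated.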


One of the domains here gained a few more domain controllers.  And it turns
out the DNS reply stopped fitting in a single UDP packet.  So named (9.18.19)
stopped providing any records in its UDP answer:

$ dnsget -t srv -v _ldap._tcp.dc._msdcs.rgs.ru. -n 127.0.0.1
;; trying _ldap._tcp.dc._msdcs.rgs.ru.
;; sending 56 bytes query to 127.0.0.1 port 53

;; received 56 bytes response from 127.0.0.1 port 53
;; warning: TC bit set, probably incomplete reply
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 101, size: 56
;; flags: qr tc rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; QUERY SECTION (1):
;_ldap._tcp.dc._msdcs.rgs.ru.	IN	SRV

;; ADDITIONAL section (1):
;EDNS0 OPT record (UDPsize: 4096, ERcode: 0, Flags: 0x00): 0 bytes

Note the "warning: TC bit set" and zero records in ANSWER section.
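
For reference, the same check can be done on the raw packet.  The sketch
below decodes the 12-byte DNS header into the flags and section counts that
dnsget prints above; parse_header and is_empty_truncated are hypothetical
helper names, not part of any existing tool.

```python
import struct

# Flag names as printed by dnsget, with their bit in the header flags word.
FLAG_BITS = [
    ("qr", 0x8000),
    ("aa", 0x0400),
    ("tc", 0x0200),
    ("rd", 0x0100),
    ("ra", 0x0080),
]


def parse_header(packet):
    """Decode the fixed 12-byte DNS header of a raw packet."""
    qid, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id": qid,
        "flags": [name for name, bit in FLAG_BITS if flags & bit],
        "rcode": flags & 0x000F,
        "query": qd,
        "answer": an,
        "authority": ns,
        "additional": ar,
    }


def is_empty_truncated(packet):
    """True for a reply like the one above: TC set and no ANSWER records."""
    header = parse_header(packet)
    return "tc" in header["flags"] and header["answer"] == 0
```

A client that sees is_empty_truncated() return True has nothing usable in
the UDP reply and has to ask again over TCP.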

This is a local NAMED cache which forwards requests to the windows-based
nameservers of the said domain.  When querying one of the windows DNS
servers directly, though, it works:

$ dnsget -t srv -v _ldap._tcp.dc._msdcs.rgs.ru. -n 10.224.1.21
;; trying _ldap._tcp.dc._msdcs.rgs.ru.
;; sending 56 bytes query to 10.224.1.21 port 53

;; received 2731 bytes response from 10.224.1.21 port 53
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9707, size: 2731
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 67, AUTHORITY: 0, ADDITIONAL: 26

;; QUERY SECTION (1):
;_ldap._tcp.dc._msdcs.rgs.ru.	IN	SRV

;; ANSWER section (67):
...

Note: ADDITIONAL section only contains 26 records instead of 67, so it
is a partial list (it should provide 67 A records for the 67 names
referenced in ANSWER section, and this is done when querying over TCP).


I dunno why NAMED does not provide at least a minimal set of answers
anymore (minimal-responses is set to "yes" in named.conf) the same way
the windows nameserver does.

The impact is that when using the local named cache (which only returns the
records over TCP), winbind is unable to find domain controllers:

[2023/12/04 13:16:04.654006, 10, pid=307741, effective(0, 0), real(0, 0)] ../../source3/libsmb/namequery.c:2516(resolve_ads)
   resolve_ads: SRV query for _ldap._tcp.dc._msdcs.RGS.RU
[2023/12/04 13:16:04.654083,  4, pid=307741, effective(0, 0), real(0, 0)] ../../source3/libsmb/namequery.c:3295(get_dc_list)
   get_dc_list: no servers found

When pointed at the windows nameservers instead, it works, getting these 67
addresses and correctly trying them.


Now the problem is how to configure all this.  The AD nameservers don't know
anything about a few DNS domains which are configured on this server, so
using them in resolv.conf doesn't quite work.

Help?

Thanks,

/mjt


