[Samba] Clients no longer updating DNS & unable to delete MX records

Thu Mar 28 06:55:29 MDT 2013

On Thu, Mar 21, 2013 at 2:21 PM, Thomas Simmons <twsnnva at gmail.com> wrote:
> On Wed, Mar 20, 2013 at 3:29 PM, Thomas Simmons <twsnnva at gmail.com> wrote:
>>
>> On Wed, Mar 20, 2013 at 9:05 AM, Thomas Simmons <twsnnva at gmail.com> wrote:
>>>
>>> Hello,
>>>
>>> After noticing some odd behavior on my domain, I realized that many of my
>>> DNS records are incorrect and that clients are no longer properly updating
>>> DNS. While looking into this, I also discovered that I am unable to delete
>>> MX records via AD DNS Manager or samba-tool. Both tools "see" the record but
>>> report it does not exist when I attempt to delete it. I can create new MX
>>> records, but cannot delete them. I can create and delete both A and CNAME
>>> records. The same behavior occurs under all zones. I can create and delete
>>> new forward lookup zones.
>>>
>>> [root@ADC1 log]# samba-tool dns query adc1 internal.testdom.com mailsrv MX
>>> GENSEC backend 'gssapi_spnego' registered
>>> GENSEC backend 'gssapi_krb5' registered
>>> GENSEC backend 'gssapi_krb5_sasl' registered
>>> GENSEC backend 'sasl-DIGEST-MD5' registered
>>> GENSEC backend 'schannel' registered
>>> GENSEC backend 'spnego' registered
>>> GENSEC backend 'ntlmssp' registered
>>> GENSEC backend 'krb5' registered
>>> GENSEC backend 'fake_gssapi_krb5' registered
>>> Using binding ncacn_ip_tcp:adc1[,sign]
>>>   Name=, Records=3, Children=0
>>>     MX: mailsrv.internal.testdom.com. (10) (flags=f0, serial=4, ttl=900)
>>>
>>> [root@ADC1 log]# samba-tool dns delete adc1 internal.testdom.com mailsrv MX 'mailsrv.internal.testdom.com 10'
>>> GENSEC backend 'gssapi_spnego' registered
>>> GENSEC backend 'gssapi_krb5' registered
>>> GENSEC backend 'gssapi_krb5_sasl' registered
>>> GENSEC backend 'sasl-DIGEST-MD5' registered
>>> GENSEC backend 'schannel' registered
>>> GENSEC backend 'spnego' registered
>>> GENSEC backend 'ntlmssp' registered
>>> GENSEC backend 'krb5' registered
>>> GENSEC backend 'fake_gssapi_krb5' registered
>>> Using binding ncacn_ip_tcp:adc1[,sign]
>>> ERROR(runtime): uncaught exception - (9701, 'WERR_DNS_ERROR_RECORD_DOES_NOT_EXIST')
>>>   File "/usr/local/samba/lib/python2.6/site-packages/samba/netcmd/__init__.py", line 175, in _run
>>>     return self.run(*args, **kwargs)
>>>   File "/usr/local/samba/lib/python2.6/site-packages/samba/netcmd/dns.py", line 1169, in run
>>>     del_rec_buf)
>>>
>>
>> With log level = 10, when attempting to delete the record, it appears to
>> find it, but reports it doesn't exist anyway. Has anyone seen this behavior
>> before? The last DNS update was nearly 2 weeks ago and I am not aware of
>> anything that happened around that time that would have triggered this. I
>> don't know if this MX problem and the clients being unable to update DNS are
>> related.
>>
>> [2013/03/20 13:52:20,  5, pid=2064, effective(0, 0), real(0, 0)]
>> ../lib/ldb-samba/ldb_wrap.c:69(ldb_wrap_debug)
>>   ldb: ldb_trace_request: SEARCH
>>    dn:
>> DC=internal.testdom.com,CN=MicrosoftDNS,DC=DomainDnsZones,DC=internal,DC=testdom,DC=com
>>    scope: one
>>    expr: (&(objectClass=dnsNode)(name=mailsrv))
>>    attr: dnsRecord
>>    control: <NONE>
>>
>> [2013/03/20 13:52:20,  5, pid=2064, effective(0, 0), real(0, 0)]
>> ../lib/ldb-samba/ldb_wrap.c:69(ldb_wrap_debug)
>>   ldb: ldb_trace_request: (resolve_oids)->search
>> ...
>> ...
>> ...
>>
>> [2013/03/20 13:52:20,  5, pid=2064, effective(0, 0), real(0, 0)]
>> ../lib/ldb-samba/ldb_wrap.c:69(ldb_wrap_debug)
>>   ldb: ldb_trace_response: ENTRY
>>   dn:
>> DC=mailsrv,DC=internal.testdom.com,CN=MicrosoftDNS,DC=DomainDnsZones,DC=internal,DC=testdom,DC=com
>>   dnsRecord::
>> IgAPAAXwAAAEAAAAAAADhAAAAAALIDcAAAoeBAdtYWlsc3J2CGludGVybmFsB7G4YX
>>    lzZXMDY29tAA==
>>   dnsRecord:: EAAPAAXwAAA+AAAAAAAAAAAAAADcIjcAAAoMAgZnb29nbGUDY29tAA==
>>   dnsRecord::
>> IgAPAAXwAAAEAAAAAAADhAAAAAALIDcAAAoeBAdtYWlsc3J2CGludGVybmFsB7G4YX
>>    lzZXMDY29tAA==
>>
>> [2013/03/20 13:52:20,  5, pid=2064, effective(0, 0), real(0, 0)]
>> ../lib/ldb-samba/ldb_wrap.c:69(ldb_wrap_debug)
>>   ldb: ldb_trace_response: DONE
>>   error: 0
>>
>> [2013/03/20 13:52:20,  1, pid=2064, effective(0, 0), real(0, 0)]
>> ../librpc/ndr/ndr.c:282(ndr_print_function_debug)
>>        DnssrvUpdateRecord2: struct DnssrvUpdateRecord2
>>           out: struct DnssrvUpdateRecord2
>>               result                   :
>> WERR_DNS_ERROR_RECORD_DOES_NOT_EXIST
>
>
> It looks like the last DNS update occurred on March 7th. I restored a backup
> from March 5th to a sandbox environment and it's displaying the same
> behavior. I then restored a December backup (taken just after performing the
> classicupgrade) and do not have the problem. I'm not sure what would be the
> best way to recover from this. Is there any way to "reset" DNS? Apart from
> that, all I can think to do is start at March 4th and restore each backup
> until the problem goes away. Would it be possible to restore AD (minus DNS)
> once this is done?
>
> The last time a client successfully updated DNS was Mar 7 17:58:08:
>
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: starting transaction on zone
> internal.testdom.com
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: allowing update of
> signer=aspire\$\@INTERNAL.TESTDOM.COM name=ASPIRE.internal.testdom.com
> tcpaddr= type=AAAA key=...
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: allowing update of
> signer=aspire\$\@INTERNAL.TESTDOM.COM name=ASPIRE.internal.testdom.com
> tcpaddr= type=A key=...
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: allowing update of
> signer=aspire\$\@INTERNAL.TESTDOM.COM name=ASPIRE.internal.testdom.com
> tcpaddr= type=A key=...
> Mar  7 17:58:08 ADC1 named[977]: client 10.10.65.22#49865: updating zone
> 'internal.testdom.com/NONE': deleting rrset at 'ASPIRE.internal.testdom.com'
> AAAA
> Mar  7 17:58:08 ADC1 named[977]: client 10.10.65.22#49865: updating zone
> 'internal.testdom.com/NONE': deleting rrset at 'ASPIRE.internal.testdom.com'
> A
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: subtracted rdataset
> ASPIRE.internal.testdom.com
> 'ASPIRE.internal.testdom.com.#0111200#011IN#011A#01110.10.65.22'
> Mar  7 17:58:08 ADC1 named[977]: client 10.10.65.22#49865: updating zone
> 'internal.testdom.com/NONE': adding an RR at 'ASPIRE.internal.testdom.com' A
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: added rdataset
> ASPIRE.internal.testdom.com
> 'ASPIRE.internal.testdom.com.#0111200#011IN#011A#01110.10.65.22'
> Mar  7 17:58:08 ADC1 named[977]: samba_dlz: committed transaction on zone
> internal.testdom.com
>
> All DNS updates after that one fail:
>
> Mar  7 18:59:37 ADC1 named[977]: samba_dlz: starting transaction on zone
> internal.testdom.com
> Mar  7 18:59:37 ADC1 named[977]: client 10.10.65.23#57259: update
> 'internal.testdom.com/IN' denied
> Mar  7 18:59:37 ADC1 named[977]: samba_dlz: cancelling transaction on zone
> internal.testdom.com
> Mar  7 18:59:37 ADC1 named[977]: samba_dlz: starting transaction on zone
> internal.testdom.com
> Mar  7 18:59:37 ADC1 named[977]: client 10.10.65.23#65190: update
> 'internal.testdom.com/IN' denied
> Mar  7 18:59:37 ADC1 named[977]: samba_dlz: cancelling transaction on zone
> internal.testdom.com
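
The dnsRecord values in the ldb trace quoted above are base64-encoded
binary records. Below is a minimal sketch of decoding one, assuming
the layout described in Microsoft's MS-DNSP specification (a 24-byte
header followed by type-specific data). The field offsets and the
helper name parse_dns_record are illustrative assumptions, not taken
from the Samba source, and the code assumes Python 3. The blob is the
one value from the trace that is printed on a single line; it is
split here only to fit the line width.

```python
import base64
import struct

# dnsRecord value copied from the trace above (the single-line one).
BLOB = (
    "EAAPAAXwAAA+"
    "AAAAAAAAAAAA"
    "AADcIjcAAAoM"
    "AgZnb29nbGUDY29tAA=="
)

DNS_TYPE_MX = 15


def parse_dns_record(b64):
    """Decode the dnsRecord header, plus the MX payload when the type is MX."""
    raw = base64.b64decode(b64)
    # Little-endian: data length, type, version, rank, flags, serial.
    data_len, rtype, version, rank, flags, serial = struct.unpack_from(
        "<HHBBHI", raw, 0)
    # Assumption from MS-DNSP: the TTL is the one big-endian header field.
    (ttl,) = struct.unpack_from(">I", raw, 12)
    # Reserved word, then a timestamp in hours since 1601-01-01.
    _reserved, timestamp = struct.unpack_from("<II", raw, 16)
    data = raw[24:24 + data_len]
    rec = {
        "data_length": data_len, "type": rtype, "version": version,
        "rank": rank, "flags": flags, "serial": serial, "ttl": ttl,
        "timestamp_hours": timestamp,
    }
    if rtype == DNS_TYPE_MX:
        # MX data: big-endian preference, then a counted name
        # (total length byte, label count byte, length-prefixed labels).
        (rec["preference"],) = struct.unpack_from(">H", data, 0)
        label_count = data[3]
        labels, pos = [], 4
        for _ in range(label_count):
            n = data[pos]
            labels.append(data[pos + 1:pos + 1 + n].decode("ascii"))
            pos += 1 + n
        rec["exchange"] = ".".join(labels)
    return rec


record = parse_dns_record(BLOB)
print(record)
```

Under these assumptions the value decodes to an MX record (type 15)
with serial 62, TTL 0, preference 10 and exchange google.com. Note
also that the first and third dnsRecord values in the trace are
identical to each other.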

It appears the inability to delete MX records is a bug specific to
32-bit Linux. I can reproduce the behavior and the error (MX records
can be created but not deleted) on a clean install/provision using any
version of Samba4 on CentOS 6.x or Ubuntu 12.04. I stumbled on this by
accident: while troubleshooting, I happened to restore to a 64-bit
Linux VM and the delete worked. I originally deployed Samba4 in a VM
and later moved it to a physical 32-bit server. I never noticed the
problem because I had never tried deleting an MX record; I am only
doing it now for troubleshooting purposes. If anyone is running S4 on
32-bit Linux, please try creating and deleting an MX record to see if
you can reproduce what I am seeing. I will open a bug for this after a
little more testing.
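
For anyone trying the test, here is a small sketch that builds the
same samba-tool dns add / query / delete command lines used earlier in
this thread and runs them in order. The record name "mxtest" in the
usage note and the helper names are hypothetical. pointer_bits()
reports the pointer width of the Python interpreter, which is only
assumed to match the Samba build. No credentials are passed, as in the
session above where the commands were run as root on the DC.

```python
import struct
import subprocess


def pointer_bits():
    """Pointer width of the running Python interpreter: 32 or 64."""
    return struct.calcsize("P") * 8


def mx_test_commands(server, zone, name, exchange, preference=10):
    """Build the add / query / delete command lines for one MX record."""
    # MX data uses the same 'host preference' form as the delete above.
    data = "%s %d" % (exchange, preference)
    record = [server, zone, name, "MX"]
    return [
        ["samba-tool", "dns", "add"] + record + [data],
        ["samba-tool", "dns", "query"] + record,
        ["samba-tool", "dns", "delete"] + record + [data],
    ]


def run_mx_test(server, zone, name, exchange, preference=10,
                runner=subprocess.call):
    """Run the three commands in order; return (subcommand, status) pairs."""
    results = []
    for argv in mx_test_commands(server, zone, name, exchange, preference):
        results.append((argv[2], runner(argv)))
    return results
```

On a DC this could be called as run_mx_test("adc1",
"internal.testdom.com", "mxtest", "mailsrv.internal.testdom.com"),
with pointer_bits() printed alongside the result. A delete that fails
right after a successful add would match the behavior described
above. The exit status samba-tool returns for the uncaught exception
is not verified here, so check the printed output as well.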

Back to the original problem (clients not updating DNS): restoring to
a 64-bit VM did not help. However, I temporarily reverted to the
internal DNS server as a test, and clients could then update DNS. This
is not currently a viable solution, since internal DNS does not
support MX or CNAME records, but it does tell me that the problem is
specific to BIND9_DLZ. Running BIND with debugging enabled, I see the
following output:

28-Mar-2013 08:26:15.759 failed gss_inquire_cred: GSSAPI error: Major
= Unspecified GSS failure.  Minor code may provide more information,
Minor = Success.
28-Mar-2013 08:26:15.760 failed gss_accept_sec_context: GSSAPI error:
Major = Unspecified GSS failure.  Minor code may provide more
information, Minor = .
28-Mar-2013 08:26:15.760 process_gsstkey(): dns_tsigerror_badkey
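
Because the debug output above is wrapped by the mail client, the
major and minor status of each failure are easy to misread. A rough
sketch of unwrapping the entries and extracting the failing GSSAPI
call with its two status strings follows. The regular expressions are
assumptions fitted to the three entries shown here, not a general
parser for BIND debug logs.

```python
import re

DEBUG_OUTPUT = """\
28-Mar-2013 08:26:15.759 failed gss_inquire_cred: GSSAPI error: Major
= Unspecified GSS failure.  Minor code may provide more information,
Minor = Success.
28-Mar-2013 08:26:15.760 failed gss_accept_sec_context: GSSAPI error:
Major = Unspecified GSS failure.  Minor code may provide more
information, Minor = .
28-Mar-2013 08:26:15.760 process_gsstkey(): dns_tsigerror_badkey
"""

# A new log entry starts with a timestamp such as "28-Mar-2013 08:26:15.759 ".
ENTRY_START = re.compile(r"^\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} ")
GSS_FAILURE = re.compile(
    r"failed (?P<call>gss_\w+): GSSAPI error: "
    r"Major = (?P<major>.*?), Minor = (?P<minor>.*)\.$")


def unwrap(text):
    """Join continuation lines onto the timestamped entry they belong to."""
    entries = []
    for line in text.splitlines():
        flat = " ".join(line.split())  # collapse runs of whitespace
        if ENTRY_START.match(line) or not entries:
            entries.append(flat)
        else:
            entries[-1] += " " + flat
    return entries


def gss_failures(text):
    """Return (call, major status, minor status) for each GSSAPI failure."""
    found = []
    for entry in unwrap(text):
        m = GSS_FAILURE.search(entry)
        if m:
            found.append((m.group("call"), m.group("major"), m.group("minor")))
    return found


for call, major, minor in gss_failures(DEBUG_OUTPUT):
    print("%s: major=%r minor=%r" % (call, major, minor))
```

For the output above this reports two failures, gss_inquire_cred and
gss_accept_sec_context. Neither carries a useful minor status (one is
"Success", the other is empty), which may be part of why searching
for the message is unproductive.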

Searching for these errors doesn't turn up much. I see that someone
had the same issue a few years back and Andrew recommended upgrading
to BIND 9.8, which I am already running.