[Samba] NT_STATUS_INTERNAL_DB_CORRUPTION messages in log.samba--proper course of action?

Pinja-Liina Jalkanen pinja-liina.jalkanen at vihreat.fi
Fri Jul 3 13:25:02 UTC 2015


Hi all,

We've recently migrated from a separate DNS server that was dynamically
updated with BIND's update-policy, using a manually generated
tkey-gssapi-keytab (plus a second server functioning as an ordinary
slave to the first), to BIND9_DLZ. The setup predated Samba's AD DC
support and BIND's DLZ support, and was originally established because
even though we needed AD, we were unwilling to use Windows's own DNS server.

After the migration, while replication seems to work, Windows is finally
gone for good (yay!), kerberos seems to work and dynamic DNS updates
certainly work, there is still a lingering problem manifested as errors
in Samba logs that I'd like to ask about. The messages are as follows
(this is with -d3):

[2015/07/03 14:04:15.034263,  1]
../source4/dsdb/kcc/kcc_topology.c:1437(kcctpl_color_vertices)
  ../source4/dsdb/kcc/kcc_topology.c:1437: failed to find nCName
attribute of object
CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
[2015/07/03 14:04:15.034308,  1]
../source4/dsdb/kcc/kcc_topology.c:3236(kcctpl_create_connections)
  ../source4/dsdb/kcc/kcc_topology.c:3236: failed to color vertices:
NT_STATUS_INTERNAL_DB_CORRUPTION
[2015/07/03 14:04:15.034317,  1]
../source4/dsdb/kcc/kcc_topology.c:3496(kcctpl_create_intersite_connections)
  ../source4/dsdb/kcc/kcc_topology.c:3496: failed to create connections:
NT_STATUS_INTERNAL_DB_CORRUPTION
[2015/07/03 14:04:26.299572,  3]
../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
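
To see whether more than one object triggers the nCName error, the log can
be scanned for that message. The following is only a minimal sketch: the
function name is hypothetical, and it assumes each message sits on a single
unwrapped line in log.samba (the excerpt above was wrapped by the mail
client) and that the DN contains no spaces.

```python
import re

# Matches the KCC message quoted above. Assumes the message is on one line
# and that the DN contains no whitespace.
NCNAME_RE = re.compile(r"failed to find nCName attribute of object (\S+)")


def objects_missing_ncname(log_text):
    """Return the distinct DNs reported as lacking nCName, in first-seen order."""
    seen = []
    for match in NCNAME_RE.finditer(log_text):
        dn = match.group(1)
        if dn not in seen:
            seen.append(dn)
    return seen


# The same message twice, as it would appear on consecutive KCC runs.
sample = (
    "  ../source4/dsdb/kcc/kcc_topology.c:1437: failed to find nCName "
    "attribute of object CN=75a3f420-60ef-4728-8608-3ead61de4555,"
    "CN=Partitions,CN=Configuration,DC=mydomain,DC=tld\n"
) * 2

for dn in objects_missing_ncname(sample):
    print(dn)
```

If only one DN is ever printed, the problem is confined to that single
crossRef object rather than to the Partitions container as a whole.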

Some background about how I did the migration process:

Because one of our DCs was still running Windows 2003, and its support
ends this very month, we needed to migrate away from it entirely. To
that end we'd already transferred all the FSMO roles to Samba, which
worked as it should. But we also wanted to finally move from the
separate DNS to BIND9_DLZ. This is hardly a documented procedure; e.g.
the Samba Wiki page at
https://wiki.samba.org/index.php/Changing_the_DNS_backend has no mention
of it.

The obvious way seemed to be to install Samba on our primary DNS server
for the migration and then join that server to the domain. So I joined
it to the domain as a new DC, using the "--dns-backend=BIND9_DLZ" flag.
But this didn't seem to work. The join went OK, but for some reason it
didn't create named.conf or dns.keytab in Samba's private directory.

In retrospect, it might have been better to install BIND on the DC having
the FSMO roles and run samba_upgradedns on that, but I didn't even know
to think about such an option in advance, because the documentation for
samba_upgradedns didn't take into account a situation like ours, where
the previous DNS backend had been NONE. And that box was never supposed
to run the DNS.

The next thing that I tried to do was to run "samba_upgradedns
--dns-backend=BIND9_DLZ" on the newly promoted machine. After manually
creating the "DnsAdmins" AD group it actually seemed to work. But I had
forgotten that the DNS/primarydns.domain.tld SPN was already assigned to
the user that had previously been used to do the dynamic updates. (I'll
return to the consequences of that mistake below.)

At this point I ran a short script that called "samba-tool dns add" to
add, one by one, all the old A records for hosts that have static IPs,
from the domain.tld zonefile back to the domain.tld zone DB that was now
managed by Samba. I also noticed that the other DCs' records weren't
there; I tried to run samba_dnsupdate on the FSMO server, but it failed
complaining about kerberos.
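
A script of the kind described above could look roughly like the following
sketch. It is not the script that was actually used. It assumes simple
one-line "name [ttl] [IN] A address" records, and the helper name and the
server name dc-a.domain.tld are hypothetical. It only builds the
"samba-tool dns add <server> <zone> <name> A <address>" argument lists and
does not run them; authentication options are left out.

```python
import re

# Matches plain single-line A records with an explicit owner name, e.g.
# "printer1 IN A 192.0.2.10" or "fileserver 3600 IN A 192.0.2.11".
# Records for "@", continuation lines and other record types are skipped.
A_RECORD_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9_.-]+)\s+(?:\d+\s+)?(?:IN\s+)?A\s+"
    r"(?P<addr>\d{1,3}(?:\.\d{1,3}){3})\s*(?:;.*)?$"
)


def dns_add_commands(zone_text, server, zone):
    """Build one 'samba-tool dns add' argument list per A record found."""
    commands = []
    for line in zone_text.splitlines():
        match = A_RECORD_RE.match(line)
        if match:
            commands.append([
                "samba-tool", "dns", "add", server, zone,
                match.group("name"), "A", match.group("addr"),
            ])
    return commands


# Illustrative zone data only; these are not records from the real zone.
zone_text = """\
$TTL 3600
@       IN SOA ns1.domain.tld. hostmaster.domain.tld. ( 1 3600 600 86400 3600 )
printer1        IN A     192.0.2.10
fileserver 3600 IN A     192.0.2.11
www             IN CNAME fileserver
"""

for cmd in dns_add_commands(zone_text, "dc-a.domain.tld", "domain.tld"):
    print(" ".join(cmd))
```

Printing the commands first makes it possible to review them before
anything is written to the zone.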

Because replication was seriously broken at this point due to missing
DNS records, I added the right records for the other DCs manually, and
pointed pdc._msdcs.mydomain.tld to the right DC. After this the two
Samba DCs replicated with each other without errors, but the Windows DC
didn't; it complained about lingering objects, which was odd because
the DNS had been broken only for a short while, and no deletions had
been done during that period (the lingering object that Windows was
complaining about was a long since deleted, ordinary domain user).

After some futile repair attempts that failed mostly due to the mixed
Windows/Samba environment, I decided not to waste any more time with
Windows, because that box was to be demoted really soon anyway; I just
ran "dcpromo /forceremoval", cleaned up the metadata by running the
script on page
https://gallery.technet.microsoft.com/scriptcenter/d31f091f-2642-4ede-9f97-0e1cc4d577f3
through RSAT and manually cleaned up the relevant records from the DNS.

Now, with Windows finally gone, I had just two Samba DCs left: one
running the primary DNS (A), the other having the FSMO roles (B)--and
replication worked! DNS updates still didn't work, but there were hints
of SPN problems in log.samba, and at this point I finally realised my
aforementioned SPN mistake. After sorting these out and performing the
procedure described at
https://wiki.samba.org/index.php/Dns_tkey_negotiategss:_TKEY_is_unacceptable
kerberos finally started to work; "samba_dnsupdate --all-names --verbose
--fail-immediately" passed on both DCs and workstations finally started
to re-register themselves.

All now sorted out, except... the aforementioned INTERNAL_DB_CORRUPTION
errors. They're appearing in the log.samba of the current FSMO box (B).
Our future plan is to transfer the FSMO roles from DC B to DC A, join
our still-ordinary-slave secondary DNS to the domain as a new DC
(C)--migrating that to BIND9_DLZ in the process--and finally demote and
remove B, leaving just DCs A and C, both running Samba with BIND9_DLZ
backends.

Before proceeding any further, however, I wish to sort the errors out;
I've had my share of scary moments already, when I envisioned
starting over by ditching DC A and restoring DC B from a backup.

So, to my question. What is the best option:

a) To try to manually equalise the attributes (with ADSI Edit or some
other LDAP tool) of the object
CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
(e.g. the nCName attribute that Samba is complaining about, which has
the value "DC=ForestDnsZones,DC=mydomain,DC=tld" on DC A but "<none>"
on DC B)? Or would this actually be a risky/dangerous procedure?

b) Just stop worrying and proceed with migrating the FSMO roles to DC A,
joining DC C and ditching DC B, trusting that when DC B is finally
demoted and gone, all will be fine? It'd be wonderful if I could just
trust this option to work, because that'd be the least time-consuming.

c) Judge the Samba DB to be beyond repair, ditch DC A, restore DC B from
a backup, start over again, and re-perform the DNS upgrade somehow
differently (how?). Obviously not my favourite option, because of the
extra work involved, and because things seem to mostly work already,
with normal replication working without errors.
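
For option a), the difference between the two DCs can be checked read-only
before anything is edited. The sketch below assumes the object has been
dumped as LDIF from each DC (for example with ldbsearch or ldapsearch) and
compares the two dumps. The function names are hypothetical and the sample
entries are illustrative: only the nCName difference is taken from the
situation described above. Folded lines and base64 ("::") values are not
handled.

```python
def parse_ldif_entry(text):
    """Parse one unfolded LDIF entry into {attribute: sorted list of values}."""
    attrs = {}
    for line in text.splitlines():
        # Skip blank lines, comments and anything that is not "name: value".
        if not line or line.startswith("#") or ": " not in line:
            continue
        name, value = line.split(": ", 1)
        attrs.setdefault(name, []).append(value)
    return {name: sorted(values) for name, values in attrs.items()}


def diff_attributes(a, b):
    """Return {attribute: (values_in_a, values_in_b)} where the two differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Illustrative dumps: DC A has nCName, DC B does not.
dc_a = """\
dn: CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
objectClass: crossRef
nCName: DC=ForestDnsZones,DC=mydomain,DC=tld
"""
dc_b = """\
dn: CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
objectClass: crossRef
"""

for name, (on_a, on_b) in diff_attributes(
    parse_ldif_entry(dc_a), parse_ldif_entry(dc_b)
).items():
    print(name, "DC A:", on_a, "DC B:", on_b)
```

A value of None on one side means the attribute is absent from that DC's
copy of the object.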

What is the recommended course of action by the Samba team?
Our Samba version is 4.2.2. BIND is 9.9.5-9-Debian.

Thanks for any advice,

-- 
Pinja-Liina Jalkanen
Vihreät / De Gröna
https://www.vihreat.fi/

