[Samba] NT_STATUS_INTERNAL_DB_CORRUPTION messages in log.samba--proper course of action?

Rowland Penny rowlandpenny241155 at gmail.com
Fri Jul 3 14:32:45 UTC 2015


On 03/07/15 14:25, Pinja-Liina Jalkanen wrote:
> Hi all,
>
> We've recently migrated from a separate DNS server that was dynamically
> updated with BIND's update-policy, using a manually generated
> tkey-gssapi-keytab (plus a second server functioning as an ordinary
> slave to the first), to BIND9_DLZ. The setup predated Samba's AD DC
> support and BIND's DLZ support, and was originally established because
> even though we needed AD, we were unwilling to use Windows's own DNS server.
>
> After the migration, while replication seems to work, Windows is finally
> gone for good (yay!), kerberos seems to work and dynamic DNS updates
> certainly work, there is still a lingering problem manifested as errors
> in Samba logs that I'd like to ask about. The messages are as follows
> (this is with -d3):
>
> [2015/07/03 14:04:15.034263,  1]
> ../source4/dsdb/kcc/kcc_topology.c:1437(kcctpl_color_vertices)
>    ../source4/dsdb/kcc/kcc_topology.c:1437: failed to find nCName
> attribute of object
> CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
> [2015/07/03 14:04:15.034308,  1]
> ../source4/dsdb/kcc/kcc_topology.c:3236(kcctpl_create_connections)
>    ../source4/dsdb/kcc/kcc_topology.c:3236: failed to color vertices:
> NT_STATUS_INTERNAL_DB_CORRUPTION
> [2015/07/03 14:04:15.034317,  1]
> ../source4/dsdb/kcc/kcc_topology.c:3496(kcctpl_create_intersite_connections)
>    ../source4/dsdb/kcc/kcc_topology.c:3496: failed to create connections:
> NT_STATUS_INTERNAL_DB_CORRUPTION
> [2015/07/03 14:04:26.299572,  3]
> ../source4/auth/kerberos/krb5_init_context.c:80(smb_krb5_debug_wrapper)
>
> Some background about how I did the migration process:
>
> Because one of our DCs was still running Windows 2003, whose support
> ends this very month, we needed to migrate away from it entirely. To
> that end we'd already transferred all the FSMO roles to Samba, which
> worked as it should. But we also wanted to finally move from the
> separate DNS to BIND9_DLZ. This is hardly a documented procedure; e.g.
> the Samba Wiki page at
> https://wiki.samba.org/index.php/Changing_the_DNS_backend has no mention
> of it.
>
> The obvious way seemed to be to install Samba to our primary DNS for the
> migration and then join that server to the domain. So I joined it to the
> domain as a new DC, using a "--dns-backend=BIND9_DLZ" flag. But: this
> didn't seem to work. The join went OK, but for some reason it didn't
> create named.conf or dns.keytab into Samba's private directory.
>
> In retrospect, it might have been better to install BIND on the DC having
> the FSMO roles and run samba_upgradedns on that, but I didn't even know
> to think about such an option in advance, because the documentation for
> samba_upgradedns didn't take into account a situation like ours, where
> the previous DNS backend had been NONE. And that box was never supposed
> to run the DNS.
>
> The next thing that I tried to do was to run "samba_upgradedns
> --dns-backend=BIND9_DLZ" on the newly promoted machine. After manually
> creating the "DnsAdmins" AD group it actually seemed to work. But I had
> forgotten that the DNS/primarydns.domain.tld SPN was already assigned to
> the user that had previously been used to do the dynamic updates. (I'll
> return to the consequences of that mistake below.)
>
> At this point I ran a short script commanding "samba-tool dns add" and
> adding, one by one, all the old A records for hosts that have static IPs
> from the domain.tld zonefile back to the domain.tld zone DB that was now
> managed by Samba. I also noticed that the other DC's records weren't
> there; I tried to run samba_dnsupdate on the FSMO server, but it failed
> complaining about kerberos.
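
For anyone repeating this step, a rough sketch of such a script follows. It is not the script actually used in this thread: the function name and the sample zone data are made up, the parser only understands single-line "name [ttl] [IN] A address" records, and authentication options for samba-tool are left out. It only builds the "samba-tool dns add <server> <zone> <name> A <address>" command lines so they can be reviewed before anything is run.

```python
import shlex


def a_record_commands(zone_text, server, zone):
    """Build one `samba-tool dns add` argv list per simple A record.

    Simplifying assumption: only single-line records of the form
    "name [ttl] [IN] A address" are recognised; everything else
    (directives, SOA, CNAME, records inheriting the previous name)
    is skipped.
    """
    commands = []
    for raw in zone_text.splitlines():
        if raw[:1].isspace():
            continue  # owner name inherited from the previous record: not handled
        fields = raw.split(";", 1)[0].split()  # drop zone-file comments
        if len(fields) < 3 or fields[-2] != "A":
            continue
        if fields[0].startswith("$"):
            continue  # $TTL, $ORIGIN and similar directives
        name, address = fields[0], fields[-1]
        commands.append(
            ["samba-tool", "dns", "add", server, zone, name, "A", address]
        )
    return commands


# Made-up sample data, using documentation addresses.
SAMPLE_ZONE = """\
$TTL 3600
@       IN SOA ns1.domain.tld. hostmaster.domain.tld. ( 1 3600 600 86400 3600 )
host1   IN A 192.0.2.10
host2   300 IN A 192.0.2.11 ; printer
www     IN CNAME host1
"""

if __name__ == "__main__":
    for argv in a_record_commands(SAMPLE_ZONE, "primarydns.domain.tld", "domain.tld"):
        print(shlex.join(argv))
```

Printing the commands first, rather than executing them directly, makes it easy to check the list against the old zone file before touching the AD-integrated zone.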
>
> Because replication was seriously broken at this point due to missing
> DNS records, I added the right records for the other DCs manually, and
> pointed pdc._msdcs.mydomain.tld to the right DC. After this the two
> Samba DCs replicated with each other without errors, but the Windows DC
> didn't; it complained about lingering objects, which was odd because
> the DNS had been broken only for a short while, and no deletions had
> been done during that period (the lingering object that Windows was
> complaining about was a long since deleted, ordinary domain user).
>
> After some futile repair attempts that failed mostly due to the mixed
> Windows/Samba environment, I decided not to waste any more time with
> Windows, because that box was to be demoted really soon anyway; I just
> ran "dcpromo /forceremoval", cleaned up the metadata by running the
> script on page
> https://gallery.technet.microsoft.com/scriptcenter/d31f091f-2642-4ede-9f97-0e1cc4d577f3
> through RSAT and manually cleaned up the relevant records from the DNS.
>
> Now, with Windows finally gone, I had just two Samba DCs left: one
> running the primary DNS (A), the other having the FSMO roles (B)--and
> replication worked! DNS updates still didn't work, but there were hints
> of SPN problems at log.samba, and at this point I finally realised my
> aforementioned SPN mistake. After sorting these out and performing the
> procedure described at
> https://wiki.samba.org/index.php/Dns_tkey_negotiategss:_TKEY_is_unacceptable
> kerberos finally started to work; "samba_dnsupdate --all-names --verbose
> --fail-immediately" passed on both DCs and workstations finally started
> to re-register themselves.
>
> All now sorted out, except... the aforementioned INTERNAL_DB_CORRUPTION
> errors. They're appearing in the log.samba of the current FSMO box (B).
> Our future plan is to transfer the FSMO roles from DC B to DC A, join
> our still-ordinary-slave secondary DNS to the domain as a new DC
> (C)--migrating that to BIND9_DLZ in the process--and finally demote and
> remove B, leaving just DCs A and C, both running Samba with BIND9_DLZ
> backends.
>
> Before proceeding any further, however, I wish to sort the errors out;
> I've already had my share of scary moments, when I envisioned
> starting over by ditching DC A and restoring DC B from a backup.
>
> So, to my question. What is the best option:
>
> a) To try to manually equalise the attributes (with ADSI Edit or some
> other LDAP tool) of the object
> CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
> (e.g. the nCName attribute that Samba is complaining about--that has
> the value "DC=ForestDnsZones,DC=mydomain,DC=tld" on DC A but "<none>"
> on DC B)? Or would this actually be a risky/dangerous procedure?
>
> b) Just stop worrying and proceed with migrating the FSMO roles to DC A,
> joining DC C and ditching DC B, trusting that when DC B is finally
> demoted and gone, all will be fine? It'd be wonderful if I could just
> trust this option to work, because that'd be the least time-consuming.
>
> c) Judge the Samba DB to be beyond repair, ditch DC A, restore DC B from
> a backup, start over again, and re-perform the DNS upgrade somehow
> differently (how?). Obviously not my favourite option, because of the
> extra work involved, and because things seem to mostly work already,
> with normal replication working without errors.
>
> What is the recommended course of action by the Samba team?
> Our Samba version is 4.2.2. BIND is 9.9.5-9-Debian.
>
> Thanks for any advice,
>

Why did you go with '--dns-backend=NONE'? Did you miss the 'NONE skips
the DNS setup entirely (not recommended)' part in the command's help?
Don't bother answering, this is a rhetorical question.

OK, I suggest that you look in /usr/share/samba/provision/sambadns.py,
at 'create_dns_partitions'. This is what *didn't* get run when you
provisioned. You should be able to work out what you need to do now.
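
Before changing anything, it may help to see which crossRef objects under CN=Partitions actually lack nCName on each DC. The sketch below is only an illustration and is not taken from sambadns.py: it assumes the objects have been dumped as LDIF text (for instance with something like: ldbsearch -H <path to sam.ldb> -b "CN=Partitions,CN=Configuration,DC=mydomain,DC=tld" "(objectClass=crossRef)" nCName dnsRoot), the function names are made up, and the sample record values are invented.

```python
def parse_ldif(text):
    """Parse simple LDIF text into a list of {attribute: [values]} dicts.

    Attribute names are lower-cased. Base64 ("attr:: value") values are
    not decoded; this is only meant for eyeballing plain-text attributes.
    """
    # Unfold continuation lines: a line starting with one space
    # continues the previous line.
    unfolded = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    records, current = [], {}
    for line in unfolded:
        if not line.strip():  # a blank line ends the current record
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#"):  # comments such as "# record 1"
            continue
        attr, sep, value = line.partition(":")
        if sep:
            current.setdefault(attr.strip().lower(), []).append(value.strip())
    if current:
        records.append(current)
    return records


def crossrefs_missing_ncname(ldif_text):
    """Return the DNs of all records that have no nCName attribute."""
    return [
        rec["dn"][0]
        for rec in parse_ldif(ldif_text)
        if "dn" in rec and "ncname" not in rec
    ]


# Invented sample output; the second record is hypothetical.
SAMPLE = """\
# record 1
dn: CN=75a3f420-60ef-4728-8608-3ead61de4555,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
dnsRoot: ForestDnsZones.mydomain.tld

# record 2
dn: CN=MYDOMAIN,CN=Partitions,CN=Configuration,DC=mydomain,DC=tld
nCName: DC=mydomain,DC=tld
dnsRoot: mydomain.tld
"""

if __name__ == "__main__":
    for dn in crossrefs_missing_ncname(SAMPLE):
        print("no nCName:", dn)
```

Running the same check against a dump from each DC shows whether the two databases disagree only on this one object or on more of them.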

Rowland




