[Samba] Multi domain controller environment Ubuntu 12.04, replication and DNS updates broken

Chris Alavoine chrisa at acs-info.co.uk
Fri Oct 3 01:30:48 MDT 2014

Apologies, I have given a more detailed explanation earlier in this post
(and in others), but basically the DomainDnsZones ldb file appears to be
corrupt in some way.

When I try to add a new DNS record (on the FSMO DC) I get the error
"The local security authority database contains an internal inconsistency".
I know that there are ~450000 "CN=Deleted Objects" records in the ldb but
it gets stuck at 88159 whenever I try an ldbsearch. DNS still resolves and
other parts of Samba4 are replicating OK (users/groups/passwords etc.), but
as far as DNS is concerned I can't get any life out of it. I've tried
various tests in a VM test lab (migrating to BIND, manually removing
records using ldbdel, and attempting to upgrade to 4.1.12 and everything in
between; it's currently on 4.1.7), all to no avail.
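
For comparing DCs, record counts like the ones above can be tallied from saved ldbsearch output. A minimal sketch, assuming the LDIF that ldbsearch prints has first been redirected to a text file; count_ldif_records and the sample DNs are made up for illustration and are not part of Samba:

```python
def count_ldif_records(ldif_text):
    """Count entries in LDIF text by counting lines that start a new DN.

    Folded LDIF continuation lines begin with a space, so they are not
    mistaken for new entries.
    """
    return sum(1 for line in ldif_text.splitlines() if line.startswith("dn:"))


# Hypothetical sample in the general shape of LDIF output; not real data.
SAMPLE = """\
# record 1
dn: DC=host1,CN=Deleted Objects,DC=DomainDnsZones,DC=example,DC=com

# record 2
dn: DC=host2,CN=Deleted Objects,DC=DomainDnsZones,DC=exa
 mple,DC=com
"""

if __name__ == "__main__":
    # Prints the number of entries found in the sample text.
    print(count_ldif_records(SAMPLE))
```

Run against the saved output from each DC, this gives numbers that can be compared side by side. It only reads text and never touches the ldb itself.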

Now that I have 2 good working DCs with ~10,000 Deleted Objects records, and
one of them is on 4.1.12, my plan is to rejoin all my outlying DCs via these
2 DCs. They are all VMs, so cloning and rebuilding isn't really that
laborious a task.

My only concern at present is what to do with the FSMO roles, should I:

a. Transfer them to my good working 4.1.12 DC before I decommission the
broken FSMO DC?

b. Decommission the FSMO DC and seize them?

c. Something else...
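
For anyone following along, options a and b above correspond to the transfer and seize subcommands of samba-tool fsmo. A minimal sketch, assuming it is run as root on the DC that should end up holding the roles; the helper names are hypothetical, the choice between the two actions is an assumption rather than advice from the list, and nothing is executed unless dry_run is switched off:

```python
import subprocess


def fsmo_command(action, role="all"):
    """Build a samba-tool fsmo command line (action: show, transfer, seize)."""
    if action not in ("show", "transfer", "seize"):
        raise ValueError("unknown fsmo action: %s" % action)
    cmd = ["samba-tool", "fsmo", action]
    if action != "show":
        # "show" takes no role; transfer and seize need one.
        cmd.append("--role=%s" % role)
    return cmd


def move_roles(old_dc_alive, dry_run=True):
    """Plan option a (transfer) if the old FSMO DC is still up, else b (seize).

    Assumption: a transfer needs the current role holder to cooperate,
    while a seize is only for a holder that will not come back.
    """
    action = "transfer" if old_dc_alive else "seize"
    plan = [fsmo_command("show"), fsmo_command(action), fsmo_command("show")]
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))        # only show what would be run
        else:
            subprocess.check_call(cmd)  # really run samba-tool
    return plan


if __name__ == "__main__":
    move_roles(old_dc_alive=True)
```

Showing the role holders before and after makes it easy to confirm that the roles actually moved.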


On 3 October 2014 07:48, L.P.H. van Belle <belle at bazuin.nl> wrote:

> Can you explain the "very sorry state" of your DC? Maybe this is
> fixable and a rebuild isn't needed.
> Best regards,
> Louis
>  ------------------------------
> *From:* Chris Alavoine [mailto:chrisa at acs-info.co.uk]
> *Sent:* Thursday 2 October 2014 19:09
> *To:* L.P.H. van Belle
> *CC:* samba at lists.samba.org; kseeger at samba.org
> *Subject:* Re: [Samba] Multi domain controller environment Ubuntu
> 12.04, replication and DNS updates broken
>  Hi all,
> Update.
> I have managed to get things to a more stable position. My best DC (in New
> York) got down to 13,000 records today so I decided to try and join a new
> DC from London using 4.1.12 (Ubuntu 12.04) and it worked like a charm. I
> now have a very snappy 4.1.12 DC in London which I am going to use to
> rebuild the domain.
> My FSMO roles DC is in a very sorry state so I shall take this down during
> a maintenance period and rebuild and rejoin it (this IP is used a lot
> throughout the company so we can't be without it for long).
> Thanks for all the help on the lists.
> Chris.
> On 1 October 2014 11:33, Chris Alavoine <chrisa at acs-info.co.uk> wrote:
>> Hi Louis,
>> Many thanks for replying.
>> Unfortunately, I can't seem to upgrade past 4.1.8 on Ubuntu 12.04. I am
>> currently running 4.1.7 on 4 of my DCs and 4.1.8 on one of them. If I try
>> to upgrade to 4.1.12, for instance, Samba refuses to start. My only theory
>> at this stage is that when I originally provisioned my domain
>> (classic_upgrade) I created a new OU purely for Groups and moved all groups
>> into it (including default AD ones like Domain Admins, Domain Users, Domain
>> Computers, DnsAdmins etc.). Could this be causing me issues? I know that
>> migrating to BIND fails if DnsAdmins is not in the Users OU. I am slightly
>> loath to start moving things around for fear of wreaking more havoc though.
>> My biggest problem is getting the DomainDnsZones ldb file down to a
>> manageable size. My worst case is ~450000 records, but there are anomalies
>> throughout the domain as replication for DNS appears to be busted. On one of
>> my DCs, when I run:
>> ldbsearch -H DC=DOMAINDNSZONES,DC=EXAMPLE,DC=COM.ldb 'isDeleted' dn
>> I am seeing:
>>  ltdb: tdb(DC=DOMAINDNSZONES,DC=EXAMPLE,DC=COM.ldb): tdb_rec_read bad
>> magic 0xd9fee666 at offset=1663653516
>> search error - Indexed and full searches both failed!
>> This looks like a bad thing to me and any mention of this error on the
>> lists seems to suggest corruption...
>> My main FSMO roles DC just stops at 88159 records and appears to be stuck
>> there.
>> I have lowered tombstoneLifetime to 15 on all DCs using ADSI to try and
>> get at least one decent DC with a low number of records.
>> My best-looking DC (based in New York on a point-to-point fibre link) is
>> currently down to 123457 records and getting lower, so the tombstone setting
>> is having an effect there. I am going to wait a few days until this is down
>> to a reasonable size and then attempt to join a new DC here, shut down the
>> main FSMO roles DC and the other broken DCs, seize the roles and rebuild
>> from there.
>> Much like this post:
>> http://lists-archives.com/samba/79874-samba4-replication-issues-sam-ldb-inconsistency.html
>> Thanks,
>> Chris.
>> On 1 October 2014 11:16, L.P.H. van Belle <belle at bazuin.nl> wrote:
>>> Ah..
>>> DeletedObjects ... and replication errors.
>>> This is a known Samba 4 bug.
>>> See also: https://bugzilla.samba.org/show_bug.cgi?id=10398
>>> Look at my post "No objectClass found in replPropertyMetaData" (was
>>> thread: "replication issues solved by adding GUID name ..."). ;-)
>>> Today an old e-mail reached the mailing list which involves
>>> the problem you describe.
>>> I don't know if the fix is in the latest Samba release yet.
>>> Maybe someone from Samba knows.
>>> Karolin, can you answer this, or pass it to someone who knows?
>>> Louis
>>> >-----Original message-----
>>> >From: chrisa at acs-info.co.uk
>>> >[mailto:samba-bounces at lists.samba.org] On behalf of Chris Alavoine
>>> >Sent: Wednesday 1 October 2014 9:31
>>> >To: samba at lists.samba.org
>>> >Subject: [Samba] Multi domain controller environment Ubuntu
>>> >12.04, replication and DNS updates broken
>>> >
>>> >Hi all,
>>> >
>>> >Am posting this again with a more helpful subject line...
>>> >
>>> >My 5 DC production domain (4.1.7 Ubuntu 12.04) is in a bit of a state.
>>> >
>>> >I attempted an upgrade from 4.1.5 to 4.1.7 which appeared to work, but
>>> >now we have replication errors and are unable to add any new DNS
>>> >entries. I am now certain that we've fallen foul of the DomainDnsZones
>>> >DeletedObjects problem that I've been reading about in various posts on
>>> >the lists.
>>> >
>>> >The DomainDnsZones ldb file is between 3 and 4GB on each of the DCs.
>>> >Running an ldbsearch ( ldbsearch -H 'isDeleted=TRUE' dn ) on each DC
>>> >returns a different number of objects, ranging from 387000 down to
>>> >88000 on the FSMO DC. Almost all of these are stale isDeleted entries.
>>> >
>>> >I have lowered the tombstoneLifetime setting as suggested by other
>>> >posters on the lists and this appears to be slowly (very slowly)
>>> >lowering the number of records within the DomainDnsZones ldb file. My
>>> >hope is that they will drop far enough that I can join a new working
>>> >4.1.12 DC to the domain.
>>> >
>>> >I am currently attempting a BIND migration on a test DC as this is
>>> >touted as a possible fix (any successes out there with this?).
>>> >
>>> >A matter of note for the lists: when I originally provisioned my domain
>>> >(classic upgrade from Samba3) I created a new OU for Groups and moved
>>> >all groups into it. This is a mistake if you want to migrate to BIND, as
>>> >the migration script needs CN=DnsAdmins to be in the Users OU; if it
>>> >isn't, the script errors. I moved DnsAdmins back to Users to get the
>>> >script to complete.
>>> >
>>> >At present I'm holding the domain together with bits of string and
>>> >sticky tape - having to reboot one of my DCs every 30 mins just to keep
>>> >things ticking over.
>>> >
>>> >I have tried many variations of joining a new DC to the domain but that
>>> >has failed, so my current plan is to create a test version of my FSMO DC
>>> >using BIND_DLZ (using a current snapshot of the FSMO DC) and get things
>>> >to a working state there, and then replace this on the production site
>>> >and re-join new DCs to rebuild things. Obviously, not best practice but
>>> >I can't think of any other way of getting things stable again.
>>> >
>>> >I have tried manually editing the .ldb files but they are so inflated
>>> >now that any vim edits just time out and error.
>>> >
>>> >Thanks,
>>> >Chris.
>>> >
>>> >--
>>> >ACS (Alavoine Computer Services Ltd)
>>> >Chris Alavoine
>>> >mob +44 (0)7724 710 730
>>> >www.alavoinecs.co.uk
>>> >http://twitter.com/#!/alavoinecs
>>> >http://www.linkedin.com/pub/chris-alavoine/39/606/192

ACS (Alavoine Computer Services Ltd)
Chris Alavoine
mob +44 (0)7724 710 730
