[Samba] samba AD database suspected corruption

L.P.H. van Belle belle at bazuin.nl
Mon Oct 23 19:23:30 UTC 2017


A quick read, and one thing: after the upgrade of DC4 to 4.7, did you reindex the AD database?
If I'm correct: samba-tool dbcheck --reindex


I read that somewhere on the list, in a response from Andrew.


greetz
Louis
(mobile)

On 23 Oct 2017 at 20:47, mj via samba <samba at lists.samba.org> wrote:


Hi,

Back in the samba 4.1 days, we experienced samba database corruption:
tombstones were not being deleted from sam.ldb, ultimately resulting in a
huge database, a full root disk and samba crashing; we were completely down.
We asked the great guys at sernet to help, they did super work, and 
managed to get us up and running again, including the addition of a 
fresh DC4.

Currently on 4.5.15, we have some strange issues with our samba AD
setup that I suspect are remnants of these old problems. Specifically:
- we cannot transfer FSMO roles between DCs due to LDAP error 50
(insufficient access rights)
- we have high CPU usage across the DCs, combined with continuous
"highwatermark" errors on the same DC
- occasionally (2 or 3 times a week) the DCs lock up and get stuck

Having said that, I think I found a way out, but would appreciate some 
feedback from the experts here.

In an isolated test setup, I started a clone of DC2/DC3/DC4, verified 
that replication is working correctly, ldapcmp as well, etc. Then I 
added a new DC5. DC2 (fsmo roles owner) did not pick it up at all, DC3 
picked it up with WERR_DS_DRA_ACCESS_DENIED, and only DC4 picked it up 
nicely. So I rolled back, shut down DC2, seized the FSMO roles on DC4, and
added a new samba 4.7 DC5. DC4 picked it up nicely again.

DC3 still gave WERR_DS_DRA_ACCESS_DENIED, so I shut down DC3 as well and
focused on just DC4 (samba 4.5.15) and DC5 (samba 4.7). In my isolated
test setup this seems to work nicely: I can log on to a domain member
server, a regular win7 workstation logon works, and ADUC, MS DNS Manager,
etc. all work. Replication works and ldapcmp confirms it, so this looks
quite good. DNS is correctly updated to the new situation.

However, I have some questions I'd like to ask, before proceeding.

- GPO: I think I have to take idmap.ldb from the old DC4, copy it to DC5,
set up the SysVol rsync to DC5 as well, restart samba, and run samba-tool
ntacl sysvolreset ONCE, and then never touch it again, right?
(asking because DC4 was NOT our old FSMO roles owner and 'primary
GPO DC')

- Can I reuse the old DNS names/IPs of DC1, DC2 and DC3? (I ran samba-tool
domain demote --remove-other-dead-server=DC1/DC2/DC3 on both remaining
DCs.) Is this safe to do?

Also, upgrading the remaining samba 4.5.15 DC4 to samba 4.7 causes 
showrepl to become EXTREMELY slow on that DC.

After upgrading DC4 to 4.7, showrepl still works on DC5, ADUC works against
both DCs, and ldapcmp on DC4 is still quick. Only samba-tool drs showrepl
on the upgraded 4.7 DC4 becomes slooow (10 to 15 minutes).

A level 10 debug log tells me that it waits *MANY* minutes after:
kinit for DC4$@SAMBA.DOMAIN.COM succeeded
and also many minutes after:
GSSAPI credentials for DC4$@SAMBA.DOMAIN.COM will expire in 35664 secs

In the end it does produce the expected output that replication is working.

I have a full -d10 log available if anyone would like to see it.

If I cannot get DC4 to upgrade to 4.7, I could of course also retire
that one TOO and proceed with only the new DC5. But it would be nicer to
keep DC4.

All in all, this has taken up a lot of my time lately. I am very
happy that my production environment (DC2/DC3/DC4) is still running, even
with the occasional lockup...

Anyway, all feedback is welcome, including tips, suggestions, different 
approaches, etc, etc. This is all done just in a test environment...

Please, suggestions? More info?

MJ
