[Samba] samba AD database suspected corruption
lists at merit.unu.edu
Mon Oct 23 18:46:18 UTC 2017
Back in the Samba 4.1 days, we experienced Samba database corruption:
tombstones were not being deleted from sam.ldb, which ultimately resulted
in a huge database, a full root disk and Samba crashing. We were
completely down.
We asked the great guys at SerNet to help. They did super work and
managed to get us up and running again, including the addition of a
We are currently on 4.5.15 and have some strange issues with our Samba
AD setup that I suspect are remnants of those old problems. Specifically:
- we cannot transfer FSMO roles between DCs due to LDAP error 50,
insufficient access rights
- we have high CPU usage across the DCs, combined with continuous
"highwatermark" errors on the same DC
- occasionally (two or three times a week) the DCs lock up and get stuck
That said, I think I have found a way out, but would appreciate some
feedback from the experts here.
In an isolated test setup, I started clones of DC2, DC3 and DC4, and
verified that replication was working correctly, that ldapcmp was clean,
etc. Then I added a new DC5. DC2 (the FSMO roles owner) did not pick it
up at all, DC3 picked it up with WERR_DS_DRA_ACCESS_DENIED, and only DC4
picked it up nicely. So I rolled back, shut down DC2, seized the FSMO
roles on DC4, and added a new Samba 4.7 DC5. DC4 picked it up nicely again.
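For anyone wanting to repeat this in their own lab, the sequence above (seize the roles on DC4 once DC2 is down, then join a new 4.7 DC5) can be written out as a dry-run plan. This is a minimal sketch, not a verified procedure: the administrator account and the SAMBA_INTERNAL DNS backend are placeholders (use BIND9_DLZ if that is what the domain runs), and the script only prints the samba-tool commands, it executes nothing.

```python
import shlex

# Assumptions: realm taken from the log lines in this post; the admin account
# and DNS backend are placeholders. Use BIND9_DLZ if the domain runs BIND.
REALM = "SAMBA.DOMAIN.COM"
ADMIN = "administrator"
DNS_BACKEND = "SAMBA_INTERNAL"


def seize_and_join_plan(realm=REALM, admin=ADMIN, dns_backend=DNS_BACKEND):
    """Return {host: [argv, ...]} in execution order; nothing is run here."""
    return {
        # On DC4, only after DC2 (the old roles owner) has been shut down for good
        "DC4": [
            ["samba-tool", "fsmo", "seize", "--role=all"],
            ["samba-tool", "fsmo", "show"],
        ],
        # On the freshly installed Samba 4.7 machine that becomes DC5
        "DC5": [
            ["samba-tool", "domain", "join", realm, "DC",
             f"--username={admin}", f"--dns-backend={dns_backend}"],
            ["samba-tool", "drs", "showrepl"],
        ],
    }


if __name__ == "__main__":
    for host, commands in seize_and_join_plan().items():
        for argv in commands:
            print(f"[{host}] {shlex.join(argv)}")
```

Keeping the plan as data makes it easy to review before anything touches a DC.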
DC3 still reported WERR_DS_DRA_ACCESS_DENIED, so I shut down DC3 as well
and focused on just DC4 (Samba 4.5.15) and DC5 (Samba 4.7). In my isolated
test setup this seems to work nicely: I can log on to a domain member
server, a regular Win7 workstation logon works, ADUC and MS DNS Manager
work, etc. Replication works and ldapcmp confirms it, so this looks
quite good. DNS is correctly updated to reflect the new situation.
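To spot links like the DC3 one quickly, a small filter over `samba-tool drs showrepl` output can list every failed inbound link with its WERR code. It assumes that a failure appears on a line containing "failed, result 8453 (WERR_DS_DRA_ACCESS_DENIED)" below a "<site>\<DC> via RPC" partner header; that layout is based on typical 4.x text output and may differ between versions.

```python
import re

# Assumed showrepl wording for a failed link; adjust if your version differs
FAILURE = re.compile(r"failed, result (\d+) \((WERR_[A-Z0-9_]+)\)")
PARTNER_SUFFIX = " via RPC"


def replication_failures(showrepl_text):
    """Return (partner, numeric result, WERR name) for each failed link."""
    failures = []
    partner = "?"
    for raw in showrepl_text.splitlines():
        line = raw.strip()
        # Partner header, e.g. "Default-First-Site-Name\DC3 via RPC"
        if line.endswith(PARTNER_SUFFIX):
            partner = line[: -len(PARTNER_SUFFIX)]
            continue
        match = FAILURE.search(line)
        if match:
            failures.append((partner, int(match.group(1)), match.group(2)))
    return failures
```

Usage: save the output of `samba-tool drs showrepl` on each DC to a file and pass its contents to `replication_failures()`; an empty list means no failed links were found in that text.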
However, I have some questions I'd like to ask before proceeding.
- GPO: I think I have to take idmap.ldb from the old DC4, copy it to DC5,
set up SysVol rsync to DC5 as well, restart Samba, run samba-tool
ntacl sysvolreset ONCE, and never touch it again, right?
(asking because DC4 was NOT our old FSMO roles owner, and 'primary
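The GPO steps in that question, in the order proposed, might look like this as a dry run. Everything beyond the post is an assumption: source-install paths under /usr/local/samba, root SSH from DC5 to DC4, rsync over SSH rather than an rsync daemon, and a Debian-style samba-ad-dc service name. Stopping Samba before replacing idmap.ldb and taking the copy with `tdbbackup -s .bak` (so the live file is not copied mid-write) are added precautions, not verified requirements.

```python
import shlex

# Assumptions, not from the post: source-install paths, root SSH from DC5 to
# DC4, rsync over SSH, and a Debian-style service name.
PRIVATE_DIR = "/usr/local/samba/private"
SYSVOL_DIR = "/usr/local/samba/var/locks/sysvol"
SERVICE = "samba-ad-dc"


def gpo_plan(source="dc4"):
    """(host, argv) pairs in the order proposed in the post; nothing is run."""
    return [
        # Consistent copy of the live TDB file, written as idmap.ldb.bak
        ("DC4", ["tdbbackup", "-s", ".bak", f"{PRIVATE_DIR}/idmap.ldb"]),
        ("DC5", ["systemctl", "stop", SERVICE]),
        ("DC5", ["scp", f"root@{source}:{PRIVATE_DIR}/idmap.ldb.bak",
                 f"{PRIVATE_DIR}/idmap.ldb"]),
        # One-way pull DC4 -> DC5; this is the part to schedule (e.g. cron) later
        ("DC5", ["rsync", "-XAavz", "--delete-after",
                 f"root@{source}:{SYSVOL_DIR}/", f"{SYSVOL_DIR}/"]),
        ("DC5", ["systemctl", "start", SERVICE]),
        # Reset SysVol ACLs once, now that DC5 uses the same idmap.ldb as DC4
        ("DC5", ["samba-tool", "ntacl", "sysvolreset"]),
    ]


if __name__ == "__main__":
    for host, argv in gpo_plan():
        print(f"[{host}] {shlex.join(argv)}")
```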
- Can I reuse the old DNS names and IP addresses of DC1, DC2 and DC3? (I
ran samba-tool domain demote --remove-other-dead-server=DC1/DC2/DC3 on
both remaining DCs.) Is this safe to do?
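Since `--remove-other-dead-server` takes a single DC name (or NTDS GUID), the DC1/DC2/DC3 shorthand above presumably stands for one demote call per dead DC. A trivial sketch that prints those calls for the DC names in this post; nothing is executed.

```python
import shlex

# The dead DCs named in this post
DEAD_DCS = ["DC1", "DC2", "DC3"]


def cleanup_commands(dead_dcs=DEAD_DCS):
    """One demote call per dead DC; the option takes a single name or NTDS GUID."""
    return [["samba-tool", "domain", "demote", f"--remove-other-dead-server={dc}"]
            for dc in dead_dcs]


if __name__ == "__main__":
    for argv in cleanup_commands():
        print(shlex.join(argv))
```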
Also, upgrading the remaining Samba 4.5.15 DC4 to Samba 4.7 causes
samba-tool drs showrepl to become EXTREMELY slow on that DC.
After the upgrade to 4.7, showrepl still works on DC5, ADUC works against
both DCs, and ldapcmp on DC4 still runs quickly; only samba-tool drs
showrepl on the upgraded 4.7 DC4 becomes slooow (10 to 15 minutes).
A level 10 debug log shows that it waits *MANY* minutes after:
> kinit for DC4$@SAMBA.DOMAIN.COM succeeded
and also many minutes after:
> GSSAPI credentials for DC4$@SAMBA.DOMAIN.COM will expire in 35664 secs
In the end it does produce the expected output that replication is working.
I have the full -d10 log available if anyone would like to see it.
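To see exactly where the 4.7 showrepl stalls without reading the whole -d10 log by hand, one option is to timestamp each output line as it arrives and flag long pauses. This is a generic sketch, not a Samba tool: the 30-second threshold is arbitrary, stderr is merged into stdout because the debug output may go to either, and PYTHONUNBUFFERED=1 is set because samba-tool is a Python program whose stdout would otherwise be block-buffered on a pipe.

```python
import os
import subprocess
import sys
import time


def timed_lines(argv, gap_seconds=30.0):
    """Run argv and return (elapsed_s, line, long_pause_before) for each output line."""
    # samba-tool is Python: without this its stdout is block-buffered on a pipe
    env = {**os.environ, "PYTHONUNBUFFERED": "1"}
    start = last = time.monotonic()
    rows = []
    # Merge stderr into stdout so debug messages arrive in one ordered stream
    with subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1, env=env) as proc:
        for line in proc.stdout:
            now = time.monotonic()
            # A long gap means the command sat idle after the *previous* line
            rows.append((now - start, line.rstrip("\n"), now - last >= gap_seconds))
            last = now
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    for elapsed, line, paused in timed_lines(sys.argv[1:]):
        flag = "  <-- long pause before this line" if paused else ""
        print(f"{elapsed:8.1f}s  {line}{flag}")
```

Usage (the file name is hypothetical): `python3 timeline.py samba-tool drs showrepl -d10`. The line flagged right after "kinit for DC4$@SAMBA.DOMAIN.COM succeeded" would show how long that particular wait lasts.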
If I cannot get DC4 upgraded to 4.7, I could of course retire that one
TOO and proceed with only the new DC5. But it would be nicer to keep DC4.
So, all in all, this has taken up a lot of my time lately. I am very
happy that my production environment (DC2/DC3/DC4) is still running, even
with the occasional lockup...
Anyway, all feedback is welcome, including tips, suggestions, different
approaches, etc. This is all done in a test environment only...
Please, any suggestions? Need more info?