[Samba] Samba 4 AD - Samba Fails to Start, hdb_samba4_create_kdc (setup KDC database) failed

JS it at cliffbells.com
Sun Jan 3 06:00:59 UTC 2016


L.P.H. van Belle writes:

> 
> Ok, 
> 
> 

Hi Louis,

Thank you again for taking the time to help me out. I do appreciate it, and
I hope you had a safe and happy New Year's Eve. I'm going to work my way
through the questions and comments in your response from top to bottom:

> First things is see. 
> 
> NTP
> drwxr-x---   2 root root         4096 Dec 28 21:12 ntp_signd 
> should be root:ntp 

I have no idea why the ownership was incorrect for that directory, but I
ran the following to fix it:

sudo chown -R root:ntp /var/lib/samba/ntp_signd

and now the ownership and permissions on that directory look like this:

sudo ls -la /var/lib/samba/ntp_signd/
total 8
drwxr-x--- 2 root ntp  4096 Dec 28 21:12 .
drwxr-xr-x 8 root root 4096 Dec 13 21:07 ..
srwxrwxrwx 1 root ntp     0 Dec 28 21:12 socket
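
For anyone wanting to re-check this after a reboot or package upgrade, the comparison can be scripted. This is only a minimal sketch: has_owner is a hypothetical helper name, and it assumes GNU stat (the -c option), which is what Debian and Ubuntu ship.

```shell
#!/bin/sh
# has_owner PATH USER:GROUP
# Hypothetical helper: succeeds (exit status 0) when PATH is owned by
# USER:GROUP. Assumes GNU coreutils stat, which supports -c.
has_owner() {
    [ "$(stat -c '%U:%G' "$1" 2>/dev/null)" = "$2" ]
}

# Example use (not run here, needs read access to the directory):
# has_owner /var/lib/samba/ntp_signd root:ntp || echo "ownership differs"
```

The function only reports; fixing the ownership is still the chown shown above.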


> SYVOL
> drwxrwx---+  3 root BUILTIN\administrators    4096 Apr 28  2015 sysvol
> your shows 300000 while mine gives : BUILTIN\administrators     
> but i have winbind/nsswitch etc configured on my DC, dont ask why, but i
> need it, and it works good for me.  

Regarding the SYSVOL permissions, I checked the permissions of
/var/lib/samba/ on another PDC I have deployed on a different network and
ntp_signd is owned by root:3000000 as well.


> Can you tell more about the hardware failure? 
> Disk problems, power outage etc what exact happend? 
> Did you see an filesystem check the first time starting up after the failuere?

The initial hardware failure was a RAID array failure. I replaced the failed
devices, rebuilt the array, and then rebuilt their domain from scratch,
provisioning it under a new domain.

> I asume its the only server, do no other DC's. 

Yes, that is correct; this machine is the only domain controller on this
network.

> Stop all samba processes and backup at least these folders. 
> /etc/samba
> /var/lib/samba
> /var/cache/samba

Samba fails at boot. I've already made a couple of safety backups, but for
good measure I stopped the smbd, nmbd, and samba services and backed up the
directories you listed.
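
For reference, an offline copy of those three directories can be sketched as below. This is an illustration only, not the samba_backup script from the wiki: backup_dirs and the destination path in the example are made-up names, and it assumes the Samba services are already stopped so the database files are not changing while tar reads them.

```shell
#!/bin/sh
# backup_dirs DEST DIR...
# Hypothetical helper: writes one timestamped .tar.gz per DIR into DEST.
# DIR arguments are expected to be absolute paths.
backup_dirs() {
    dest=$1
    shift
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest" || return 1
    for dir in "$@"; do
        # Turn /var/lib/samba into var-lib-samba for the archive name.
        name=$(printf '%s' "$dir" | sed -e 's|^/*||' -e 's|/*$||' -e 's|/|-|g')
        # Archive relative to / so the stored paths have no leading slash.
        tar -czf "$dest/$name-$stamp.tar.gz" -C / "${dir#/}" || return 1
    done
}

# Example (not run here; needs root, and /root/samba-backup is an assumed path):
# backup_dirs /root/samba-backup /etc/samba /var/lib/samba /var/cache/samba
```

Keeping one archive per directory makes it easy to restore /etc/samba alone without touching the databases.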

> When you run :  samba-tool fsmo show
> You probely get an error...

I do receive an error. Note that I did not start any of the aforementioned
services before executing the samba-tool command below:

sudo samba-tool fsmo show
ldb_wrap open of secrets.ldb
ERROR(assert): uncaught exception
  File "/usr/lib/python2.7/dist-packages/samba/netcmd/__init__.py", line 175, in _run
    return self.run(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/samba/netcmd/fsmo.py", line 196, in run
    assert len(res) == 1

> , so try the following. 
> samba-tool fsmo sieze 

I receive a second error when executing the seize command:

sudo samba-tool fsmo seize
ldb_wrap open of secrets.ldb
ERROR: Invalid FSMO role.
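
A note on the two errors above, offered as a best guess rather than a diagnosis. The assertion in fsmo show fails when a lookup does not come back with exactly one result, which fits a missing domain root object. The seize error may simply mean that no role was named, since seize normally takes a --role argument. A read-only way to check whether the domain root object exists is an ldbsearch against sam.ldb with Samba stopped. The sketch below only builds the base DN; realm_to_dn is a hypothetical helper, and the sam.ldb path in the comment is an assumption based on the /var/lib/samba layout used here.

```shell
#!/bin/sh
# realm_to_dn REALM
# Hypothetical helper: converts a DNS domain such as one.cliffbells.com
# into the matching base DN, DC=one,DC=cliffbells,DC=com.
realm_to_dn() {
    printf '%s\n' "$1" | sed -e 's/\./,DC=/g' -e 's/^/DC=/'
}

# Example (not run here; the sam.ldb location is an assumption):
# ldbsearch -H /var/lib/samba/private/sam.ldb \
#     -b "$(realm_to_dn one.cliffbells.com)" -s base fSMORoleOwner
```

If that search returns no entry for the base DN, it points the same way as the log messages about fsmoRoleOwner.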


> ( i dont think i will work, but give it a try, any outputs is most welkom  ) 
> 
> These do worry me. 
> Failed to find object DC=one,DC=cliffbells,DC=com for attribute
> fsmoRoleOwner - Cannot find DN
> DC=one,DC=cliffbells,DC=com to get attribute fsmoRoleOwner for reference
> dn: (null)
> 
> ./source4/dsdb/common/util.c:1877(samdb_is_pdc)
>   Failed to find if we are the PDC for this ldb: Searching for
> fSMORoleOwner in DC=one,DC=cliffbells,DC=com
> failed: Cannot find DN DC=one,DC=cliffbells,DC=com to get attribute
> fsmoRoleOwner for reference
> dn: (null)
> 
> which looks like you samba DB is corrected, probely due to the hardware
> failure. 

If your hunch that the database is corrupt holds true, it couldn't be from
the hardware failure, as this domain was provisioned after that incident. I
do believe I may have traced where any corruption might have originated,
though: I (apparently foolishly) started backing up /var/lib/samba with
CrashPlan after the hardware failure incident. I'm guessing that was a bad
idea.

> Do you have a backup, made with samba_backup ? 
> ( shown here : 
> https://wiki.samba.org/index.php/Backup_and_restore_an_Samba_AD_DC  )
> 
> Because i think you db is corrected and beyond recovery. 

No, I do not have that backup mechanism implemented. From reading that wiki
page's notes about backing up live databases, I have come to the conclusion
that CrashPlan backed up /var/lib/samba/ while the databases were live and
irreparably damaged them. I don't know exactly what the relationship between
/var/lib/samba/ and /var/cache/samba/ is, but I assume that any backup
created via CrashPlan (had it worked instead of wreaking havoc) probably
wouldn't have been valid without the /var/cache/samba/ directory contents.
I will be implementing the Samba backup script from your wiki link
immediately on the other Samba AD DCs I have deployed, and I will use it
here once I've rebuilt the domain, with CrashPlan handling offsite storage
of the archives it creates.

Which leads us to your closing statement:

> If you have  backupped : 
> /etc/samba
> /var/lib/samba
> /var/cache/samba
> 
> You can remove the content of 
> /var/lib/samba
> /var/cache/samba
> 
> And reprovision, bases on the posts here and the things i see. 
> If you have a backup "any" which have also the samba databases, thats the
> first you can try. 
> 
> Greetz, 
> 
> Louis


Other than the Python error I received after running samba-tool fsmo show, I
believe I've built a pretty solid case that a poor backup strategy caused
this failure and that reprovisioning the domain is my only course of action
at this time. If you believe I'm getting ahead of myself, or if you think
that Python error could lead to another failure after I've reprovisioned,
please let me know. I intend to provision the new domain tomorrow (Sunday
Jan 03 2016) in the late afternoon or early evening (EST), and I would hate
to rebuild their infrastructure only to have a Python issue trash the domain
again.


Thanks again, Louis et al., for helping me troubleshoot this issue. I'm
still green when it comes to Samba.

Kind Regards,

JS



