Restore scenarios with samba4

Tue Feb 14 16:33:57 UTC 2017

Hello Samba Support!

We have some questions regarding restoring of samba AD DCs.

*The goal:*
- there are two servers: one in the office (DC1), and one in the cloud (DC2)
- the servers are connected via VPN
- the role of DC1: to serve domain logons and to manage the
users/policies via RSAT on a Windows domain member
- the role of DC2: to provide LDAP to other applications (ftp, imap,
smtp, sogo, ...)

In order to use the users created and managed on DC1, the following
*config* was set up:
- Samba was provisioned on DC1
- Samba was also installed on DC2 and joined as a ("secondary") DC to
the DC1 domain
- as a result, there is one domain with two DCs (in order to keep
the ADs synchronized between DC1 and DC2)
- Samba 4.5.3 is used on Debian Jessie on both servers
- we use the Samba internal DNS

Everything is working fine, but before we go into production, *we wanted
to test the system from a backup and restore point of view*.
We create regular backups with the script provided in the source.

We have now tried to test *3 scenarios*:
1. the DC2 is broken
2. the DC1 is broken
3. everything is broken, an old state must be restored

The main purpose of restores in production will (probably) be to
restore a previous state of the AD (e.g. something was deleted, or
misconfigured, or ...).

Our tests were based on these two guides:
https://wiki.samba.org/index.php/Back_up_and_Restoring_a_Samba_AD_DC#Restore
https://wiki.samba.org/index.php/Transferring_and_Seizing_FSMO_Roles#How_to_Handle_Situations_Where_a_DC_with_FSMO_Roles_Is_Offline

*The result:*

1. DC2 broken, with all FSMO roles owned by the healthy DC1: this worked fine:
- we stopped Samba on DC2 and deleted the whole /usr/local/samba
- thereafter we installed it again and joined it to DC1

The sync was okay after restore.

*2. Unfortunately, this scenario failed.*

(
We have to note that even after the fresh installation the transfer
of FSMO roles could not be completed successfully; the following two
roles could not be switched:
DomainDnsZonesMasterRole
ForestDnsZonesMasterRole

But this is not a problem for us; we want all the roles to always stay on DC1.
)

Keep in mind that in this scenario the FSMO roles were assigned to the
broken DC1.

We stopped Samba and deleted the whole /usr/local/samba.
Thereafter, based on the guides, we tried to join DC2 again after a
fresh installation.

But in this case there were errors:

- note: the _msdcs CNAME record had to be added again (this is not a
problem; it was also the case in scenario 1)

- after starting Samba on DC2, the following errors occurred in the log:
Feb 14 14:54:15 DC1 samba[1999]: [2017/02/14 14:54:15.489221,  0] 
../source4/lib/tls/tlscert.c:167(tls_cert_generate)
Feb 14 14:54:15 DC1 samba[1999]:   TLS self-signed keys generated OK
Feb 14 14:54:30 DC1 samba[2002]: [2017/02/14 14:54:30.140445,  0] 
../source4/librpc/rpc/dcerpc_util.c:745(dcerpc_pipe_auth_recv)
Feb 14 14:54:30 DC1 samba[2002]:   Failed to bind to uuid 
e3514235-4b06-11d1-ab04-00c04fc2dcd2 for 
ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,target_hostname=4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.0.251] 
NT_STATUS_INVALID_PARAMETER
Feb 14 14:54:30 DC1 samba[2002]: [2017/02/14 14:54:30.144239,  0] 
../source4/dsdb/repl/drepl_ridalloc.c:43(drepl_new_rid_pool_callback)
Feb 14 14:54:30 DC1 samba[2002]: 
../source4/dsdb/repl/drepl_ridalloc.c:43: RID Manager failed RID 
allocation - WERR_INVALID_PARAM - extended_ret[0x0]

- after stopping and starting Samba the errors disappeared

- but on DC2 (which was untouched) the following error occurred again
and again:
   Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for 
ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,target_hostname=c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8._msdcs.example.com,target_principal=GC/DC1.example.com/example.com,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.159.1] 
NT_STATUS_UNSUCCESSFUL

I think the "ID" e3514235-4b06-11d1-ab04-00c04fc2dcd2 was the ID of the 
broken DC1, because after the rejoining this "ID" changed.
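
A note on the identifiers, offered as a reading aid rather than a diagnosis: in the two log excerpts above, the value after "uuid" is identical, and it matches the interface UUID that Microsoft documents for the DRSUAPI replication RPC interface. The part that differs between the excerpts is the GUID inside target_hostname (4e1fa04b-... in one, c5bfb8c8-... in the other). The Python sketch below splits such a log line into its fields so the two can be compared. The function name parse_bind_failure and the regular expression are illustrative assumptions, not part of Samba.

```python
import re

# Two log lines from this thread, re-joined after mail wrapping.
LINE_A = (
    "Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for "
    "ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,"
    "target_hostname=4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com,"
    "abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,"
    "localaddress=192.168.0.251] NT_STATUS_INVALID_PARAMETER"
)
LINE_B = (
    "Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for "
    "ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,"
    "target_hostname=c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8._msdcs.example.com,"
    "target_principal=GC/DC1.example.com/example.com,"
    "abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,"
    "localaddress=192.168.159.1] NT_STATUS_UNSUCCESSFUL"
)

# Hypothetical pattern for the "Failed to bind" message shape seen above.
BIND_RE = re.compile(
    r"Failed to bind to uuid (?P<iface>[0-9a-f-]{36}) for "
    r"(?P<transport>\w+):(?P<host>[^\[]+)\[(?P<opts>[^\]]*)\]"
)


def parse_bind_failure(line):
    """Return interface UUID, host and binding options, or None if no match."""
    match = BIND_RE.search(line)
    if match is None:
        return None
    options = {}
    for item in match.group("opts").split(","):
        key, sep, value = item.partition("=")
        # Bare flags such as "seal" carry no value; store them as True.
        options[key] = value if sep else True
    return {
        "interface": match.group("iface"),
        "host": match.group("host"),
        "options": options,
    }


if __name__ == "__main__":
    for line in (LINE_A, LINE_B):
        parsed = parse_bind_failure(line)
        print(parsed["interface"], parsed["options"]["target_hostname"])
```

Comparing the two parsed results shows the same interface UUID and two different target_hostname values.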

- also drs showrepl showed errors on DC2 (but it was successful on
DC1):
DC=ForestDnsZones,DC=example,DC=com
     Default-First-Site-Name\DC1 via RPC
         DSA object GUID: c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8
         Last attempt @ Tue Feb 14 14:59:03 2017 CET failed, result 31 
(WERR_GENERAL_FAILURE)
         31 consecutive failure(s).
         Last success @ NTTIME(0)
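
For repeated checks during such tests, the failing entries can be picked out of the showrepl text automatically. The following is a minimal Python sketch that assumes the output layout is exactly as in the excerpt above (with the mail-wrapped "result 31" line re-joined); the function name failing_links is hypothetical and other Samba versions may format the output differently.

```python
import re

# The excerpt from this thread, with the wrapped line re-joined.
SAMPLE = """DC=ForestDnsZones,DC=example,DC=com
     Default-First-Site-Name\\DC1 via RPC
         DSA object GUID: c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8
         Last attempt @ Tue Feb 14 14:59:03 2017 CET failed, result 31 (WERR_GENERAL_FAILURE)
         31 consecutive failure(s).
         Last success @ NTTIME(0)
"""

FAILED_RE = re.compile(r"failed, result (\d+) \((\w+)\)")
CONTEXT_RE = re.compile(r"(DC|CN)=", re.IGNORECASE)


def failing_links(text):
    """Collect (naming context, partner, result) for every failed attempt."""
    failures = []
    context = None
    partner = None
    for raw in text.splitlines():
        line = raw.strip()
        if raw[:1] not in (" ", "\t") and CONTEXT_RE.match(line):
            # Unindented DN line: start of a new naming context.
            context = line
        elif line.endswith("via RPC"):
            partner = line[: -len("via RPC")].strip()
        else:
            match = FAILED_RE.search(line)
            if match:
                failures.append({
                    "context": context,
                    "partner": partner,
                    "result": int(match.group(1)),
                    "status": match.group(2),
                })
    return failures


if __name__ == "__main__":
    for failure in failing_links(SAMPLE):
        print(failure)
```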

- on DC1 there was also a (possibly different) error:
[2017/02/14 14:56:32.542230,  0] 
../source4/librpc/rpc/dcerpc_util.c:745(dcerpc_pipe_auth_recv)
   Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for 
ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,target_hostname=4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.0.251] 
NT_STATUS_INVALID_PARAMETER
[2017/02/14 14:56:32.542996,  0] 
../source4/dsdb/repl/drepl_ridalloc.c:43(drepl_new_rid_pool_callback)
   ../source4/dsdb/repl/drepl_ridalloc.c:43: RID Manager failed RID 
allocation - WERR_INVALID_PARAM - extended_ret[0x0]

- the two ADs also differed (ldapcmp ldap://DC2 ldap://DC1 
-Uadministrator --filter=cn,CN,dc,DC)

* Comparing [DOMAIN] context...

* Objects to be compared: 284

Comparing:
'CN=DC1,OU=Domain Controllers,DC=ad,DC=tndtech,DC=hu' [ldap://PINGVIN]
'CN=DC1,OU=Domain Controllers,DC=ad,DC=tndtech,DC=hu' [ldap://DC1]
     Difference in attribute values:
         servicePrincipalName =>
['E3514235-4B06-11D1-AB04-00C04FC2DCD2/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8/example.com', 
'GC/DC1.example.com/example.com', 'HOST/DC1', 'HOST/DC1.example.com']
['E3514235-4B06-11D1-AB04-00C04FC2DCD2/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8/example.com', 
'GC/DC1.example.com/example.com', 'HOST/DC1', 'HOST/DC1.example.com', 
'HOST/DC1.example.com/TND', 'HOST/DC1.example.com/example.com', 
'RestrictedKrbHost/DC1', 'RestrictedKrbHost/DC1.example.com', 
'ldap/DC1', 
'ldap/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8._msdcs.example.com', 
'ldap/DC1.example.com', 
'ldap/DC1.example.com/DomainDnsZones.example.com', 
'ldap/DC1.example.com/ForestDnsZones.example.com', 
'ldap/DC1.example.com/TND', 'ldap/DC1.example.com/example.com']
     FAILED

* Result for [DOMAIN]: FAILURE

Attributes with different values:

     servicePrincipalName
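
To see at a glance which servicePrincipalName values are missing on one side, the two lists from the ldapcmp output can be compared as sets. This is a small Python sketch using the values printed above; it assumes the first list is the view of the first URL given to ldapcmp and the second list the view of the second URL. The function name spn_diff is hypothetical.

```python
# Values copied from the ldapcmp output in this thread.
FIRST_SPNS = [
    "E3514235-4B06-11D1-AB04-00C04FC2DCD2/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8/example.com",
    "GC/DC1.example.com/example.com",
    "HOST/DC1",
    "HOST/DC1.example.com",
]
SECOND_SPNS = FIRST_SPNS + [
    "HOST/DC1.example.com/TND",
    "HOST/DC1.example.com/example.com",
    "RestrictedKrbHost/DC1",
    "RestrictedKrbHost/DC1.example.com",
    "ldap/DC1",
    "ldap/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8._msdcs.example.com",
    "ldap/DC1.example.com",
    "ldap/DC1.example.com/DomainDnsZones.example.com",
    "ldap/DC1.example.com/ForestDnsZones.example.com",
    "ldap/DC1.example.com/TND",
    "ldap/DC1.example.com/example.com",
]


def spn_diff(first, second):
    """Return (only_in_first, only_in_second) as sorted lists."""
    first_set, second_set = set(first), set(second)
    return sorted(first_set - second_set), sorted(second_set - first_set)


if __name__ == "__main__":
    only_first, only_second = spn_diff(FIRST_SPNS, SECOND_SPNS)
    print("only in first list:", only_first)
    print("only in second list:")
    for spn in only_second:
        print("  ", spn)
```

In this data the first list is a strict subset of the second, so the difference consists only of entries that one server has not (yet) received.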

* Comparing [CONFIGURATION] context...

* DN lists have different size: 1614 != 1615
     CN=233b81f5-e09b-478c-bce6-b2a84985b011,CN=NTDS 
Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com

* Objects to be compared: 1614

Comparing:
'CN=NTDS Site 
Settings,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com' 
[ldap://DC2]
'CN=NTDS Site 
Settings,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com' 
[ldap://DC1]
     Difference in attribute values:
         interSiteTopologyGenerator =>
['CN=NTDS 
Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com']
['CN=NTDS 
Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com']
     FAILED

* Result for [CONFIGURATION]: FAILURE

- also the FSMO roles pointed to the deleted DC, e.g.:
SchemaMasterRole owner: CN=NTDS 
Settings\0ADEL:4e1fa04b-18b0-43e1-82c7-e314d2e5197e,CN=DC1\0ADEL:5b636acc-d166-4cf1-875e-003c654db7de,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com
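
In Active Directory, the RDN of a deleted object is rewritten to carry a newline followed by "DEL:" and the object's GUID, and tools print that newline as \0A. The role owner DN above contains two such markers. A minimal Python sketch for spotting them in any DN string follows; the helper name deleted_guids is hypothetical and the DN is the one quoted above.

```python
import re

# Matches the literal text "\0ADEL:" followed by a GUID.
DEL_MARKER = re.compile(r"\\0ADEL:([0-9a-fA-F-]{36})")

# Role owner DN from this thread (raw string, so "\0A" stays literal text).
OWNER_DN = (
    r"CN=NTDS Settings\0ADEL:4e1fa04b-18b0-43e1-82c7-e314d2e5197e,"
    r"CN=DC1\0ADEL:5b636acc-d166-4cf1-875e-003c654db7de,"
    r"CN=Servers,CN=Default-First-Site-Name,CN=Sites,"
    r"CN=Configuration,DC=example,DC=com"
)


def deleted_guids(dn):
    """Return the GUIDs of all deleted-object markers found in a DN."""
    return DEL_MARKER.findall(dn)


def points_to_deleted_object(dn):
    """True if any component of the DN refers to a deleted object."""
    return bool(deleted_guids(dn))


if __name__ == "__main__":
    print(points_to_deleted_object(OWNER_DN))
    print(deleted_guids(OWNER_DN))
```

Run against the role owner values of each role, a check like this would flag roles whose owner is a deleted NTDS Settings object.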

- I replicated manually from DC2 to DC1:
samba-tool drs replicate DC1 DC2 <all>

- thereafter I seized all roles to DC2:
samba-tool fsmo seize --role=all
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing rid FSMO role...
FSMO seize of 'rid' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing pdc FSMO role...
FSMO seize of 'pdc' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing naming FSMO role...
FSMO seize of 'naming' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing infrastructure FSMO role...
FSMO seize of 'infrastructure' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing schema FSMO role...
FSMO seize of 'schema' role successful
Attempting transfer...
Failed to connect to ldap URL 
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' - LDAP 
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to 
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' with 
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing domaindns FSMO role...
FSMO seize of 'domaindns' role successful
Attempting transfer...
Failed to connect to ldap URL 
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' - LDAP 
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to 
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' with 
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing forestdns FSMO role...
FSMO seize of 'forestdns' role successful

- but thereafter I could not manage the users via RSAT, so I had to seize
the roles back to DC1:
samba-tool fsmo seize --role=all
Attempting transfer...
This DC already has the 'rid' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'pdc' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'naming' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'infrastructure' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'schema' FSMO role
Transfer successful, not seizing role
Attempting transfer...
Failed to connect to ldap URL 
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' - LDAP 
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to 
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' with 
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing domaindns FSMO role...
FSMO seize of 'domaindns' role successful
Attempting transfer...
Failed to connect to ldap URL 
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' - LDAP 
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to 
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' with 
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing forestdns FSMO role...
FSMO seize of 'forestdns' role successful
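
The two runs above differ only in which roles were actually seized and which were reported as already held. A short Python sketch that condenses such output into one line per role is shown here; it assumes the message wording is exactly as pasted above, and the function name summarize_fsmo_output is hypothetical.

```python
import re

SEIZED_RE = re.compile(r"FSMO seize of '(\w+)' role successful")
HELD_RE = re.compile(r"This DC already has the '(\w+)' FSMO role")

# Shortened excerpt of the second run quoted in this thread.
SAMPLE = """Attempting transfer...
This DC already has the 'rid' FSMO role
Transfer successful, not seizing role
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing domaindns FSMO role...
FSMO seize of 'domaindns' role successful
"""


def summarize_fsmo_output(text):
    """Map each role name to 'seized' or 'already held'."""
    summary = {}
    for line in text.splitlines():
        match = SEIZED_RE.search(line)
        if match:
            summary[match.group(1)] = "seized"
            continue
        match = HELD_RE.search(line)
        if match:
            summary[match.group(1)] = "already held"
    return summary


if __name__ == "__main__":
    for role, state in summarize_fsmo_output(SAMPLE).items():
        print(role, state)
```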

- after a few Samba restarts on DC1 and DC2 the errors disappeared, and
now everything seems to be okay (also the ldapcmp)

But I think I did something wrong. *What is the right process in
this case* (demote the old DC1, or seize the roles before restoring
DC1)*?* Also, after seizing the roles to DC2 (and thereafter to DC1),
shouldn't there be some error in replication (because the old role owners
are still in the domain)?

3. For this case you write that we have to contact you.

To be honest, I do not understand *why we cannot restore an old
state with the following steps*:
- stop Samba on both DC1 and DC2
- restore DC1 from backup to a previous state
- start Samba on DC1
- restore Samba on DC2:
   - delete the whole /usr/local/samba
   - fresh install
   - join to DC1

Will the synchronization metadata also be wrong in this case, and will
replication fail?

Is this not the same as the following normal use case?
- both DC1 and DC2 were stopped for several days
- DC1 was started, but DC2 was not
- after several days DC2 was also started

Will the synchronization also fail in this case? Or why can the
restore process above not be executed if both DC1 and DC2 break? Maybe
/usr/local/samba does not contain all the data (including the metadata
for sync)?

Thank you for your support in advance!

Best Regards,
Barnabás