Restore scenarios with samba4
Hodozsó Barnabás
barnabas.hodozso at novin.hu
Tue Feb 14 16:33:57 UTC 2017
Hello Samba Support!
We have some questions about restoring Samba AD DCs.
*The goal:*
- there are two servers: one in the office (DC1), and one in the cloud (DC2)
- the servers are connected via VPN
- the role of DC1: to serve domain logons and to manage the
users/policies via RSAT on a Windows domain member
- the role of DC2: to provide LDAP to other applications (ftp, imap,
smtp, sogo, ...)
In order to use the users created and managed in DC1, the following
*config* was set up:
- Samba was provisioned on DC1
- Samba was also installed on DC2 and joined as a ("secondary") DC to
the DC1 domain
- so as a result there is a domain with two DCs (in order to keep
the ADs synchronized between DC1 and DC2)
- samba 4.5.3 is used on Debian Jessie on both servers
- we use the samba internal DNS
Everything is working fine, but before we go into production, *we wanted
to test the system from a backup and restore point of view*.
We create regular backups with the script provided in the source.
We then tried to test *3 scenarios*:
1. the DC2 is broken
2. the DC1 is broken
3. everything is broken, an old state must be restored
Also, the main purpose of a restore in production will (probably) be to
restore a previous state of the AD (e.g. something was deleted, or
misconfigured, or ...).
Our tests were based on these two guides:
https://wiki.samba.org/index.php/Back_up_and_Restoring_a_Samba_AD_DC#Restore
https://wiki.samba.org/index.php/Transferring_and_Seizing_FSMO_Roles#How_to_Handle_Situations_Where_a_DC_with_FSMO_Roles_Is_Offline
*The result:*
1. All the FSMO roles were owned by DC1, which was intact. This scenario
works fine:
- we stopped samba and deleted the whole /usr/local/samba
- then we installed it again and joined it to DC1
The sync was okay after restore.
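A minimal sketch of how the sync could be checked after such a rejoin,
by scanning `samba-tool drs showrepl` output for failed attempts. It
assumes the output layout quoted later in this mail (naming context at
the start of a line, "Last attempt ... failed" lines below it); the
function name showrepl_failures is made up for illustration.

```shell
#!/bin/sh
# Read `samba-tool drs showrepl` output on stdin and print one line per
# failed replication attempt, prefixed with its naming context.
# Assumption: naming contexts start at column 0 with "DC=" or "CN=".
showrepl_failures() {
    awk '
        /^(DC|CN)=/ { nc = $0; next }      # remember the current naming context
        /Last attempt .* failed/ {
            sub(/^[ \t]+/, "")             # drop the indentation
            print nc ": " $0
        }
    '
}
```

Possible usage: `samba-tool drs showrepl | showrepl_failures`. With this
filter, empty output means that no attempt was reported as failed.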
*2. Unfortunately, this scenario failed.*
(
Note that even after the fresh installation, transferring the FSMO
roles did not fully succeed; the following two roles could not be
switched:
DomainDnsZonesMasterRole
ForestDnsZonesMasterRole
But this is not a problem for us; we always want all the roles on DC1.
)
Keep in mind that in this scenario the FSMO roles were assigned to the
broken DC1.
We stopped samba and deleted the whole /usr/local/samba.
Then, based on the guides, we tried to join the DC2 again after a
fresh installation.
But in this case there were errors:
- note: the _msdcs CNAME record had to be added again (this is not a
problem; it was also the case in scenario 1)
- after starting samba on DC2, the following errors occurred in the log:
Feb 14 14:54:15 DC1 samba[1999]: [2017/02/14 14:54:15.489221, 0]
../source4/lib/tls/tlscert.c:167(tls_cert_generate)
Feb 14 14:54:15 DC1 samba[1999]: TLS self-signed keys generated OK
Feb 14 14:54:30 DC1 samba[2002]: [2017/02/14 14:54:30.140445, 0]
../source4/librpc/rpc/dcerpc_util.c:745(dcerpc_pipe_auth_recv)
Feb 14 14:54:30 DC1 samba[2002]: Failed to bind to uuid
e3514235-4b06-11d1-ab04-00c04fc2dcd2 for
ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,target_hostname=4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.0.251]
NT_STATUS_INVALID_PARAMETER
Feb 14 14:54:30 DC1 samba[2002]: [2017/02/14 14:54:30.144239, 0]
../source4/dsdb/repl/drepl_ridalloc.c:43(drepl_new_rid_pool_callback)
Feb 14 14:54:30 DC1 samba[2002]:
../source4/dsdb/repl/drepl_ridalloc.c:43: RID Manager failed RID
allocation - WERR_INVALID_PARAM - extended_ret[0x0]
- after stopping and starting samba the errors disappeared
- but on the DC2 (which was untouched) the following error occurred again
and again:
Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for
ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,target_hostname=c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8._msdcs.example.com,target_principal=GC/DC1.example.com/example.com,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.159.1]
NT_STATUS_UNSUCCESSFUL
I think the "ID" e3514235-4b06-11d1-ab04-00c04fc2dcd2 was the ID of the
broken DC1, because this "ID" changed after the rejoin.
- drs showrepl also showed errors on DC2 (but it was successful on
DC1):
DC=ForestDnsZones,DC=example,DC=com
Default-First-Site-Name\DC1 via RPC
DSA object GUID: c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8
Last attempt @ Tue Feb 14 14:59:03 2017 CET failed, result 31
(WERR_GENERAL_FAILURE)
31 consecutive failure(s).
Last success @ NTTIME(0)
- on DC1 there was also (maybe) another error:
[2017/02/14 14:56:32.542230, 0]
../source4/librpc/rpc/dcerpc_util.c:745(dcerpc_pipe_auth_recv)
Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for
ncacn_ip_tcp:192.168.0.251[1024,seal,krb5,target_hostname=4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.0.251]
NT_STATUS_INVALID_PARAMETER
[2017/02/14 14:56:32.542996, 0]
../source4/dsdb/repl/drepl_ridalloc.c:43(drepl_new_rid_pool_callback)
../source4/dsdb/repl/drepl_ridalloc.c:43: RID Manager failed RID
allocation - WERR_INVALID_PARAM - extended_ret[0x0]
- there was also a diff between the two ADs (ldapcmp ldap://DC2
ldap://DC1 -Uadministrator --filter=cn,CN,dc,DC)
* Comparing [DOMAIN] context...
* Objects to be compared: 284
Comparing:
'CN=DC1,OU=Domain Controllers,DC=ad,DC=tndtech,DC=hu' [ldap://PINGVIN]
'CN=DC1,OU=Domain Controllers,DC=ad,DC=tndtech,DC=hu' [ldap://DC1]
Difference in attribute values:
servicePrincipalName =>
['E3514235-4B06-11D1-AB04-00C04FC2DCD2/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8/example.com',
'GC/DC1.example.com/example.com', 'HOST/DC1', 'HOST/DC1.example.com']
['E3514235-4B06-11D1-AB04-00C04FC2DCD2/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8/example.com',
'GC/DC1.example.com/example.com', 'HOST/DC1', 'HOST/DC1.example.com',
'HOST/DC1.example.com/TND', 'HOST/DC1.example.com/example.com',
'RestrictedKrbHost/DC1', 'RestrictedKrbHost/DC1.example.com',
'ldap/DC1',
'ldap/c5bfb8c8-f949-4eb9-9a92-5eac84dc73f8._msdcs.example.com',
'ldap/DC1.example.com',
'ldap/DC1.example.com/DomainDnsZones.example.com',
'ldap/DC1.example.com/ForestDnsZones.example.com',
'ldap/DC1.example.com/TND', 'ldap/DC1.example.com/example.com']
FAILED
* Result for [DOMAIN]: FAILURE
Attributes with different values:
servicePrincipalName
* Comparing [CONFIGURATION] context...
* DN lists have different size: 1614 != 1615
CN=233b81f5-e09b-478c-bce6-b2a84985b011,CN=NTDS
Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com
* Objects to be compared: 1614
Comparing:
'CN=NTDS Site
Settings,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com'
[ldap://DC2]
'CN=NTDS Site
Settings,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com'
[ldap://DC1]
Difference in attribute values:
interSiteTopologyGenerator =>
['CN=NTDS
Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com']
['CN=NTDS
Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com']
FAILED
* Result for [CONFIGURATION]: FAILURE
- the FSMO roles also pointed to the deleted DC, e.g.:
SchemaMasterRole owner: CN=NTDS
Settings\0ADEL:4e1fa04b-18b0-43e1-82c7-e314d2e5197e,CN=DC1\0ADEL:5b636acc-d166-4cf1-875e-003c654db7de,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=example,DC=com
- I replicated manually from DC2 to DC1:
samba-tool drs replicate DC1 DC2 <all>
- then I seized all roles to DC2:
samba-tool fsmo seize --role=all
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing rid FSMO role...
FSMO seize of 'rid' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing pdc FSMO role...
FSMO seize of 'pdc' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing naming FSMO role...
FSMO seize of 'naming' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing infrastructure FSMO role...
FSMO seize of 'infrastructure' role successful
Attempting transfer...
Transfer unsuccessful, seizing...
Seizing schema FSMO role...
FSMO seize of 'schema' role successful
Attempting transfer...
Failed to connect to ldap URL
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' - LDAP
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' with
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing domaindns FSMO role...
FSMO seize of 'domaindns' role successful
Attempting transfer...
Failed to connect to ldap URL
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' - LDAP
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to
'ldap://4e1fa04b-18b0-43e1-82c7-e314d2e5197e._msdcs.example.com' with
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing forestdns FSMO role...
FSMO seize of 'forestdns' role successful
- but after that I could not manage the users via RSAT, so I had to
seize the roles back to DC1:
samba-tool fsmo seize --role=all
Attempting transfer...
This DC already has the 'rid' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'pdc' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'naming' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'infrastructure' FSMO role
Transfer successful, not seizing role
Attempting transfer...
This DC already has the 'schema' FSMO role
Transfer successful, not seizing role
Attempting transfer...
Failed to connect to ldap URL
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' - LDAP
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' with
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing domaindns FSMO role...
FSMO seize of 'domaindns' role successful
Attempting transfer...
Failed to connect to ldap URL
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' - LDAP
client internal error: NT_STATUS_OBJECT_NAME_NOT_FOUND
Failed to connect to
'ldap://5bc1b8bb-a74a-4c4d-80bc-b3ba6fbb2b52._msdcs.example.com' with
backend 'ldap': (null)
Transfer unsuccessful, seizing...
Seizing forestdns FSMO role...
FSMO seize of 'forestdns' role successful
- after a few samba restarts on DC1 and DC2 the errors disappeared, and
now everything seems to be okay (including the ldapcmp)
But I think I did something wrong. *What is the right process in
this case* (demote the old DC1, or seize the roles before restoring
DC1)*?* Also, after seizing the roles to DC2 (and then back to DC1),
shouldn't there be some error in replication (because the old role
owners are still in the domain)?
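Two of the checks used in this scenario can be condensed into small
filters. This is only a sketch: the function names are invented for
illustration, and the patterns assume the output formats quoted above
(the `\0ADEL:` marker in the `samba-tool fsmo show` owner DN, and the
"* Result for [...]" lines printed by ldapcmp).

```shell
#!/bin/sh
# Both filters read command output on stdin.

# Print FSMO role lines whose owner DN points at a deleted object.
# Assumption: deleted objects show the literal "\0ADEL:<GUID>" marker in
# their RDN, as in the SchemaMasterRole line quoted above.
stale_fsmo_owners() {
    grep -F '\0ADEL:' || true    # -F: match the backslash literally
}

# Print the name of every ldapcmp context that ended in FAILURE.
failed_ldapcmp_contexts() {
    sed -n 's/^\* Result for \[\(.*\)]: FAILURE$/\1/p'
}
```

Possible usage: `samba-tool fsmo show | stale_fsmo_owners` and
`samba-tool ldapcmp ldap://DC2 ldap://DC1 -Uadministrator | failed_ldapcmp_contexts`.
Empty output from both would suggest that no role owner is a deleted
object and that no compared context failed.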
3. For this case you write that we have to contact you.
To be honest, I do not understand *why we cannot restore to an old
state with the following steps?*:
- stop samba on both DC1 and DC2
- restore DC1 from backup to a previous state
- start samba on DC1
- restore the samba on DC2:
  - delete the whole /usr/local/samba
  - fresh install
  - join to DC1
Will the synchronization metadata also be wrong in this case, so that
the replication fails?
Is this not the same as the following normal use case?
- both DC1 and DC2 were stopped for several days
- DC1 was started, but DC2 was not
- after several days the DC2 was also started
Will the synchronization also fail in this case? If not, why can the
restore process above not be used when both DC1 and DC2 break? Maybe
/usr/local/samba does not contain all the data (including metadata for
sync)?
Thank you for your support in advance!
Best Regards,
Barnabás