[PATCH] Add DRS_GET_TGT support for client-side replication

Tim Beale timbeale at catalyst.net.nz
Thu Jul 27 05:43:56 UTC 2017


Hi,

The attached changes add GET_TGT support for client replication.

It might be easier to view the changes via git:

http://git.catalyst.net.nz/gw?p=samba.git;a=shortlog;h=refs/heads/tim-tgt-client

Please review and provide feedback, and I will happily make changes.

Note that it might make sense to squash some of these commits together.
I've left them as is for now to show my thought process involved.

Thanks,
Tim
-------------- next part --------------
From 4533df1e33742bfe333b00360d3f6bf9cbd897bc Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 30 May 2017 15:08:44 +1200
Subject: [PATCH 01/11] werror: Add WERR_DS_DRA_RECYCLED_TARGET

When the DRS client encounters a linked attribute with an unknown target
object, it should return a RECYCLED_TARGET error, which should result in
the client resending the GETNCChanges request with the GET_TGT flag set.

This error code is currently documented by Microsoft under System Error
Codes (8200-8999). I contacted them and they will also add it to the
MS-ERREF doc in future.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 libcli/util/werror.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libcli/util/werror.h b/libcli/util/werror.h
index d3d3327..0370a06 100644
--- a/libcli/util/werror.h
+++ b/libcli/util/werror.h
@@ -100,6 +100,7 @@ typedef uint32_t WERROR;
 #define WERR_INVALID_PRIMARY_GROUP      W_ERROR(0x0000051C)
 
 #define WERR_DS_DRA_SECRETS_DENIED			W_ERROR(0x000021B6)
+#define WERR_DS_DRA_RECYCLED_TARGET			W_ERROR(0x000021BF)
 
 #define WERR_DNS_ERROR_KEYMASTER_REQUIRED               W_ERROR(0x0000238D)
 #define WERR_DNS_ERROR_NOT_ALLOWED_ON_SIGNED_ZONE       W_ERROR(0x0000238E)
-- 
2.7.4


From fd8beb88d8ded508ded804c68e4f4c8d8c7afabd Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 14 Jun 2017 10:51:59 +1200
Subject: [PATCH 02/11] replmd: Split checking link attr target into new
 function

We want to re-use this code to check that the linked attribute's target
object exists *before* we try to commit the transaction. This will allow
us to re-request the block with the GET_TGT flag set.

This splits checking the target object exists into a separate function.

Minor changes of note:
- the 'parent' argument was passed to replmd_process_linked_attribute()
  as NULL, so I've just replaced where it was used in the refactored code
  with NULL.
- I've tweaked the "Failed to find GUID" error message slightly to display
  the attribute ID rather than the attribute name (saves repeating
  lookups and/or passing extra arguments).
- Tweaked the replmd_deletion_state() logic - it only made sense to call
  it in the code block where we actually found the target

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 171 ++++++++++++++----------
 1 file changed, 102 insertions(+), 69 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 0ed24cb..53e2c12 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6627,6 +6627,105 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 	return replmd_replicated_apply_next(ar);
 }
 
+/**
+ * Checks that the target object for a linked attribute exists.
+ * If it exists, its GUID and deletion-state are returned
+ */
+static int replmd_check_target_exists(struct ldb_module *module,
+				      struct dsdb_dn *dsdb_dn,
+				      struct la_entry *la_entry,
+				      struct ldb_dn *source_dn,
+				      struct GUID *guid,
+				      enum deletion_state *target_deletion_state)
+{
+	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
+	struct ldb_context *ldb = ldb_module_get_ctx(module);
+	struct ldb_result *target_res;
+	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
+	const char *attrs[] = { "isDeleted", "isRecycled", NULL };
+	NTSTATUS ntstatus;
+	int ret;
+	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE) ? true : false;
+
+	*target_deletion_state = OBJECT_REMOVED;
+
+	ntstatus = dsdb_get_extended_dn_guid(dsdb_dn->dn, guid, "GUID");
+
+	if (!NT_STATUS_IS_OK(ntstatus) && !active) {
+
+		/*
+		 * This strange behaviour (allowing a NULL/missing
+		 * GUID) originally comes from:
+		 *
+		 * commit e3054ce0fe0f8f62d2f5b2a77893e7a1479128bd
+		 * Author: Andrew Tridgell <tridge at samba.org>
+		 * Date:   Mon Dec 21 21:21:55 2009 +1100
+		 *
+		 *  s4-drs: cope better with NULL GUIDS from DRS
+		 *
+		 *  It is valid to get a NULL GUID over DRS for a deleted forward link. We
+		 *  need to match by DN if possible when seeing if we should update an
+		 *  existing link.
+		 *
+		 *  Pair-Programmed-With: Andrew Bartlett <abartlet at samba.org>
+		 */
+		ret = dsdb_module_search_dn(module, tmp_ctx, &target_res,
+					    dsdb_dn->dn, attrs,
+					    DSDB_FLAG_NEXT_MODULE |
+					    DSDB_SEARCH_SHOW_RECYCLED |
+					    DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
+					    DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
+					    NULL);
+	} else if (!NT_STATUS_IS_OK(ntstatus)) {
+		ldb_asprintf_errstring(ldb, "Failed to find GUID in linked attribute 0x%x blob for %s from %s",
+				       la->attid,
+				       ldb_dn_get_linearized(dsdb_dn->dn),
+				       ldb_dn_get_linearized(source_dn));
+		talloc_free(tmp_ctx);
+		return LDB_ERR_OPERATIONS_ERROR;
+	} else {
+		ret = dsdb_module_search(module, tmp_ctx, &target_res,
+					 NULL, LDB_SCOPE_SUBTREE,
+					 attrs,
+					 DSDB_FLAG_NEXT_MODULE |
+					 DSDB_SEARCH_SHOW_RECYCLED |
+					 DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
+					 DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
+					 NULL,
+					 "objectGUID=%s",
+					 GUID_string(tmp_ctx, guid));
+	}
+
+	if (ret != LDB_SUCCESS) {
+		ldb_asprintf_errstring(ldb, "Failed to re-resolve GUID %s: %s\n",
+				       GUID_string(tmp_ctx, guid),
+				       ldb_errstring(ldb));
+		talloc_free(tmp_ctx);
+		return ret;
+	}
+
+	if (target_res->count == 0) {
+		DEBUG(2,(__location__ ": WARNING: Failed to re-resolve GUID %s - using %s\n",
+			 GUID_string(tmp_ctx, guid),
+			 ldb_dn_get_linearized(dsdb_dn->dn)));
+	} else if (target_res->count != 1) {
+		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
+				       GUID_string(tmp_ctx, guid));
+		ret = LDB_ERR_OPERATIONS_ERROR;
+	} else {
+		struct ldb_message *target_msg = target_res->msgs[0];
+
+		dsdb_dn->dn = talloc_steal(dsdb_dn, target_msg->dn);
+
+		/* Get the object's state (i.e. Not Deleted, Tombstone, etc) */
+		replmd_deletion_state(module, target_msg,
+				      target_deletion_state, NULL);
+	}
+
+	talloc_free(tmp_ctx);
+	return ret;
+}
+
 /*
   process one linked attribute structure
  */
@@ -6638,7 +6737,6 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
 	struct ldb_message *msg;
-	struct ldb_message *target_msg = NULL;
 	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
 	const struct dsdb_schema *schema = dsdb_get_schema(ldb, tmp_ctx);
 	int ret;
@@ -6649,17 +6747,15 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	WERROR status;
 	time_t t = time(NULL);
 	struct ldb_result *res;
-	struct ldb_result *target_res;
 	const char *attrs[4];
-	const char *attrs2[] = { "isDeleted", "isRecycled", NULL };
 	struct parsed_dn *pdn_list, *pdn, *next;
 	struct GUID guid = GUID_zero();
-	NTSTATUS ntstatus;
 	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
 
 	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
 	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
 
+
 /*
 linked_attributes[0]:
      &objs->linked_attributes[i]: struct drsuapi_DsReplicaLinkedAttribute
@@ -6779,74 +6875,14 @@ linked_attributes[0]:
 		return LDB_ERR_OPERATIONS_ERROR;
 	}
 
-	ntstatus = dsdb_get_extended_dn_guid(dsdb_dn->dn, &guid, "GUID");
-	if (!NT_STATUS_IS_OK(ntstatus) && !active) {
-		/*
-		 * This strange behaviour (allowing a NULL/missing
-		 * GUID) originally comes from:
-		 *
-		 * commit e3054ce0fe0f8f62d2f5b2a77893e7a1479128bd
-		 * Author: Andrew Tridgell <tridge at samba.org>
-		 * Date:   Mon Dec 21 21:21:55 2009 +1100
-		 *
-		 *  s4-drs: cope better with NULL GUIDS from DRS
-		 *
-		 *  It is valid to get a NULL GUID over DRS for a deleted forward link. We
-		 *  need to match by DN if possible when seeing if we should update an
-		 *  existing link.
-		 *
-		 *  Pair-Programmed-With: Andrew Bartlett <abartlet at samba.org>
-		 */
-
-		ret = dsdb_module_search_dn(module, tmp_ctx, &target_res,
-					    dsdb_dn->dn, attrs2,
-					    DSDB_FLAG_NEXT_MODULE |
-					    DSDB_SEARCH_SHOW_RECYCLED |
-					    DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
-					    DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
-					    parent);
-	} else if (!NT_STATUS_IS_OK(ntstatus)) {
-		ldb_asprintf_errstring(ldb, "Failed to find GUID in linked attribute blob for %s on %s from %s",
-				       old_el->name,
-				       ldb_dn_get_linearized(dsdb_dn->dn),
-				       ldb_dn_get_linearized(msg->dn));
-		talloc_free(tmp_ctx);
-		return LDB_ERR_OPERATIONS_ERROR;
-	} else {
-		ret = dsdb_module_search(module, tmp_ctx, &target_res,
-					 NULL, LDB_SCOPE_SUBTREE,
-					 attrs2,
-					 DSDB_FLAG_NEXT_MODULE |
-					 DSDB_SEARCH_SHOW_RECYCLED |
-					 DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
-					 DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
-					 parent,
-					 "objectGUID=%s",
-					 GUID_string(tmp_ctx, &guid));
-	}
+	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
+					 &guid, &target_deletion_state);
 
 	if (ret != LDB_SUCCESS) {
-		ldb_asprintf_errstring(ldb_module_get_ctx(module), "Failed to re-resolve GUID %s: %s\n",
-				       GUID_string(tmp_ctx, &guid),
-				       ldb_errstring(ldb_module_get_ctx(module)));
 		talloc_free(tmp_ctx);
 		return ret;
 	}
 
-	if (target_res->count == 0) {
-		DEBUG(2,(__location__ ": WARNING: Failed to re-resolve GUID %s - using %s\n",
-			 GUID_string(tmp_ctx, &guid),
-			 ldb_dn_get_linearized(dsdb_dn->dn)));
-	} else if (target_res->count != 1) {
-		ldb_asprintf_errstring(ldb_module_get_ctx(module), "More than one object found matching objectGUID %s\n",
-				       GUID_string(tmp_ctx, &guid));
-		talloc_free(tmp_ctx);
-		return LDB_ERR_OPERATIONS_ERROR;
-	} else {
-		target_msg = target_res->msgs[0];
-		dsdb_dn->dn = talloc_steal(dsdb_dn, target_msg->dn);
-	}
-
 	/*
 	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
 	 * ProcessLinkValue, because link updates are not applied to
@@ -6854,9 +6890,6 @@ linked_attributes[0]:
 	 * any existing link, that should have happened when the
 	 * object deletion was replicated or initiated.
 	 */
-	replmd_deletion_state(module, target_msg,
-			      &target_deletion_state, NULL);
-
 	if (target_deletion_state >= OBJECT_RECYCLED) {
 		talloc_free(tmp_ctx);
 		return LDB_SUCCESS;
-- 
2.7.4


From 9fbbcd7331a364a450cc60dce6865875ced9e6f9 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 14 Jun 2017 11:35:36 +1200
Subject: [PATCH 03/11] drs: Fail replication transaction instead of dropping
 links

If the DRS client received a linked attribute that it couldn't resolve
the target for, then it would just ignore that link and keep going. That
link would then be lost forever (although a full-sync would resolve
this). Instead of silently ignoring the link, fail the transaction.

This *can* happen on Samba, but it is unusual. The target object and
linked-attribute would need to be added while a replication is still in
progress. It can also happen fairly easily when talking to a Windows DC.

There are two import exceptions to this:

1). Linked attributes that span partitions. We can never guarantee that
we will have received the target object, because it may be in a partition
we haven't replicated yet. Samba doesn't have a great way of handling
this currently, but we shouldn't fail the replication (because that breaks
basic join tests). Just skip that linked attribute and hope that a
subsequent full-sync will fix it.
(I queried Microsoft and they said resolving cross-partition linked
attributes is a implementation-specific problem to solve. GET_TGT won't
resolve it)

2). When the replication involves a subset of objects, e.g.
critical-only. In these cases, we don't increase the highwater-mark, so
it is probably not such a dire problem if we don't add the link. In the
case of critical-only, we will do a subsequent full sync which will then
add the links.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/repl/drepl_out_helpers.c           |  4 ++
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 92 ++++++++++++++++++++++++-
 source4/dsdb/samdb/samdb.h                      |  2 +-
 source4/libnet/libnet_vampire.c                 |  1 +
 4 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/source4/dsdb/repl/drepl_out_helpers.c b/source4/dsdb/repl/drepl_out_helpers.c
index 079edc8..ad4801a 100644
--- a/source4/dsdb/repl/drepl_out_helpers.c
+++ b/source4/dsdb/repl/drepl_out_helpers.c
@@ -795,6 +795,10 @@ static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req
 	if (state->op->options & DRSUAPI_DRS_SPECIAL_SECRET_PROCESSING) {
 		dsdb_repl_flags |= DSDB_REPL_FLAG_EXPECT_NO_SECRETS;
 	}
+	if (state->op->options & DRSUAPI_DRS_CRITICAL_ONLY ||
+	    state->op->extended_op != DRSUAPI_EXOP_NONE) {
+		dsdb_repl_flags |= DSDB_REPL_FLAG_OBJECT_SUBSET;
+	}
 
 	if (state->op->extended_op != DRSUAPI_EXOP_NONE) {
 		ret = dsdb_find_nc_root(service->samdb, partition,
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 53e2c12..dc16674 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -74,6 +74,7 @@ struct replmd_private {
 struct la_entry {
 	struct la_entry *next, *prev;
 	struct drsuapi_DsReplicaLinkedAttribute *la;
+	bool incomplete_replica;
 };
 
 struct replmd_replicated_request {
@@ -6535,6 +6536,7 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 	struct ldb_control **ctrls;
 	int ret;
 	uint32_t i;
+	bool incomplete_subset;
 	struct replmd_private *replmd_private =
 		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
 
@@ -6592,6 +6594,7 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 
 	ar->controls = req->controls;
 	req->controls = ctrls;
+	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
 
 	DEBUG(4,("linked_attributes_count=%u\n", objs->linked_attributes_count));
 
@@ -6616,6 +6619,13 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 		}
 		*la_entry->la = ar->objs->linked_attributes[i];
 
+		/*
+		 * we may not be able to resolve link targets properly when
+		 * dealing with subsets of objects, e.g. the source is a
+		 * critical object and the target isn't
+		 */
+		la_entry->incomplete_replica = incomplete_subset;
+
 		/* we need to steal the non-scalars so they stay
 		   around until the end of the transaction */
 		talloc_steal(la_entry->la, la_entry->la->identifier);
@@ -6628,6 +6638,46 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 }
 
 /**
+ * Returns True if the source and target DNs both have the same naming context,
+ * i.e. they're both in the same partition.
+ */
+static bool replmd_objects_have_same_nc(struct ldb_context *ldb,
+					TALLOC_CTX *mem_ctx,
+					struct ldb_dn *source_dn,
+					struct ldb_dn *target_dn)
+{
+	TALLOC_CTX *tmp_ctx;
+	struct ldb_dn *source_nc;
+	struct ldb_dn *target_nc;
+	int ret;
+	bool same_nc = true;
+
+	tmp_ctx = talloc_new(mem_ctx);
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for source %s\n",
+			ldb_dn_get_linearized(source_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for target %s\n",
+			ldb_dn_get_linearized(target_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	same_nc = (ldb_dn_compare(source_nc, target_nc) == 0);
+
+	talloc_free(tmp_ctx);
+
+	return same_nc;
+}
+
+/**
  * Checks that the target object for a linked attribute exists.
  * If it exists, its GUID and deletion-state are returned
  */
@@ -6705,9 +6755,45 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	}
 
 	if (target_res->count == 0) {
-		DEBUG(2,(__location__ ": WARNING: Failed to re-resolve GUID %s - using %s\n",
-			 GUID_string(tmp_ctx, guid),
-			 ldb_dn_get_linearized(dsdb_dn->dn)));
+
+		/*
+		 * TODO:
+		 * When we implement Trusted Domains we need to consider
+		 * whether they get treated as an incomplete replica here or not
+		 */
+		if (la_entry->incomplete_replica) {
+
+			/*
+			 * If we're only replicating a subset of objects (e.g.
+			 * critical-only, single-object), then an unknown target
+			 * is probably not a critical problem. We don't increase
+			 * the highwater-mark so subsequent replications should
+			 * resolve any missing links
+			 */
+			DEBUG(2,(__location__
+				 ": Failed to find target %s linked from %s\n",
+				 ldb_dn_get_linearized(dsdb_dn->dn),
+				 ldb_dn_get_linearized(source_dn)));
+
+		} else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn,
+						       dsdb_dn->dn)) {
+			ldb_asprintf_errstring(ldb, "Unknown target %s GUID %s linked from %s\n",
+					       ldb_dn_get_linearized(dsdb_dn->dn),
+					       GUID_string(tmp_ctx, guid),
+					       ldb_dn_get_linearized(source_dn));
+			ret = LDB_ERR_NO_SUCH_OBJECT;
+		} else {
+
+			/*
+			 * TODO:
+			 * We don't handle cross-partition links well here (we
+			 * could potentially lose them), but don't fail the
+			 * replication.
+			 */
+			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
+				 ldb_dn_get_linearized(source_dn),
+				 ldb_dn_get_linearized(dsdb_dn->dn)));
+		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
 				       GUID_string(tmp_ctx, guid));
diff --git a/source4/dsdb/samdb/samdb.h b/source4/dsdb/samdb/samdb.h
index 5dce37e..7d0127b 100644
--- a/source4/dsdb/samdb/samdb.h
+++ b/source4/dsdb/samdb/samdb.h
@@ -64,7 +64,7 @@ struct dsdb_control_current_partition {
 #define DSDB_REPL_FLAG_PARTIAL_REPLICA     2
 #define DSDB_REPL_FLAG_ADD_NCNAME	   4
 #define DSDB_REPL_FLAG_EXPECT_NO_SECRETS   8
-
+#define DSDB_REPL_FLAG_OBJECT_SUBSET       16
 
 #define DSDB_CONTROL_REPLICATED_UPDATE_OID "1.3.6.1.4.1.7165.4.3.3"
 
diff --git a/source4/libnet/libnet_vampire.c b/source4/libnet/libnet_vampire.c
index 7f25a3a..d89256b 100644
--- a/source4/libnet/libnet_vampire.c
+++ b/source4/libnet/libnet_vampire.c
@@ -660,6 +660,7 @@ WERROR libnet_vampire_cb_store_chunk(void *private_data,
 		 */
 		ZERO_STRUCT(s_dsa->highwatermark);
 		uptodateness_vector = NULL;
+		dsdb_repl_flags |= DSDB_REPL_FLAG_OBJECT_SUBSET;
 	}
 
 	/* TODO: avoid hardcoded flags */
-- 
2.7.4


From 3cfd85358800fc2067b935b61fbae732761fc3ba Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 15 Jun 2017 14:09:27 +1200
Subject: [PATCH 04/11] drs: Check target object is known after applying
 objects

Currently we only check that the target object is known at the end of
the transaction (i.e. the .prepare_commit hook). It's too late at this
point to resend the request with GET_TGT. Move this processing earlier
on, after we've applied all the objects (i.e. off the .extended hook).

In reality, we need to perform the checks at both points. I've
split the common code that gets the source/target details out of the
la_entry into a helper function. It's not the greatest function ever,
but seemed to make more sense than duplicating the code.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/repl/replicated_objects.c          |  11 +-
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 178 ++++++++++++++++++------
 2 files changed, 145 insertions(+), 44 deletions(-)

diff --git a/source4/dsdb/repl/replicated_objects.c b/source4/dsdb/repl/replicated_objects.c
index 862fafa..dd84570 100644
--- a/source4/dsdb/repl/replicated_objects.c
+++ b/source4/dsdb/repl/replicated_objects.c
@@ -884,12 +884,15 @@ WERROR dsdb_replicated_objects_commit(struct ldb_context *ldb,
 			dsdb_reference_schema(ldb, cur_schema, false);
 		}
 
-		if (!W_ERROR_EQUAL(objects->error, WERR_DS_DRA_MISSING_PARENT)) {
-			DEBUG(1,("Failed to apply records: %s: %s\n",
-				 ldb_errstring(ldb), ldb_strerror(ret)));
-		} else {
+		if (W_ERROR_EQUAL(objects->error, WERR_DS_DRA_RECYCLED_TARGET)) {
+			DEBUG(3,("Missing target while attempting to apply records: %s\n",
+				 ldb_errstring(ldb)));
+		} else if (W_ERROR_EQUAL(objects->error, WERR_DS_DRA_MISSING_PARENT)) {
 			DEBUG(3,("Missing parent while attempting to apply records: %s\n",
 				 ldb_errstring(ldb)));
+		} else {
+			DEBUG(1,("Failed to apply records: %s: %s\n",
+				 ldb_errstring(ldb), ldb_strerror(ret)));
 		}
 		ldb_transaction_cancel(ldb);
 		TALLOC_FREE(tmp_ctx);
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index dc16674..c04358f 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -114,6 +114,7 @@ static int replmd_check_upgrade_links(struct ldb_context *ldb,
 				      struct parsed_dn *dns, uint32_t count,
 				      struct ldb_message_element *el,
 				      const char *ldap_oid);
+static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar);
 
 enum urgent_situation {
 	REPL_URGENT_ON_CREATE = 1,
@@ -6057,7 +6058,18 @@ static int replmd_replicated_apply_next(struct replmd_replicated_request *ar)
 	struct GUID_txt_buf guid_str_buf;
 
 	if (ar->index_current >= ar->objs->num_objects) {
-		/* done with it, go to next stage */
+
+		/*
+		 * Now that we've applied all the objects, check that the new
+		 * linked attributes only reference objects we know about
+		 */
+		ret = replmd_verify_linked_attributes(ar);
+
+		if (ret != LDB_SUCCESS) {
+			return ret;
+		}
+
+		/* done applying objects, move on to the next stage */
 		return replmd_replicated_uptodate_vector(ar);
 	}
 
@@ -6812,35 +6824,29 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	return ret;
 }
 
-/*
-  process one linked attribute structure
+/**
+ * Extracts the key details about the source/target object for a
+ * linked-attribute entry.
+ * This returns the following details:
+ * @param ret_attr the schema details for the linked attribute
+ * @param source_msg the search result for the source object
+ * @param target_dsdb_dn the unpacked DN info for the target object
  */
-static int replmd_process_linked_attribute(struct ldb_module *module,
-					   struct replmd_private *replmd_private,
+static int replmd_extract_la_entry_details(struct ldb_module *module,
 					   struct la_entry *la_entry,
-					   struct ldb_request *parent)
+					   TALLOC_CTX *mem_ctx,
+					   const struct dsdb_attribute **ret_attr,
+					   struct ldb_message **source_msg,
+					   struct dsdb_dn **target_dsdb_dn)
 {
 	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
-	struct ldb_message *msg;
-	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
-	const struct dsdb_schema *schema = dsdb_get_schema(ldb, tmp_ctx);
+	const struct dsdb_schema *schema = dsdb_get_schema(ldb, mem_ctx);
 	int ret;
 	const struct dsdb_attribute *attr;
-	struct dsdb_dn *dsdb_dn;
-	uint64_t seq_num = 0;
-	struct ldb_message_element *old_el;
 	WERROR status;
-	time_t t = time(NULL);
 	struct ldb_result *res;
 	const char *attrs[4];
-	struct parsed_dn *pdn_list, *pdn, *next;
-	struct GUID guid = GUID_zero();
-	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
-
-	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
-	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
-
 
 /*
 linked_attributes[0]:
@@ -6885,7 +6891,6 @@ linked_attributes[0]:
 				       la->attid,
 				       GUID_buf_string(&la->identifier->guid,
 						       &guid_str));
-		talloc_free(tmp_ctx);
 		return LDB_ERR_OPERATIONS_ERROR;
 	}
 
@@ -6894,28 +6899,130 @@ linked_attributes[0]:
 	attrs[2] = "isRecycled";
 	attrs[3] = NULL;
 
-	/* get the existing message from the db for the object with
-	   this GUID, returning attribute being modified. We will then
-	   use this msg as the basis for a modify call */
-	ret = dsdb_module_search(module, tmp_ctx, &res, NULL, LDB_SCOPE_SUBTREE, attrs,
+	/*
+	 * get the existing message from the db for the object with
+	 * this GUID, returning attribute being modified. We will then
+	 * use this msg as the basis for a modify call
+	 */
+	ret = dsdb_module_search(module, mem_ctx, &res, NULL, LDB_SCOPE_SUBTREE, attrs,
 	                         DSDB_FLAG_NEXT_MODULE |
 				 DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
 				 DSDB_SEARCH_SHOW_RECYCLED |
 				 DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT |
 				 DSDB_SEARCH_REVEAL_INTERNALS,
-				 parent,
-				 "objectGUID=%s", GUID_string(tmp_ctx, &la->identifier->guid));
+				 NULL,
+				 "objectGUID=%s", GUID_string(mem_ctx, &la->identifier->guid));
 	if (ret != LDB_SUCCESS) {
-		talloc_free(tmp_ctx);
 		return ret;
 	}
 	if (res->count != 1) {
 		ldb_asprintf_errstring(ldb, "DRS linked attribute for GUID %s - DN not found",
-				       GUID_string(tmp_ctx, &la->identifier->guid));
-		talloc_free(tmp_ctx);
+				       GUID_string(mem_ctx, &la->identifier->guid));
 		return LDB_ERR_NO_SUCH_OBJECT;
 	}
-	msg = res->msgs[0];
+
+	*source_msg = res->msgs[0];
+
+	/* the value blob for the attribute holds the target object DN */
+	status = dsdb_dn_la_from_blob(ldb, attr, schema, mem_ctx, la->value.blob, target_dsdb_dn);
+	if (!W_ERROR_IS_OK(status)) {
+		ldb_asprintf_errstring(ldb, "Failed to parsed linked attribute blob for %s on %s - %s\n",
+				       attr->lDAPDisplayName,
+				       ldb_dn_get_linearized(res->msgs[0]->dn),
+				       win_errstr(status));
+		return LDB_ERR_OPERATIONS_ERROR;
+	}
+
+	*ret_attr = attr;
+
+	return LDB_SUCCESS;
+}
+
+/**
+ * Verifies that the source and target objects are known for each linked
+ * attribute in the current transaction.
+ */
+static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar)
+{
+	int ret = LDB_SUCCESS;
+	struct la_entry *la;
+	struct ldb_module *module = ar->module;
+	TALLOC_CTX *tmp_ctx = talloc_new(ar);
+	struct replmd_private *replmd_private =
+		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
+
+	for (la = DLIST_TAIL(replmd_private->la_list); la; la = DLIST_PREV(la)) {
+		struct ldb_message *src_msg;
+		const struct dsdb_attribute *attr;
+		struct dsdb_dn *tgt_dsdb_dn;
+		enum deletion_state del_state = OBJECT_NOT_DELETED;
+		struct GUID guid = GUID_zero();
+
+		ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
+						      &src_msg, &tgt_dsdb_dn);
+
+		if (ret != LDB_SUCCESS) {
+			break;
+		}
+
+		ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
+						 src_msg->dn, &guid, &del_state);
+
+		/*
+		 * When we fail to find the target object, the error code we pass
+		 * back here is really important. It flags back to the callers to
+		 * retry this request with DRSUAPI_DRS_GET_TGT
+		 */
+		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_RECYCLED_TARGET);
+		}
+
+		if (ret != LDB_SUCCESS) {
+			break;
+		}
+	}
+
+	talloc_free(tmp_ctx);
+	return ret;
+}
+
+/*
+  process one linked attribute structure
+ */
+static int replmd_process_linked_attribute(struct ldb_module *module,
+					   struct replmd_private *replmd_private,
+					   struct la_entry *la_entry,
+					   struct ldb_request *parent)
+{
+	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
+	struct ldb_context *ldb = ldb_module_get_ctx(module);
+	struct ldb_message *msg;
+	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
+	const struct dsdb_schema *schema = dsdb_get_schema(ldb, tmp_ctx);
+	int ret;
+	const struct dsdb_attribute *attr;
+	struct dsdb_dn *dsdb_dn;
+	uint64_t seq_num = 0;
+	struct ldb_message_element *old_el;
+	time_t t = time(NULL);
+	struct parsed_dn *pdn_list, *pdn, *next;
+	struct GUID guid = GUID_zero();
+	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
+
+	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
+	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
+
+	/*
+	 * get the attribute being modified, the search result for the source object,
+	 * and the target object's DN details
+	 */
+	ret = replmd_extract_la_entry_details(module, la_entry, tmp_ctx, &attr,
+					      &msg, &dsdb_dn);
+
+	if (ret != LDB_SUCCESS) {
+		talloc_free(tmp_ctx);
+		return ret;
+	}
 
 	/*
 	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
@@ -6924,7 +7031,6 @@ linked_attributes[0]:
 	 * any existing link, that should have happened when the
 	 * object deletion was replicated or initiated.
 	 */
-
 	replmd_deletion_state(module, msg, &deletion_state, NULL);
 
 	if (deletion_state >= OBJECT_RECYCLED) {
@@ -6953,14 +7059,6 @@ linked_attributes[0]:
 		return ret;
 	}
 
-	status = dsdb_dn_la_from_blob(ldb, attr, schema, tmp_ctx, la->value.blob, &dsdb_dn);
-	if (!W_ERROR_IS_OK(status)) {
-		ldb_asprintf_errstring(ldb, "Failed to parsed linked attribute blob for %s on %s - %s\n",
-				       old_el->name, ldb_dn_get_linearized(msg->dn), win_errstr(status));
-		talloc_free(tmp_ctx);
-		return LDB_ERR_OPERATIONS_ERROR;
-	}
-
 	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
 					 &guid, &target_deletion_state);
 
-- 
2.7.4


From ced5a6cb1e9611c567ad36098e48705ac694a8c8 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 15 Jun 2017 16:52:33 +1200
Subject: [PATCH 05/11] drepl: Support GET_TGT on periodic replication client

- Update IDL comments to include Microsoft reference doc
- Add support for sending v10 GetNCChanges request (needed for the
  GET_TGT flag, which is in the new 'more_flags' field)
- Update to also set the GET_TGT flag in the same place we were setting
  GET_ANC (I split this logic out into a separate function).
- The state struct now needs to hold a 'more_flags' field as well (this
  flag is different to the GET_ANC replica flag)

Note that using the GET_TGT when replicating from a Windows DC could be
highly inefficient. Because Samba keeps the GET_TGT flag set throughout
the replication cycle, it will basically receive a repeated object from
Windows for every single linked attribute that it receives.

I believe Windows behaviour only expects the client to set the GET_TGT
flag when it actually needs to (i.e. when it receives a target object it
doesn't know about), rather than throughout the replication cycle.
However, this approach won't work with Samba-to-Samba replication,
because when the server receives the GET_TGT flag it restarts the
replication cycle from scratch. So if we only set the GET_TGT flag when
the client encountered an unknown target then Samba-to-Samba could
potentially get into an endless replication loop.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 librpc/idl/drsuapi.idl                |  4 +-
 source4/dsdb/repl/drepl_out_helpers.c | 97 +++++++++++++++++++++++++++++------
 source4/dsdb/repl/drepl_out_pull.c    |  1 +
 source4/dsdb/repl/drepl_service.h     |  2 +
 4 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/librpc/idl/drsuapi.idl b/librpc/idl/drsuapi.idl
index 2eb1d65..51ef567 100644
--- a/librpc/idl/drsuapi.idl
+++ b/librpc/idl/drsuapi.idl
@@ -70,7 +70,9 @@ interface drsuapi
         } drsuapi_DrsUpdate;
 
 	/*****************/
-        /* Function 0x00 */
+        /* Function 0x00 drsuapi_DsBind() */
+
+        /* MS-DRSR 5.39 DRS_EXTENSIONS_INT */
         typedef [bitmap32bit] bitmap {
 		DRSUAPI_SUPPORTED_EXTENSION_BASE			= 0x00000001,
 		DRSUAPI_SUPPORTED_EXTENSION_ASYNC_REPLICATION		= 0x00000002,
diff --git a/source4/dsdb/repl/drepl_out_helpers.c b/source4/dsdb/repl/drepl_out_helpers.c
index ad4801a..6e24e7f 100644
--- a/source4/dsdb/repl/drepl_out_helpers.c
+++ b/source4/dsdb/repl/drepl_out_helpers.c
@@ -555,7 +555,28 @@ static void dreplsrv_op_pull_source_get_changes_trigger(struct tevent_req *req)
 	}
 
 	r->in.bind_handle	= &drsuapi->bind_handle;
-	if (drsuapi->remote_info28.supported_extensions & DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8) {
+
+	if (drsuapi->remote_info28.supported_extensions & DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V10) {
+		r->in.level				= 10;
+		r->in.req->req10.destination_dsa_guid	= service->ntds_guid;
+		r->in.req->req10.source_dsa_invocation_id= rf1->source_dsa_invocation_id;
+		r->in.req->req10.naming_context		= &partition->nc;
+		r->in.req->req10.highwatermark		= highwatermark;
+		r->in.req->req10.uptodateness_vector	= uptodateness_vector;
+		r->in.req->req10.replica_flags		= replica_flags;
+		r->in.req->req10.max_object_count	= 133;
+		r->in.req->req10.max_ndr_size		= 1336811;
+		r->in.req->req10.extended_op		= state->op->extended_op;
+		r->in.req->req10.fsmo_info		= state->op->fsmo_info;
+		r->in.req->req10.partial_attribute_set	= pas;
+		r->in.req->req10.partial_attribute_set_ex= NULL;
+		r->in.req->req10.mapping_ctr.num_mappings= mappings == NULL ? 0 : mappings->num_mappings;
+		r->in.req->req10.mapping_ctr.mappings	= mappings == NULL ? NULL : mappings->mappings;
+
+		/* the only difference to v8 is the more_flags */
+		r->in.req->req10.more_flags = state->op->more_flags;
+
+	} else if (drsuapi->remote_info28.supported_extensions & DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8) {
 		r->in.level				= 8;
 		r->in.req->req8.destination_dsa_guid	= service->ntds_guid;
 		r->in.req->req8.source_dsa_invocation_id= rf1->source_dsa_invocation_id;
@@ -693,6 +714,53 @@ static void dreplsrv_op_pull_source_get_changes_done(struct tevent_req *subreq)
 	dreplsrv_op_pull_source_apply_changes_trigger(req, r, ctr_level, ctr1, ctr6);
 }
 
+/**
+ * If processing a chunk of replication data fails, check if it is due to a
+ * problem that can be fixed by setting extra flags in the GetNCChanges request,
+ * i.e. GET_ANC or GET_TGT.
+ * @returns NT_STATUS_OK if the request was retried, and an error code if not
+ */
+static NTSTATUS dreplsrv_op_pull_retry_with_flags(struct tevent_req *req,
+						  WERROR error_code)
+{
+	struct dreplsrv_op_pull_source_state *state;
+	NTSTATUS nt_status = NT_STATUS_OK;
+
+	state = tevent_req_data(req, struct dreplsrv_op_pull_source_state);
+
+	/*
+	 * Check if we failed to apply the records due to a missing parent or
+	 * target object. If so, try again and ask for any mising parent/target
+	 * objects to be included this time.
+	 */
+	if (W_ERROR_EQUAL(error_code, WERR_DS_DRA_RECYCLED_TARGET)) {
+
+		if (state->op->more_flags & DRSUAPI_DRS_GET_TGT) {
+			DEBUG(1,("Missing target object despite setting DRSUAPI_DRS_GET_TGT flag\n"));
+			nt_status = NT_STATUS_INVALID_NETWORK_RESPONSE;
+		} else {
+			state->op->more_flags |= DRSUAPI_DRS_GET_TGT;
+			DEBUG(1,("Missing target object when we didn't set the DRSUAPI_DRS_GET_TGT flag, retrying\n"));
+			dreplsrv_op_pull_source_get_changes_trigger(req);
+		}
+	} else if (W_ERROR_EQUAL(error_code, WERR_DS_DRA_MISSING_PARENT)) {
+
+		if (state->op->options & DRSUAPI_DRS_GET_ANC) {
+			DEBUG(1,("Missing parent object despite setting DRSUAPI_DRS_GET_ANC flag\n"));
+			nt_status = NT_STATUS_INVALID_NETWORK_RESPONSE;
+		} else {
+			state->op->options |= DRSUAPI_DRS_GET_ANC;
+			DEBUG(4,("Missing parent object when we didn't set the DRSUAPI_DRS_GET_ANC flag, retrying\n"));
+			dreplsrv_op_pull_source_get_changes_trigger(req);
+		}
+	} else {
+		nt_status = werror_to_ntstatus(WERR_BAD_NET_RESP);
+	}
+
+	return nt_status;
+}
+
+
 static void dreplsrv_update_refs_trigger(struct tevent_req *req);
 
 static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req,
@@ -937,25 +1005,20 @@ static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req
 	if (!W_ERROR_IS_OK(status)) {
 
 		/*
-		 * If we failed to apply the records due to a missing
-		 * parent, try again after asking for the parent
-		 * records first.  Because we don't update the
-		 * highwatermark, we start this part of the cycle
-		 * again.
+		 * Check if this error can be fixed by resending the GetNCChanges
+		 * request with extra flags set (i.e. GET_ANC/GET_TGT)
 		 */
-		if (((state->op->options & DRSUAPI_DRS_GET_ANC) == 0)
-		    && W_ERROR_EQUAL(status, WERR_DS_DRA_MISSING_PARENT)) {
-			state->op->options |= DRSUAPI_DRS_GET_ANC;
-			DEBUG(4,("Missing parent object when we didn't set the DRSUAPI_DRS_GET_ANC flag, retrying\n"));
-			dreplsrv_op_pull_source_get_changes_trigger(req);
+		nt_status = dreplsrv_op_pull_retry_with_flags(req, status);
+
+		if (NT_STATUS_IS_OK(nt_status)) {
+
+			/*
+			 * We resent the request. Don't update the highwatermark,
+			 * we'll start this part of the cycle again.
+			 */
 			return;
-		} else if (((state->op->options & DRSUAPI_DRS_GET_ANC))
-			   && W_ERROR_EQUAL(status, WERR_DS_DRA_MISSING_PARENT)) {
-			DEBUG(1,("Missing parent object despite setting DRSUAPI_DRS_GET_ANC flag\n"));
-			nt_status = NT_STATUS_INVALID_NETWORK_RESPONSE;
-		} else {
-			nt_status = werror_to_ntstatus(WERR_BAD_NET_RESP);
 		}
+
 		DEBUG(0,("Failed to commit objects: %s/%s\n",
 			  win_errstr(status), nt_errstr(nt_status)));
 		tevent_req_nterror(req, nt_status);
diff --git a/source4/dsdb/repl/drepl_out_pull.c b/source4/dsdb/repl/drepl_out_pull.c
index 8b8ecd9..62a403c 100644
--- a/source4/dsdb/repl/drepl_out_pull.c
+++ b/source4/dsdb/repl/drepl_out_pull.c
@@ -126,6 +126,7 @@ WERROR dreplsrv_schedule_partition_pull_source(struct dreplsrv_service *s,
 	op->callback    = callback;
 	op->cb_data	= cb_data;
 	op->schedule_time = time(NULL);
+	op->more_flags	= 0;
 
 	DLIST_ADD_END(s->ops.pending, op);
 
diff --git a/source4/dsdb/repl/drepl_service.h b/source4/dsdb/repl/drepl_service.h
index 4b0e43f..8c44e19 100644
--- a/source4/dsdb/repl/drepl_service.h
+++ b/source4/dsdb/repl/drepl_service.h
@@ -130,6 +130,8 @@ struct dreplsrv_out_operation {
 	enum drsuapi_DsExtendedError extended_ret;
 	dreplsrv_extended_callback_t callback;
 	void *cb_data;
+	/* more replication flags - used by DsReplicaSync GET_TGT */
+	uint32_t more_flags;
 };
 
 struct dreplsrv_notify_operation {
-- 
2.7.4


From cd6d4c044e02b53c46d265fe7b4b988553808801 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 16 Jun 2017 09:49:16 +1200
Subject: [PATCH 06/11] replmd: Set GET_ANC if Windows sends a link with
 unknown source object

Windows replication can send the linked attribute before it sends the
source object. The MS-DRSR spec says that in this case the client should
resend the GetNCChanges request with the GET_ANC flag set. In my testing
this resolves the problem - Windows will include the source object for the
linked attribute in the same replication chunk.

This problem doesn't happen with Samba-to-Samba replication, because the
source object for the linked attribute is guaranteed to have already been
sent.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index c04358f..fde9b72 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6961,6 +6961,17 @@ static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar)
 		ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
 						      &src_msg, &tgt_dsdb_dn);
 
+		/*
+		 * When we fail to find the source object, the error code we pass
+		 * back here is really important. It flags back to the callers to
+		 * retry this request with DRSUAPI_DRS_GET_ANC. This case should
+		 * never happen if we're replicating from a Samba DC, but it is
+		 * needed to talk to a Windows DC
+		 */
+		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_MISSING_PARENT);
+		}
+
 		if (ret != LDB_SUCCESS) {
 			break;
 		}
-- 
2.7.4


From d8a51e100e6d40f3546e32346adce22ca017f9cb Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 16 Jun 2017 12:54:32 +1200
Subject: [PATCH 07/11] drs_utils: Add GET_TGT support to 'samba-tool drs
 replicate --local'

Update drs_Replicate.replicate() so it handles being passed the GET_TGT
flag (more_flags). To do this, we need to always use a v10 GetNCChanges
request (v8 and v10 are essentially the same except for the more_flags).

If the replicate_chunk() call into the C bindings throws an error, check
to see whether the error could be fixed by setting the GET_TGT flag, and
re-send the request if so.

Unfortunately because WERR_DS_DRA_RECYCLED_TARGET isn't documented with
the other AD error codes, I've left it hardcoded for now (Microsoft
should be fixing up their Docs).

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 python/samba/drs_utils.py | 101 ++++++++++++++++++++++++++++++----------------
 1 file changed, 66 insertions(+), 35 deletions(-)

diff --git a/python/samba/drs_utils.py b/python/samba/drs_utils.py
index b9ed059..bed8817 100644
--- a/python/samba/drs_utils.py
+++ b/python/samba/drs_utils.py
@@ -21,6 +21,8 @@ from samba.dcerpc import drsuapi, misc, drsblobs
 from samba.net import Net
 from samba.ndr import ndr_unpack
 from samba import dsdb
+from samba import werror
+from samba import WERRORError
 import samba, ldb
 
 
@@ -198,18 +200,37 @@ class drs_Replicate(object):
             raise RuntimeError("Must not set GUID 00000000-0000-0000-0000-000000000000 as invocation_id")
         self.replication_state = self.net.replicate_init(self.samdb, lp, self.drs, invocation_id)
 
+    def _should_retry_with_get_tgt(self, error_code, req):
+
+        # If the error indicates we fail to resolve a target object for a
+        # linked attribute, then we should retry the request with GET_TGT
+        # (if we support it and haven't already tried that)
+
+        # TODO fix up the below line when we next update werror_err_table.txt
+        # and pull in the new error-code
+        # return (error_code == werror.WERR_DS_DRA_RECYCLED_TARGET and
+        return (error_code == 0x21bf and
+                (req.more_flags & drsuapi.DRSUAPI_DRS_GET_TGT) == 0 and
+                self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V10)
+
     def replicate(self, dn, source_dsa_invocation_id, destination_dsa_guid,
                   schema=False, exop=drsuapi.DRSUAPI_EXOP_NONE, rodc=False,
-                  replica_flags=None, full_sync=True, sync_forced=False):
+                  replica_flags=None, full_sync=True, sync_forced=False, more_flags=0):
         '''replicate a single DN'''
 
         # setup for a GetNCChanges call
-        req8 = drsuapi.DsGetNCChangesRequest8()
+        if self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V10:
+            req = drsuapi.DsGetNCChangesRequest10()
+            req.more_flags = more_flags
+            req_level = 10
+        else:
+            req_level = 8
+            req = drsuapi.DsGetNCChangesRequest8()
 
-        req8.destination_dsa_guid = destination_dsa_guid
-        req8.source_dsa_invocation_id = source_dsa_invocation_id
-        req8.naming_context = drsuapi.DsReplicaObjectIdentifier()
-        req8.naming_context.dn = dn
+        req.destination_dsa_guid = destination_dsa_guid
+        req.source_dsa_invocation_id = source_dsa_invocation_id
+        req.naming_context = drsuapi.DsReplicaObjectIdentifier()
+        req.naming_context.dn = dn
 
         # Default to a full replication if we don't find an upToDatenessVector
         udv = None
@@ -244,49 +265,46 @@ class drs_Replicate(object):
             udv.cursors = cursors_v1
             udv.count = len(cursors_v1)
 
-        req8.highwatermark = hwm
-        req8.uptodateness_vector = udv
+        req.highwatermark = hwm
+        req.uptodateness_vector = udv
 
         if replica_flags is not None:
-            req8.replica_flags = replica_flags
+            req.replica_flags = replica_flags
         elif exop == drsuapi.DRSUAPI_EXOP_REPL_SECRET:
-            req8.replica_flags = 0
+            req.replica_flags = 0
         else:
-            req8.replica_flags = (drsuapi.DRSUAPI_DRS_INIT_SYNC |
-                                  drsuapi.DRSUAPI_DRS_PER_SYNC |
-                                  drsuapi.DRSUAPI_DRS_GET_ANC |
-                                  drsuapi.DRSUAPI_DRS_NEVER_SYNCED |
-                                  drsuapi.DRSUAPI_DRS_GET_ALL_GROUP_MEMBERSHIP)
+            req.replica_flags = (drsuapi.DRSUAPI_DRS_INIT_SYNC |
+                                 drsuapi.DRSUAPI_DRS_PER_SYNC |
+                                 drsuapi.DRSUAPI_DRS_GET_ANC |
+                                 drsuapi.DRSUAPI_DRS_NEVER_SYNCED |
+                                 drsuapi.DRSUAPI_DRS_GET_ALL_GROUP_MEMBERSHIP)
             if rodc:
-                req8.replica_flags |= (
-                    drsuapi.DRSUAPI_DRS_SPECIAL_SECRET_PROCESSING)
+                req.replica_flags |= (
+                     drsuapi.DRSUAPI_DRS_SPECIAL_SECRET_PROCESSING)
             else:
-                req8.replica_flags |= drsuapi.DRSUAPI_DRS_WRIT_REP
+                req.replica_flags |= drsuapi.DRSUAPI_DRS_WRIT_REP
 
         if sync_forced:
-            req8.replica_flags |= drsuapi.DRSUAPI_DRS_SYNC_FORCED
+            req.replica_flags |= drsuapi.DRSUAPI_DRS_SYNC_FORCED
 
-        req8.max_object_count = 402
-        req8.max_ndr_size = 402116
-        req8.extended_op = exop
-        req8.fsmo_info = 0
-        req8.partial_attribute_set = None
-        req8.partial_attribute_set_ex = None
-        req8.mapping_ctr.num_mappings = 0
-        req8.mapping_ctr.mappings = None
+        req.max_object_count = 402
+        req.max_ndr_size = 402116
+        req.extended_op = exop
+        req.fsmo_info = 0
+        req.partial_attribute_set = None
+        req.partial_attribute_set_ex = None
+        req.mapping_ctr.num_mappings = 0
+        req.mapping_ctr.mappings = None
 
         if not schema and rodc:
-            req8.partial_attribute_set = drs_get_rodc_partial_attribute_set(self.samdb)
+            req.partial_attribute_set = drs_get_rodc_partial_attribute_set(self.samdb)
 
-        if self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8:
-            req_level = 8
-            req = req8
-        else:
+        if not self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8:
             req_level = 5
             req5 = drsuapi.DsGetNCChangesRequest5()
             for a in dir(req5):
                 if a[0] != '_':
-                    setattr(req5, a, getattr(req8, a))
+                    setattr(req5, a, getattr(req, a))
             req = req5
 
         num_objects = 0
@@ -295,8 +313,21 @@ class drs_Replicate(object):
             (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, req_level, req)
             if ctr.first_object is None and ctr.object_count != 0:
                 raise RuntimeError("DsGetNCChanges: NULL first_object with object_count=%u" % (ctr.object_count))
-            self.net.replicate_chunk(self.replication_state, level, ctr,
-                schema=schema, req_level=req_level, req=req)
+
+            try:
+                self.net.replicate_chunk(self.replication_state, level, ctr,
+                    schema=schema, req_level=req_level, req=req)
+            except WERRORError as e:
+                # Check if retrying with the GET_TGT flag set might resolve this error
+                if self._should_retry_with_get_tgt(e[0], req):
+
+                    print("Missing target object - retrying with DRS_GET_TGT")
+                    req.more_flags |= drsuapi.DRSUAPI_DRS_GET_TGT
+
+                    # try sending the request again
+                    continue
+                else:
+                    raise e
 
             num_objects += ctr.object_count
 
-- 
2.7.4


From 91a112811dd18b5f9f1dcb1a34e4233a9c8cbc8b Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 22 Jun 2017 10:43:31 +1200
Subject: [PATCH 08/11] replmd: Move where we store linked attributes

There was a bug in my previous patch where the code would verify
*all* links in the list, rather than just the ones that are new. And it
would do this for every replication chunk it received, regardless of
whether there were actually any links in that chunk.

The problem is by the time we want to verify the attributes, we don't
actually know which attributes are new. We can fix this by moving where
we store the linked attributes from the start of processing the
replication chunk to the end of processing the chunk. We can then verify
the new linked attributes at the same time we store them.

Longer-term we may want to try to apply the linked attribute at this
point. This would save looking up the source/target objects twice, but
it makes things a bit more complicated (attributes will usually apply at
this point *most* of the time, but not *all* the time).

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 193 +++++++++++++-----------
 1 file changed, 103 insertions(+), 90 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index fde9b72..0a53d1d 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -114,7 +114,8 @@ static int replmd_check_upgrade_links(struct ldb_context *ldb,
 				      struct parsed_dn *dns, uint32_t count,
 				      struct ldb_message_element *el,
 				      const char *ldap_oid);
-static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar);
+static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
+					  struct la_entry *la);
 
 enum urgent_situation {
 	REPL_URGENT_ON_CREATE = 1,
@@ -6042,6 +6043,70 @@ static int replmd_replicated_apply_search_callback(struct ldb_request *req,
 	return LDB_SUCCESS;
 }
 
+/**
+ * Stores the linked attributes received in the replication chunk - these get
+ * applied at the end of the transaction. We also check that each linked
+ * attribute is valid, i.e. source and target objects are known.
+ */
+static int replmd_store_linked_attributes(struct replmd_replicated_request *ar)
+{
+	int ret = LDB_SUCCESS;
+	uint32_t i;
+	bool incomplete_subset;
+	struct ldb_module *module = ar->module;
+	struct replmd_private *replmd_private =
+		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
+	struct ldb_context *ldb;
+
+	ldb = ldb_module_get_ctx(module);
+	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
+
+	DEBUG(4,("linked_attributes_count=%u\n", ar->objs->linked_attributes_count));
+
+	/* save away the linked attributes for the end of the transaction */
+	for (i = 0; i < ar->objs->linked_attributes_count; i++) {
+		struct la_entry *la_entry;
+
+		if (replmd_private->la_ctx == NULL) {
+			replmd_private->la_ctx = talloc_new(replmd_private);
+		}
+		la_entry = talloc(replmd_private->la_ctx, struct la_entry);
+		if (la_entry == NULL) {
+			ldb_oom(ldb);
+			return LDB_ERR_OPERATIONS_ERROR;
+		}
+		la_entry->la = talloc(la_entry, struct drsuapi_DsReplicaLinkedAttribute);
+		if (la_entry->la == NULL) {
+			talloc_free(la_entry);
+			ldb_oom(ldb);
+			return LDB_ERR_OPERATIONS_ERROR;
+		}
+		*la_entry->la = ar->objs->linked_attributes[i];
+
+		/*
+		 * we may not be able to resolve link targets properly when
+		 * dealing with subsets of objects, e.g. the source is a
+		 * critical object and the target isn't
+		 */
+		la_entry->incomplete_replica = incomplete_subset;
+
+		/* we need to steal the non-scalars so they stay
+		   around until the end of the transaction */
+		talloc_steal(la_entry->la, la_entry->la->identifier);
+		talloc_steal(la_entry->la, la_entry->la->value.blob);
+
+		ret = replmd_verify_linked_attribute(ar, la_entry);
+
+		if (ret != LDB_SUCCESS) {
+			break;
+		}
+
+		DLIST_ADD(replmd_private->la_list, la_entry);
+	}
+
+	return ret;
+}
+
 static int replmd_replicated_uptodate_vector(struct replmd_replicated_request *ar);
 
 static int replmd_replicated_apply_next(struct replmd_replicated_request *ar)
@@ -6060,10 +6125,10 @@ static int replmd_replicated_apply_next(struct replmd_replicated_request *ar)
 	if (ar->index_current >= ar->objs->num_objects) {
 
 		/*
-		 * Now that we've applied all the objects, check that the new
-		 * linked attributes only reference objects we know about
+		 * Now that we've applied all the objects, check the new linked
+		 * attributes and store them (we apply them in .prepare_commit)
 		 */
-		ret = replmd_verify_linked_attributes(ar);
+		ret = replmd_store_linked_attributes(ar);
 
 		if (ret != LDB_SUCCESS) {
 			return ret;
@@ -6547,10 +6612,6 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 	struct replmd_replicated_request *ar;
 	struct ldb_control **ctrls;
 	int ret;
-	uint32_t i;
-	bool incomplete_subset;
-	struct replmd_private *replmd_private =
-		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
 
 	ldb = ldb_module_get_ctx(module);
 
@@ -6606,45 +6667,6 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 
 	ar->controls = req->controls;
 	req->controls = ctrls;
-	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
-
-	DEBUG(4,("linked_attributes_count=%u\n", objs->linked_attributes_count));
-
-	/* save away the linked attributes for the end of the
-	   transaction */
-	for (i=0; i<ar->objs->linked_attributes_count; i++) {
-		struct la_entry *la_entry;
-
-		if (replmd_private->la_ctx == NULL) {
-			replmd_private->la_ctx = talloc_new(replmd_private);
-		}
-		la_entry = talloc(replmd_private->la_ctx, struct la_entry);
-		if (la_entry == NULL) {
-			ldb_oom(ldb);
-			return LDB_ERR_OPERATIONS_ERROR;
-		}
-		la_entry->la = talloc(la_entry, struct drsuapi_DsReplicaLinkedAttribute);
-		if (la_entry->la == NULL) {
-			talloc_free(la_entry);
-			ldb_oom(ldb);
-			return LDB_ERR_OPERATIONS_ERROR;
-		}
-		*la_entry->la = ar->objs->linked_attributes[i];
-
-		/*
-		 * we may not be able to resolve link targets properly when
-		 * dealing with subsets of objects, e.g. the source is a
-		 * critical object and the target isn't
-		 */
-		la_entry->incomplete_replica = incomplete_subset;
-
-		/* we need to steal the non-scalars so they stay
-		   around until the end of the transaction */
-		talloc_steal(la_entry->la, la_entry->la->identifier);
-		talloc_steal(la_entry->la, la_entry->la->value.blob);
-
-		DLIST_ADD(replmd_private->la_list, la_entry);
-	}
 
 	return replmd_replicated_apply_next(ar);
 }
@@ -6939,58 +6961,49 @@ linked_attributes[0]:
 }
 
 /**
- * Verifies that the source and target objects are known for each linked
- * attribute in the current transaction.
+ * Verifies the source and target objects are known for a linked attribute
  */
-static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar)
+static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
+					  struct la_entry *la)
 {
 	int ret = LDB_SUCCESS;
-	struct la_entry *la;
+	TALLOC_CTX *tmp_ctx = talloc_new(la);
 	struct ldb_module *module = ar->module;
-	TALLOC_CTX *tmp_ctx = talloc_new(ar);
-	struct replmd_private *replmd_private =
-		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
-
-	for (la = DLIST_TAIL(replmd_private->la_list); la; la = DLIST_PREV(la)) {
-		struct ldb_message *src_msg;
-		const struct dsdb_attribute *attr;
-		struct dsdb_dn *tgt_dsdb_dn;
-		enum deletion_state del_state = OBJECT_NOT_DELETED;
-		struct GUID guid = GUID_zero();
-
-		ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
-						      &src_msg, &tgt_dsdb_dn);
+	struct ldb_message *src_msg;
+	const struct dsdb_attribute *attr;
+	struct dsdb_dn *tgt_dsdb_dn;
+	enum deletion_state del_state = OBJECT_NOT_DELETED;
+	struct GUID guid = GUID_zero();
 
-		/*
-		 * When we fail to find the source object, the error code we pass
-		 * back here is really important. It flags back to the callers to
-		 * retry this request with DRSUAPI_DRS_GET_ANC. This case should
-		 * never happen if we're replicating from a Samba DC, but it is
-		 * needed to talk to a Windows DC
-		 */
-		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
-			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_MISSING_PARENT);
-		}
+	ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
+					      &src_msg, &tgt_dsdb_dn);
 
-		if (ret != LDB_SUCCESS) {
-			break;
-		}
+	/*
+	 * When we fail to find the source object, the error code we pass
+	 * back here is really important. It flags back to the callers to
+	 * retry this request with DRSUAPI_DRS_GET_ANC. This case should
+	 * never happen if we're replicating from a Samba DC, but it is
+	 * needed to talk to a Windows DC
+	 */
+	if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+		ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_MISSING_PARENT);
+	}
 
-		ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
-						 src_msg->dn, &guid, &del_state);
+	if (ret != LDB_SUCCESS) {
+		talloc_free(tmp_ctx);
+		return ret;
+	}
 
-		/*
-		 * When we fail to find the target object, the error code we pass
-		 * back here is really important. It flags back to the callers to
-		 * retry this request with DRSUAPI_DRS_GET_TGT
-		 */
-		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
-			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_RECYCLED_TARGET);
-		}
+	ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
+					 src_msg->dn, &guid, &del_state);
 
-		if (ret != LDB_SUCCESS) {
-			break;
-		}
+	/*
+	 * When we fail to find the target object, the error code we pass
+	 * back here is really important. It flags back to the callers to
+	 * retry this request with DRSUAPI_DRS_GET_TGT
+	 */
+	if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+		ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_RECYCLED_TARGET);
 	}
 
 	talloc_free(tmp_ctx);
-- 
2.7.4


From c3805ebbb04f202b13ae7d50d8a36ba3714c6423 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 19 Jul 2017 16:18:20 +1200
Subject: [PATCH 09/11] replmd: Avoid dropping links if link target is deleted

The server-side can potentially send the linked attribute before the
target-object. This happens on Microsoft, and will happen on Samba once
server-side GET_TGT support is added. In these cases there is a hole
where the Samba client can silently drop the linked attribute.

If the old copy of the target object was deleted/recycled, then the
client can receive the new linked attribute before it realizes the target
has now been reincarnated. It silently ignores the linked attribute,
thinking its receiving out of date information, when really it's the
client's copy of the target object that's out of date.

In this case we want to retry with the GET_TGT flag set, which will
force the updated version of the target object to be sent along with the
linked attribute. This deleted/recycled target case is the main reason
that Windows added the GET_TGT flag.

If the server sends all the links at the end, instead of along with the
source object, then this case can still be hit. If so, it will cause the
server to restart the replication from the beginning again. This is
probably preferential to silently dropping links.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 52 ++++++++++++++++---------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 0a53d1d..8e2c64b 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6713,14 +6713,16 @@ static bool replmd_objects_have_same_nc(struct ldb_context *ldb,
 
 /**
  * Checks that the target object for a linked attribute exists.
- * If it exists, its GUID and deletion-state are returned
+ * @param guid returns the target object's GUID (is returned)if it exists)
+ * @param ignore_link set to true if the linked attribute should be ignored
+ * (i.e. the target doesn't exist, but that it's OK to skip the link)
  */
 static int replmd_check_target_exists(struct ldb_module *module,
 				      struct dsdb_dn *dsdb_dn,
 				      struct la_entry *la_entry,
 				      struct ldb_dn *source_dn,
 				      struct GUID *guid,
-				      enum deletion_state *target_deletion_state)
+				      bool *ignore_link)
 {
 	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
@@ -6729,10 +6731,10 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	const char *attrs[] = { "isDeleted", "isRecycled", NULL };
 	NTSTATUS ntstatus;
 	int ret;
+	enum deletion_state target_deletion_state = OBJECT_REMOVED;
 	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE) ? true : false;
 
-	*target_deletion_state = OBJECT_REMOVED;
-
+	*ignore_link = false;
 	ntstatus = dsdb_get_extended_dn_guid(dsdb_dn->dn, guid, "GUID");
 
 	if (!NT_STATUS_IS_OK(ntstatus) && !active) {
@@ -6808,6 +6810,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 				 ": Failed to find target %s linked from %s\n",
 				 ldb_dn_get_linearized(dsdb_dn->dn),
 				 ldb_dn_get_linearized(source_dn)));
+			*ignore_link = true;
 
 		} else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn,
 						       dsdb_dn->dn)) {
@@ -6827,6 +6830,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
 				 ldb_dn_get_linearized(source_dn),
 				 ldb_dn_get_linearized(dsdb_dn->dn)));
+			*ignore_link = true;
 		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
@@ -6839,7 +6843,23 @@ static int replmd_check_target_exists(struct ldb_module *module,
 
 		/* Get the object's state (i.e. Not Deleted, Tombstone, etc) */
 		replmd_deletion_state(module, target_msg,
-				      target_deletion_state, NULL);
+				      &target_deletion_state, NULL);
+
+		/*
+		 * Check for deleted objects as per MS-DRSR 4.1.10.6.14
+		 * ProcessLinkValue(). Link updates should not be sent for
+		 * recycled and tombstone objects (deleting the links should
+		 * happen when we delete the object). This probably means our
+		 * copy of the target object isn't up to date.
+		 */
+		if (target_deletion_state >= OBJECT_RECYCLED) {
+			ldb_asprintf_errstring(ldb,
+					       "Deleted target %s GUID %s linked from %s\n",
+					       ldb_dn_get_linearized(dsdb_dn->dn),
+					       GUID_string(tmp_ctx, guid),
+					       ldb_dn_get_linearized(source_dn));
+			ret = LDB_ERR_NO_SUCH_OBJECT;
+		}
 	}
 
 	talloc_free(tmp_ctx);
@@ -6972,8 +6992,8 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
 	struct ldb_message *src_msg;
 	const struct dsdb_attribute *attr;
 	struct dsdb_dn *tgt_dsdb_dn;
-	enum deletion_state del_state = OBJECT_NOT_DELETED;
 	struct GUID guid = GUID_zero();
+	bool dummy;
 
 	ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
 					      &src_msg, &tgt_dsdb_dn);
@@ -6995,7 +7015,7 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
 	}
 
 	ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
-					 src_msg->dn, &guid, &del_state);
+					 src_msg->dn, &guid, &dummy);
 
 	/*
 	 * When we fail to find the target object, the error code we pass
@@ -7032,9 +7052,8 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	struct parsed_dn *pdn_list, *pdn, *next;
 	struct GUID guid = GUID_zero();
 	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
-
+	bool ignore_link;
 	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
-	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
 
 	/*
 	 * get the attribute being modified, the search result for the source object,
@@ -7049,7 +7068,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	/*
-	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
+	 * Check for deleted objects per MS-DRSR 4.1.10.6.14
 	 * ProcessLinkValue, because link updates are not applied to
 	 * recycled and tombstone objects.  We don't have to delete
 	 * any existing link, that should have happened when the
@@ -7084,7 +7103,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
-					 &guid, &target_deletion_state);
+					 &guid, &ignore_link);
 
 	if (ret != LDB_SUCCESS) {
 		talloc_free(tmp_ctx);
@@ -7092,15 +7111,12 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	/*
-	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
-	 * ProcessLinkValue, because link updates are not applied to
-	 * recycled and tombstone objects.  We don't have to delete
-	 * any existing link, that should have happened when the
-	 * object deletion was replicated or initiated.
+	 * there are some cases where the target object doesn't exist, but it's
+	 * OK to ignore the linked attribute
 	 */
-	if (target_deletion_state >= OBJECT_RECYCLED) {
+	if (ignore_link) {
 		talloc_free(tmp_ctx);
-		return LDB_SUCCESS;
+		return ret;
 	}
 
 	/* see if this link already exists */
-- 
2.7.4


From 3adb35350ef13479e1fba8787bcb01795308430e Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 20 Jul 2017 11:14:27 +1200
Subject: [PATCH 10/11] replmd: Don't fail cycle if we get link for deleted
 object with GET_TGT

We are going to end up supporting 2 different server schemes:
A. the old/default behaviour of sending all the linked attributes last,
   at the end of the replication cycle.
B. the new/Microsoft way of sending the linked attributes interleaved
   with the source/target objects.

Normally if we're talking to a server using the old scheme-A, we won't
ever use the GET_TGT flag. However, there are a couple of cases where
it can happen:
- A link to a new object was added during the replication cycle.
- An object was deleted while the replication was in progress (and
the linked attribute got queued before the object was deleted).

Talking to an Samba DC running the old scheme will just cause it to
start the replication cycle from scratch again, which is fairly
harmless. However, there is a chance that the same thing can happen
again, in which case the replication cycle will fail (because GET_TGT
was already set).

Even if we're using the new scheme (B), we could still potentially hit
this case, as we can still queue up linked attributes between requests
(group memberships can be larger than what can fit into a single
replication chunk).

If GET_TGT is set in the GetNcChanges request, then the local copy of
the target object should always be up-to-date when we process the linked
attribute. So if we still think the target object is deleted/recycled at
this point, then it's safe to ignore the linked attribute (because we
know our local copy is up-to-date). This logic matches the MS spec logic
in ProcessLinkValue().

Not failing the replication cycle may be beneficial if we're trying to
do a full-sync of a large database. Otherwise it might be time-consuming
and frustrating to repeat the sync unnecessarily.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/repl/drepl_out_helpers.c           |  4 ++
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 51 +++++++++++++++----------
 source4/dsdb/samdb/samdb.h                      |  1 +
 source4/libnet/libnet_vampire.c                 |  4 ++
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/source4/dsdb/repl/drepl_out_helpers.c b/source4/dsdb/repl/drepl_out_helpers.c
index 6e24e7f..6630804 100644
--- a/source4/dsdb/repl/drepl_out_helpers.c
+++ b/source4/dsdb/repl/drepl_out_helpers.c
@@ -868,6 +868,10 @@ static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req
 		dsdb_repl_flags |= DSDB_REPL_FLAG_OBJECT_SUBSET;
 	}
 
+	if (state->op->more_flags & DRSUAPI_DRS_GET_TGT) {
+		dsdb_repl_flags |= DSDB_REPL_FLAG_TARGETS_UPTODATE;
+	}
+
 	if (state->op->extended_op != DRSUAPI_EXOP_NONE) {
 		ret = dsdb_find_nc_root(service->samdb, partition,
 					partition->dn, &nc_root);
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 8e2c64b..228622c 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -74,7 +74,7 @@ struct replmd_private {
 struct la_entry {
 	struct la_entry *next, *prev;
 	struct drsuapi_DsReplicaLinkedAttribute *la;
-	bool incomplete_replica;
+	uint32_t dsdb_repl_flags;
 };
 
 struct replmd_replicated_request {
@@ -6052,14 +6052,12 @@ static int replmd_store_linked_attributes(struct replmd_replicated_request *ar)
 {
 	int ret = LDB_SUCCESS;
 	uint32_t i;
-	bool incomplete_subset;
 	struct ldb_module *module = ar->module;
 	struct replmd_private *replmd_private =
 		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
 	struct ldb_context *ldb;
 
 	ldb = ldb_module_get_ctx(module);
-	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
 
 	DEBUG(4,("linked_attributes_count=%u\n", ar->objs->linked_attributes_count));
 
@@ -6082,13 +6080,7 @@ static int replmd_store_linked_attributes(struct replmd_replicated_request *ar)
 			return LDB_ERR_OPERATIONS_ERROR;
 		}
 		*la_entry->la = ar->objs->linked_attributes[i];
-
-		/*
-		 * we may not be able to resolve link targets properly when
-		 * dealing with subsets of objects, e.g. the source is a
-		 * critical object and the target isn't
-		 */
-		la_entry->incomplete_replica = incomplete_subset;
+		la_entry->dsdb_repl_flags = ar->objs->dsdb_repl_flags;
 
 		/* we need to steal the non-scalars so they stay
 		   around until the end of the transaction */
@@ -6793,17 +6785,19 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	if (target_res->count == 0) {
 
 		/*
+		 * we may not be able to resolve link targets properly when
+		 * dealing with subsets of objects, e.g. the source is a
+		 * critical object and the target isn't
+		 *
 		 * TODO:
 		 * When we implement Trusted Domains we need to consider
 		 * whether they get treated as an incomplete replica here or not
 		 */
-		if (la_entry->incomplete_replica) {
+		if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) {
 
 			/*
-			 * If we're only replicating a subset of objects (e.g.
-			 * critical-only, single-object), then an unknown target
-			 * is probably not a critical problem. We don't increase
-			 * the highwater-mark so subsequent replications should
+			 * We don't increase the highwater-mark in the object
+			 * subset cases, so subsequent replications should
 			 * resolve any missing links
 			 */
 			DEBUG(2,(__location__
@@ -6853,12 +6847,27 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		 * copy of the target object isn't up to date.
 		 */
 		if (target_deletion_state >= OBJECT_RECYCLED) {
-			ldb_asprintf_errstring(ldb,
-					       "Deleted target %s GUID %s linked from %s\n",
-					       ldb_dn_get_linearized(dsdb_dn->dn),
-					       GUID_string(tmp_ctx, guid),
-					       ldb_dn_get_linearized(source_dn));
-			ret = LDB_ERR_NO_SUCH_OBJECT;
+
+			if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) {
+
+				/*
+				 * target should already be uptodate so there's no
+				 * point retrying - it's probably just bad timing
+				 */
+				*ignore_link = true;
+				DEBUG(0, ("%s is deleted but up to date. "
+					  "Ignoring link from %s\n",
+					  ldb_dn_get_linearized(dsdb_dn->dn),
+					  ldb_dn_get_linearized(source_dn)));
+
+			} else {
+				ldb_asprintf_errstring(ldb,
+						       "Deleted target %s GUID %s linked from %s",
+						       ldb_dn_get_linearized(dsdb_dn->dn),
+						       GUID_string(tmp_ctx, guid),
+						       ldb_dn_get_linearized(source_dn));
+				ret = LDB_ERR_NO_SUCH_OBJECT;
+			}
 		}
 	}
 
diff --git a/source4/dsdb/samdb/samdb.h b/source4/dsdb/samdb/samdb.h
index 7d0127b..fcb2148 100644
--- a/source4/dsdb/samdb/samdb.h
+++ b/source4/dsdb/samdb/samdb.h
@@ -65,6 +65,7 @@ struct dsdb_control_current_partition {
 #define DSDB_REPL_FLAG_ADD_NCNAME	   4
 #define DSDB_REPL_FLAG_EXPECT_NO_SECRETS   8
 #define DSDB_REPL_FLAG_OBJECT_SUBSET       16
+#define DSDB_REPL_FLAG_TARGETS_UPTODATE    32
 
 #define DSDB_CONTROL_REPLICATED_UPDATE_OID "1.3.6.1.4.1.7165.4.3.3"
 
diff --git a/source4/libnet/libnet_vampire.c b/source4/libnet/libnet_vampire.c
index d89256b..aa0ec91 100644
--- a/source4/libnet/libnet_vampire.c
+++ b/source4/libnet/libnet_vampire.c
@@ -647,6 +647,10 @@ WERROR libnet_vampire_cb_store_chunk(void *private_data,
 			is_exop = true;
 		}
 		req_replica_flags = c->req10->replica_flags;
+
+		if (c->req10->more_flags & DRSUAPI_DRS_GET_TGT) {
+			dsdb_repl_flags |= DSDB_REPL_FLAG_TARGETS_UPTODATE;
+		}
 		break;
 	default:
 		return WERR_INVALID_PARAMETER;
-- 
2.7.4


From c12cf888727093e0f639f2a2c6415a8317d80f87 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Mon, 24 Jul 2017 16:20:58 +1200
Subject: [PATCH 11/11] replmd: Try to add forward-link for unknown
 cross-partition links

Previously Samba would just drop cross-partition links where the link
target object is unknown. Instead, what we want to do is try to add the
forward link for the GUID specified. We can't add the backlink because
we don't know the target, however, dbcheck should be able to fix any
missing backlinks.

The new behaviour should now mean dbcheck will detect the problem and be
able to fix it. It's still not ideal, but it's better than dropping the
link completely.

I've updated the log so that it has higher severity and tells the user
what they need to do to fix it.

These changes now mean that the selftests now detect an error - instead
of completely dropping the serverReference, we now have a missing
backlink. I've updated the selftests to fix up any missing
serverReference backlinks before running dbcheck.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 14 +++++++-------
 testprogs/blackbox/dbcheck.sh                   |  8 ++++++++
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 228622c..d5234d5 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -374,8 +374,9 @@ static int replmd_process_backlink(struct ldb_module *module, struct la_backlink
 	ret = dsdb_module_dn_by_guid(module, frame, &bl->target_guid, &target_dn, parent);
 	if (ret != LDB_SUCCESS) {
 		struct GUID_txt_buf guid_str;
-		DEBUG(2,(__location__ ": WARNING: Failed to find target DN for linked attribute with GUID %s\n",
-			 GUID_buf_string(&bl->target_guid, &guid_str)));
+		DBG_WARNING("Failed to find target DN for linked attribute with GUID %s\n",
+			    GUID_buf_string(&bl->target_guid, &guid_str));
+		DBG_WARNING("Please run 'samba-tool dbcheck' to resolve any missing backlinks.\n");
 		talloc_free(frame);
 		return LDB_SUCCESS;
 	}
@@ -6816,15 +6817,14 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		} else {
 
 			/*
-			 * TODO:
-			 * We don't handle cross-partition links well here (we
-			 * could potentially lose them), but don't fail the
-			 * replication.
+			 * The target of the cross-partition link is missing.
+			 * Continue and try to at least add the forward-link.
+			 * This isn't great, but if we can add a partial link
+			 * then it's better than nothing.
 			 */
 			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
 				 ldb_dn_get_linearized(source_dn),
 				 ldb_dn_get_linearized(dsdb_dn->dn)));
-			*ignore_link = true;
 		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
diff --git a/testprogs/blackbox/dbcheck.sh b/testprogs/blackbox/dbcheck.sh
index 0f979ab..387ce70 100755
--- a/testprogs/blackbox/dbcheck.sh
+++ b/testprogs/blackbox/dbcheck.sh
@@ -27,6 +27,13 @@ dbcheck_fix_stale_links() {
 	$BINDIR/samba-tool dbcheck --quiet --fix --yes remove_plausible_deleted_DN_links --attrs="member msDS-NC-Replica-Locations msDS-NC-RO-Replica-Locations" --cross-ncs $ARGS
 }
 
+# This list of attributes can be freely extended
+dbcheck_fix_crosspartition_backlinks() {
+	# we may not know the target yet when we receive a cross-partition link,
+	# which can result in a missing backlink
+	$BINDIR/samba-tool dbcheck --quiet --fix --yes fix_all_missing_backlinks --attrs="serverReference" --cross-ncs $ARGS
+}
+
 # This test shows that this does not do anything to a current
 # provision (that would be a bug)
 dbcheck_reset_well_known_acls() {
@@ -47,6 +54,7 @@ force_modules() {
 
 dbcheck_fix_one_way_links
 dbcheck_fix_stale_links
+dbcheck_fix_crosspartition_backlinks
 testit "dbcheck" dbcheck
 testit "reindex" reindex
 testit "fixed_attrs" fixed_attrs
-- 
2.7.4



More information about the samba-technical mailing list