[PATCH] DRS improvements and GET_TGT support

Tim Beale timbeale at catalyst.net.nz
Tue Aug 15 05:55:57 UTC 2017


Hi,

The changes for GET_TGT support are attached, or available here:
http://git.catalyst.net.nz/gw?p=samba.git;a=shortlog;h=refs/heads/tim-get-tgt

Note that Microsoft uses phantom objects to resolve unknown link targets
received during replication. Microsoft appears to only use the GET_TGT
in the case where the target object is deleted/recycled. More details on
this are here:
https://blogs.technet.microsoft.com/askpfeplat/2012/07/22/mcm-core-active-directory-internals/

Adding support for phantom objects to Samba was more work than was
feasible. However, in Samba we can use the GET_TGT protocol mechanism to
resolve unknown link targets. This means the server-side can send the
linked attributes in the same replication chunk as the source object.
This new behaviour is off by default (because talking to older Samba
clients that don't support GET_TGT will break). There's a new 'drs:
immediate link sync' config option to enable it.

This new option should improve doing a 'samba-tool drs replicate --local
--full-sync'. Instead of the DB being locked while back-to-back blocks
of 1500 links are applied at the end of the replication, these links
will now be more distributed throughout the replication cycle.

As part of this work I looked for cases where linked attributes could
potentially be dropped. The only case I could find was linked attributes
that span a partition. Instead of dropping the cross-partition link
completely if the target is unknown, we now add a one-way link. Running
'samba-tool dbcheck' will detect any such links and be able to fix them.

I've added new replication tests. As well as testing the new GET_TGT
functionality, they assert that no objects/links are missing,
particularly when the replicated data spans several chunks and objects
are being modified on the server-side while the replication is in progress.

Let me know if you've got any questions on this work.

The attached patch-set is essentially the same as what I last posted,
except:
- The top 4 patches are new.
- Garming split one patch (getncchanges.c: Support GET_TGT better with
large numbers of links) into two.
- I've combined the server-side and client-side patch-sets instead of
keeping them separate.

Cheers,
Tim
-------------- next part --------------
From e20132689e8d6533c1e5d1525ef3e4d8f7109fed Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 30 May 2017 15:08:44 +1200
Subject: [PATCH 01/36] werror: Add WERR_DS_DRA_RECYCLED_TARGET

When the DRS client encounters a linked attribute with an unknown target
object, it should return a RECYCLED_TARGET error, which should result in
the client resending the GETNCChanges request with the GET_TGT flag set.

This error code is currently documented by Microsoft under System Error
Codes (8200-8999). I contacted them and they will also add it to the
MS-ERREF doc in future.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 libcli/util/werror.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libcli/util/werror.h b/libcli/util/werror.h
index d3d3327..0370a06 100644
--- a/libcli/util/werror.h
+++ b/libcli/util/werror.h
@@ -100,6 +100,7 @@ typedef uint32_t WERROR;
 #define WERR_INVALID_PRIMARY_GROUP      W_ERROR(0x0000051C)
 
 #define WERR_DS_DRA_SECRETS_DENIED			W_ERROR(0x000021B6)
+#define WERR_DS_DRA_RECYCLED_TARGET			W_ERROR(0x000021BF)
 
 #define WERR_DNS_ERROR_KEYMASTER_REQUIRED               W_ERROR(0x0000238D)
 #define WERR_DNS_ERROR_NOT_ALLOWED_ON_SIGNED_ZONE       W_ERROR(0x0000238E)
-- 
2.7.4


From 9f564966ad0e45bc9141a2c340b4ca8f831f90f3 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 14 Jun 2017 10:51:59 +1200
Subject: [PATCH 02/36] replmd: Split checking link attr target into new
 function

We want to re-use this code to check that the linked attribute's target
object exists *before* we try to commit the transaction. This will allow
us to re-request the block with the GET_TGT flag set.

This splits checking the target object exists into a separate function.

Minor changes of note:
- the 'parent' argument was passed to replmd_process_linked_attribute()
  as NULL, so I've just replaced where it was used in the refactored code
  with NULL.
- I've tweaked the "Failed to find GUID" error message slightly to display
  the attribute ID rather than the attribute name (saves repeating
  lookups and/or passing extra arguments).
- Tweaked the replmd_deletion_state() logic - it only made sense to call
  it in the code block where we actually found the target

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 171 ++++++++++++++----------
 1 file changed, 102 insertions(+), 69 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 0ed24cb..53e2c12 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6627,6 +6627,105 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 	return replmd_replicated_apply_next(ar);
 }
 
+/**
+ * Checks that the target object for a linked attribute exists.
+ * If it exists, its GUID and deletion-state are returned
+ */
+static int replmd_check_target_exists(struct ldb_module *module,
+				      struct dsdb_dn *dsdb_dn,
+				      struct la_entry *la_entry,
+				      struct ldb_dn *source_dn,
+				      struct GUID *guid,
+				      enum deletion_state *target_deletion_state)
+{
+	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
+	struct ldb_context *ldb = ldb_module_get_ctx(module);
+	struct ldb_result *target_res;
+	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
+	const char *attrs[] = { "isDeleted", "isRecycled", NULL };
+	NTSTATUS ntstatus;
+	int ret;
+	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE) ? true : false;
+
+	*target_deletion_state = OBJECT_REMOVED;
+
+	ntstatus = dsdb_get_extended_dn_guid(dsdb_dn->dn, guid, "GUID");
+
+	if (!NT_STATUS_IS_OK(ntstatus) && !active) {
+
+		/*
+		 * This strange behaviour (allowing a NULL/missing
+		 * GUID) originally comes from:
+		 *
+		 * commit e3054ce0fe0f8f62d2f5b2a77893e7a1479128bd
+		 * Author: Andrew Tridgell <tridge at samba.org>
+		 * Date:   Mon Dec 21 21:21:55 2009 +1100
+		 *
+		 *  s4-drs: cope better with NULL GUIDS from DRS
+		 *
+		 *  It is valid to get a NULL GUID over DRS for a deleted forward link. We
+		 *  need to match by DN if possible when seeing if we should update an
+		 *  existing link.
+		 *
+		 *  Pair-Programmed-With: Andrew Bartlett <abartlet at samba.org>
+		 */
+		ret = dsdb_module_search_dn(module, tmp_ctx, &target_res,
+					    dsdb_dn->dn, attrs,
+					    DSDB_FLAG_NEXT_MODULE |
+					    DSDB_SEARCH_SHOW_RECYCLED |
+					    DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
+					    DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
+					    NULL);
+	} else if (!NT_STATUS_IS_OK(ntstatus)) {
+		ldb_asprintf_errstring(ldb, "Failed to find GUID in linked attribute 0x%x blob for %s from %s",
+				       la->attid,
+				       ldb_dn_get_linearized(dsdb_dn->dn),
+				       ldb_dn_get_linearized(source_dn));
+		talloc_free(tmp_ctx);
+		return LDB_ERR_OPERATIONS_ERROR;
+	} else {
+		ret = dsdb_module_search(module, tmp_ctx, &target_res,
+					 NULL, LDB_SCOPE_SUBTREE,
+					 attrs,
+					 DSDB_FLAG_NEXT_MODULE |
+					 DSDB_SEARCH_SHOW_RECYCLED |
+					 DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
+					 DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
+					 NULL,
+					 "objectGUID=%s",
+					 GUID_string(tmp_ctx, guid));
+	}
+
+	if (ret != LDB_SUCCESS) {
+		ldb_asprintf_errstring(ldb, "Failed to re-resolve GUID %s: %s\n",
+				       GUID_string(tmp_ctx, guid),
+				       ldb_errstring(ldb));
+		talloc_free(tmp_ctx);
+		return ret;
+	}
+
+	if (target_res->count == 0) {
+		DEBUG(2,(__location__ ": WARNING: Failed to re-resolve GUID %s - using %s\n",
+			 GUID_string(tmp_ctx, guid),
+			 ldb_dn_get_linearized(dsdb_dn->dn)));
+	} else if (target_res->count != 1) {
+		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
+				       GUID_string(tmp_ctx, guid));
+		ret = LDB_ERR_OPERATIONS_ERROR;
+	} else {
+		struct ldb_message *target_msg = target_res->msgs[0];
+
+		dsdb_dn->dn = talloc_steal(dsdb_dn, target_msg->dn);
+
+		/* Get the object's state (i.e. Not Deleted, Tombstone, etc) */
+		replmd_deletion_state(module, target_msg,
+				      target_deletion_state, NULL);
+	}
+
+	talloc_free(tmp_ctx);
+	return ret;
+}
+
 /*
   process one linked attribute structure
  */
@@ -6638,7 +6737,6 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
 	struct ldb_message *msg;
-	struct ldb_message *target_msg = NULL;
 	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
 	const struct dsdb_schema *schema = dsdb_get_schema(ldb, tmp_ctx);
 	int ret;
@@ -6649,17 +6747,15 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	WERROR status;
 	time_t t = time(NULL);
 	struct ldb_result *res;
-	struct ldb_result *target_res;
 	const char *attrs[4];
-	const char *attrs2[] = { "isDeleted", "isRecycled", NULL };
 	struct parsed_dn *pdn_list, *pdn, *next;
 	struct GUID guid = GUID_zero();
-	NTSTATUS ntstatus;
 	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
 
 	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
 	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
 
+
 /*
 linked_attributes[0]:
      &objs->linked_attributes[i]: struct drsuapi_DsReplicaLinkedAttribute
@@ -6779,74 +6875,14 @@ linked_attributes[0]:
 		return LDB_ERR_OPERATIONS_ERROR;
 	}
 
-	ntstatus = dsdb_get_extended_dn_guid(dsdb_dn->dn, &guid, "GUID");
-	if (!NT_STATUS_IS_OK(ntstatus) && !active) {
-		/*
-		 * This strange behaviour (allowing a NULL/missing
-		 * GUID) originally comes from:
-		 *
-		 * commit e3054ce0fe0f8f62d2f5b2a77893e7a1479128bd
-		 * Author: Andrew Tridgell <tridge at samba.org>
-		 * Date:   Mon Dec 21 21:21:55 2009 +1100
-		 *
-		 *  s4-drs: cope better with NULL GUIDS from DRS
-		 *
-		 *  It is valid to get a NULL GUID over DRS for a deleted forward link. We
-		 *  need to match by DN if possible when seeing if we should update an
-		 *  existing link.
-		 *
-		 *  Pair-Programmed-With: Andrew Bartlett <abartlet at samba.org>
-		 */
-
-		ret = dsdb_module_search_dn(module, tmp_ctx, &target_res,
-					    dsdb_dn->dn, attrs2,
-					    DSDB_FLAG_NEXT_MODULE |
-					    DSDB_SEARCH_SHOW_RECYCLED |
-					    DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
-					    DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
-					    parent);
-	} else if (!NT_STATUS_IS_OK(ntstatus)) {
-		ldb_asprintf_errstring(ldb, "Failed to find GUID in linked attribute blob for %s on %s from %s",
-				       old_el->name,
-				       ldb_dn_get_linearized(dsdb_dn->dn),
-				       ldb_dn_get_linearized(msg->dn));
-		talloc_free(tmp_ctx);
-		return LDB_ERR_OPERATIONS_ERROR;
-	} else {
-		ret = dsdb_module_search(module, tmp_ctx, &target_res,
-					 NULL, LDB_SCOPE_SUBTREE,
-					 attrs2,
-					 DSDB_FLAG_NEXT_MODULE |
-					 DSDB_SEARCH_SHOW_RECYCLED |
-					 DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
-					 DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT,
-					 parent,
-					 "objectGUID=%s",
-					 GUID_string(tmp_ctx, &guid));
-	}
+	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
+					 &guid, &target_deletion_state);
 
 	if (ret != LDB_SUCCESS) {
-		ldb_asprintf_errstring(ldb_module_get_ctx(module), "Failed to re-resolve GUID %s: %s\n",
-				       GUID_string(tmp_ctx, &guid),
-				       ldb_errstring(ldb_module_get_ctx(module)));
 		talloc_free(tmp_ctx);
 		return ret;
 	}
 
-	if (target_res->count == 0) {
-		DEBUG(2,(__location__ ": WARNING: Failed to re-resolve GUID %s - using %s\n",
-			 GUID_string(tmp_ctx, &guid),
-			 ldb_dn_get_linearized(dsdb_dn->dn)));
-	} else if (target_res->count != 1) {
-		ldb_asprintf_errstring(ldb_module_get_ctx(module), "More than one object found matching objectGUID %s\n",
-				       GUID_string(tmp_ctx, &guid));
-		talloc_free(tmp_ctx);
-		return LDB_ERR_OPERATIONS_ERROR;
-	} else {
-		target_msg = target_res->msgs[0];
-		dsdb_dn->dn = talloc_steal(dsdb_dn, target_msg->dn);
-	}
-
 	/*
 	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
 	 * ProcessLinkValue, because link updates are not applied to
@@ -6854,9 +6890,6 @@ linked_attributes[0]:
 	 * any existing link, that should have happened when the
 	 * object deletion was replicated or initiated.
 	 */
-	replmd_deletion_state(module, target_msg,
-			      &target_deletion_state, NULL);
-
 	if (target_deletion_state >= OBJECT_RECYCLED) {
 		talloc_free(tmp_ctx);
 		return LDB_SUCCESS;
-- 
2.7.4


From 591ed3fd5be2a59a0f172a3e5ad303eaff09bdb6 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 14 Jun 2017 11:35:36 +1200
Subject: [PATCH 03/36] drs: Fail replication transaction instead of dropping
 links

If the DRS client received a linked attribute that it couldn't resolve
the target for, then it would just ignore that link and keep going. That
link would then be lost forever (although a full-sync would resolve
this). Instead of silently ignoring the link, fail the transaction.

This *can* happen on Samba, but it is unusual. The target object and
linked-attribute would need to be added while a replication is still in
progress. It can also happen fairly easily when talking to a Windows DC.

There are two import exceptions to this:

1). Linked attributes that span partitions. We can never guarantee that
we will have received the target object, because it may be in a partition
we haven't replicated yet. Samba doesn't have a great way of handling
this currently, but we shouldn't fail the replication (because that breaks
basic join tests). Just skip that linked attribute and hope that a
subsequent full-sync will fix it.
(I queried Microsoft and they said resolving cross-partition linked
attributes is a implementation-specific problem to solve. GET_TGT won't
resolve it)

2). When the replication involves a subset of objects, e.g.
critical-only. In these cases, we don't increase the highwater-mark, so
it is probably not such a dire problem if we don't add the link. In the
case of critical-only, we will do a subsequent full sync which will then
add the links.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/repl/drepl_out_helpers.c           |  4 ++
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 92 ++++++++++++++++++++++++-
 source4/dsdb/samdb/samdb.h                      |  2 +-
 source4/libnet/libnet_vampire.c                 |  1 +
 4 files changed, 95 insertions(+), 4 deletions(-)

diff --git a/source4/dsdb/repl/drepl_out_helpers.c b/source4/dsdb/repl/drepl_out_helpers.c
index 079edc8..ad4801a 100644
--- a/source4/dsdb/repl/drepl_out_helpers.c
+++ b/source4/dsdb/repl/drepl_out_helpers.c
@@ -795,6 +795,10 @@ static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req
 	if (state->op->options & DRSUAPI_DRS_SPECIAL_SECRET_PROCESSING) {
 		dsdb_repl_flags |= DSDB_REPL_FLAG_EXPECT_NO_SECRETS;
 	}
+	if (state->op->options & DRSUAPI_DRS_CRITICAL_ONLY ||
+	    state->op->extended_op != DRSUAPI_EXOP_NONE) {
+		dsdb_repl_flags |= DSDB_REPL_FLAG_OBJECT_SUBSET;
+	}
 
 	if (state->op->extended_op != DRSUAPI_EXOP_NONE) {
 		ret = dsdb_find_nc_root(service->samdb, partition,
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 53e2c12..dc16674 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -74,6 +74,7 @@ struct replmd_private {
 struct la_entry {
 	struct la_entry *next, *prev;
 	struct drsuapi_DsReplicaLinkedAttribute *la;
+	bool incomplete_replica;
 };
 
 struct replmd_replicated_request {
@@ -6535,6 +6536,7 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 	struct ldb_control **ctrls;
 	int ret;
 	uint32_t i;
+	bool incomplete_subset;
 	struct replmd_private *replmd_private =
 		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
 
@@ -6592,6 +6594,7 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 
 	ar->controls = req->controls;
 	req->controls = ctrls;
+	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
 
 	DEBUG(4,("linked_attributes_count=%u\n", objs->linked_attributes_count));
 
@@ -6616,6 +6619,13 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 		}
 		*la_entry->la = ar->objs->linked_attributes[i];
 
+		/*
+		 * we may not be able to resolve link targets properly when
+		 * dealing with subsets of objects, e.g. the source is a
+		 * critical object and the target isn't
+		 */
+		la_entry->incomplete_replica = incomplete_subset;
+
 		/* we need to steal the non-scalars so they stay
 		   around until the end of the transaction */
 		talloc_steal(la_entry->la, la_entry->la->identifier);
@@ -6628,6 +6638,46 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 }
 
 /**
+ * Returns True if the source and target DNs both have the same naming context,
+ * i.e. they're both in the same partition.
+ */
+static bool replmd_objects_have_same_nc(struct ldb_context *ldb,
+					TALLOC_CTX *mem_ctx,
+					struct ldb_dn *source_dn,
+					struct ldb_dn *target_dn)
+{
+	TALLOC_CTX *tmp_ctx;
+	struct ldb_dn *source_nc;
+	struct ldb_dn *target_nc;
+	int ret;
+	bool same_nc = true;
+
+	tmp_ctx = talloc_new(mem_ctx);
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for source %s\n",
+			ldb_dn_get_linearized(source_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for target %s\n",
+			ldb_dn_get_linearized(target_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	same_nc = (ldb_dn_compare(source_nc, target_nc) == 0);
+
+	talloc_free(tmp_ctx);
+
+	return same_nc;
+}
+
+/**
  * Checks that the target object for a linked attribute exists.
  * If it exists, its GUID and deletion-state are returned
  */
@@ -6705,9 +6755,45 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	}
 
 	if (target_res->count == 0) {
-		DEBUG(2,(__location__ ": WARNING: Failed to re-resolve GUID %s - using %s\n",
-			 GUID_string(tmp_ctx, guid),
-			 ldb_dn_get_linearized(dsdb_dn->dn)));
+
+		/*
+		 * TODO:
+		 * When we implement Trusted Domains we need to consider
+		 * whether they get treated as an incomplete replica here or not
+		 */
+		if (la_entry->incomplete_replica) {
+
+			/*
+			 * If we're only replicating a subset of objects (e.g.
+			 * critical-only, single-object), then an unknown target
+			 * is probably not a critical problem. We don't increase
+			 * the highwater-mark so subsequent replications should
+			 * resolve any missing links
+			 */
+			DEBUG(2,(__location__
+				 ": Failed to find target %s linked from %s\n",
+				 ldb_dn_get_linearized(dsdb_dn->dn),
+				 ldb_dn_get_linearized(source_dn)));
+
+		} else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn,
+						       dsdb_dn->dn)) {
+			ldb_asprintf_errstring(ldb, "Unknown target %s GUID %s linked from %s\n",
+					       ldb_dn_get_linearized(dsdb_dn->dn),
+					       GUID_string(tmp_ctx, guid),
+					       ldb_dn_get_linearized(source_dn));
+			ret = LDB_ERR_NO_SUCH_OBJECT;
+		} else {
+
+			/*
+			 * TODO:
+			 * We don't handle cross-partition links well here (we
+			 * could potentially lose them), but don't fail the
+			 * replication.
+			 */
+			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
+				 ldb_dn_get_linearized(source_dn),
+				 ldb_dn_get_linearized(dsdb_dn->dn)));
+		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
 				       GUID_string(tmp_ctx, guid));
diff --git a/source4/dsdb/samdb/samdb.h b/source4/dsdb/samdb/samdb.h
index c8658dc..2abaf4a 100644
--- a/source4/dsdb/samdb/samdb.h
+++ b/source4/dsdb/samdb/samdb.h
@@ -64,7 +64,7 @@ struct dsdb_control_current_partition {
 #define DSDB_REPL_FLAG_PARTIAL_REPLICA     2
 #define DSDB_REPL_FLAG_ADD_NCNAME	   4
 #define DSDB_REPL_FLAG_EXPECT_NO_SECRETS   8
-
+#define DSDB_REPL_FLAG_OBJECT_SUBSET       16
 
 #define DSDB_CONTROL_REPLICATED_UPDATE_OID "1.3.6.1.4.1.7165.4.3.3"
 
diff --git a/source4/libnet/libnet_vampire.c b/source4/libnet/libnet_vampire.c
index 7f25a3a..d89256b 100644
--- a/source4/libnet/libnet_vampire.c
+++ b/source4/libnet/libnet_vampire.c
@@ -660,6 +660,7 @@ WERROR libnet_vampire_cb_store_chunk(void *private_data,
 		 */
 		ZERO_STRUCT(s_dsa->highwatermark);
 		uptodateness_vector = NULL;
+		dsdb_repl_flags |= DSDB_REPL_FLAG_OBJECT_SUBSET;
 	}
 
 	/* TODO: avoid hardcoded flags */
-- 
2.7.4


From ba528e87db6e10e6e4fc8955a582775b88f9adcf Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 15 Jun 2017 14:09:27 +1200
Subject: [PATCH 04/36] drs: Check target object is known after applying
 objects

Currently we only check that the target object is known at the end of
the transaction (i.e. the .prepare_commit hook). It's too late at this
point to resend the request with GET_TGT. Move this processing earlier
on, after we've applied all the objects (i.e. off the .extended hook).

In reality, we need to perform the checks at both points. I've
split the common code that gets the source/target details out of the
la_entry into a helper function. It's not the greatest function ever,
but seemed to make more sense than duplicating the code.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/repl/replicated_objects.c          |  11 +-
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 178 ++++++++++++++++++------
 2 files changed, 145 insertions(+), 44 deletions(-)

diff --git a/source4/dsdb/repl/replicated_objects.c b/source4/dsdb/repl/replicated_objects.c
index 862fafa..dd84570 100644
--- a/source4/dsdb/repl/replicated_objects.c
+++ b/source4/dsdb/repl/replicated_objects.c
@@ -884,12 +884,15 @@ WERROR dsdb_replicated_objects_commit(struct ldb_context *ldb,
 			dsdb_reference_schema(ldb, cur_schema, false);
 		}
 
-		if (!W_ERROR_EQUAL(objects->error, WERR_DS_DRA_MISSING_PARENT)) {
-			DEBUG(1,("Failed to apply records: %s: %s\n",
-				 ldb_errstring(ldb), ldb_strerror(ret)));
-		} else {
+		if (W_ERROR_EQUAL(objects->error, WERR_DS_DRA_RECYCLED_TARGET)) {
+			DEBUG(3,("Missing target while attempting to apply records: %s\n",
+				 ldb_errstring(ldb)));
+		} else if (W_ERROR_EQUAL(objects->error, WERR_DS_DRA_MISSING_PARENT)) {
 			DEBUG(3,("Missing parent while attempting to apply records: %s\n",
 				 ldb_errstring(ldb)));
+		} else {
+			DEBUG(1,("Failed to apply records: %s: %s\n",
+				 ldb_errstring(ldb), ldb_strerror(ret)));
 		}
 		ldb_transaction_cancel(ldb);
 		TALLOC_FREE(tmp_ctx);
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index dc16674..c04358f 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -114,6 +114,7 @@ static int replmd_check_upgrade_links(struct ldb_context *ldb,
 				      struct parsed_dn *dns, uint32_t count,
 				      struct ldb_message_element *el,
 				      const char *ldap_oid);
+static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar);
 
 enum urgent_situation {
 	REPL_URGENT_ON_CREATE = 1,
@@ -6057,7 +6058,18 @@ static int replmd_replicated_apply_next(struct replmd_replicated_request *ar)
 	struct GUID_txt_buf guid_str_buf;
 
 	if (ar->index_current >= ar->objs->num_objects) {
-		/* done with it, go to next stage */
+
+		/*
+		 * Now that we've applied all the objects, check that the new
+		 * linked attributes only reference objects we know about
+		 */
+		ret = replmd_verify_linked_attributes(ar);
+
+		if (ret != LDB_SUCCESS) {
+			return ret;
+		}
+
+		/* done applying objects, move on to the next stage */
 		return replmd_replicated_uptodate_vector(ar);
 	}
 
@@ -6812,35 +6824,29 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	return ret;
 }
 
-/*
-  process one linked attribute structure
+/**
+ * Extracts the key details about the source/target object for a
+ * linked-attribute entry.
+ * This returns the following details:
+ * @param ret_attr the schema details for the linked attribute
+ * @param source_msg the search result for the source object
+ * @param target_dsdb_dn the unpacked DN info for the target object
  */
-static int replmd_process_linked_attribute(struct ldb_module *module,
-					   struct replmd_private *replmd_private,
+static int replmd_extract_la_entry_details(struct ldb_module *module,
 					   struct la_entry *la_entry,
-					   struct ldb_request *parent)
+					   TALLOC_CTX *mem_ctx,
+					   const struct dsdb_attribute **ret_attr,
+					   struct ldb_message **source_msg,
+					   struct dsdb_dn **target_dsdb_dn)
 {
 	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
-	struct ldb_message *msg;
-	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
-	const struct dsdb_schema *schema = dsdb_get_schema(ldb, tmp_ctx);
+	const struct dsdb_schema *schema = dsdb_get_schema(ldb, mem_ctx);
 	int ret;
 	const struct dsdb_attribute *attr;
-	struct dsdb_dn *dsdb_dn;
-	uint64_t seq_num = 0;
-	struct ldb_message_element *old_el;
 	WERROR status;
-	time_t t = time(NULL);
 	struct ldb_result *res;
 	const char *attrs[4];
-	struct parsed_dn *pdn_list, *pdn, *next;
-	struct GUID guid = GUID_zero();
-	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
-
-	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
-	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
-
 
 /*
 linked_attributes[0]:
@@ -6885,7 +6891,6 @@ linked_attributes[0]:
 				       la->attid,
 				       GUID_buf_string(&la->identifier->guid,
 						       &guid_str));
-		talloc_free(tmp_ctx);
 		return LDB_ERR_OPERATIONS_ERROR;
 	}
 
@@ -6894,28 +6899,130 @@ linked_attributes[0]:
 	attrs[2] = "isRecycled";
 	attrs[3] = NULL;
 
-	/* get the existing message from the db for the object with
-	   this GUID, returning attribute being modified. We will then
-	   use this msg as the basis for a modify call */
-	ret = dsdb_module_search(module, tmp_ctx, &res, NULL, LDB_SCOPE_SUBTREE, attrs,
+	/*
+	 * get the existing message from the db for the object with
+	 * this GUID, returning attribute being modified. We will then
+	 * use this msg as the basis for a modify call
+	 */
+	ret = dsdb_module_search(module, mem_ctx, &res, NULL, LDB_SCOPE_SUBTREE, attrs,
 	                         DSDB_FLAG_NEXT_MODULE |
 				 DSDB_SEARCH_SEARCH_ALL_PARTITIONS |
 				 DSDB_SEARCH_SHOW_RECYCLED |
 				 DSDB_SEARCH_SHOW_DN_IN_STORAGE_FORMAT |
 				 DSDB_SEARCH_REVEAL_INTERNALS,
-				 parent,
-				 "objectGUID=%s", GUID_string(tmp_ctx, &la->identifier->guid));
+				 NULL,
+				 "objectGUID=%s", GUID_string(mem_ctx, &la->identifier->guid));
 	if (ret != LDB_SUCCESS) {
-		talloc_free(tmp_ctx);
 		return ret;
 	}
 	if (res->count != 1) {
 		ldb_asprintf_errstring(ldb, "DRS linked attribute for GUID %s - DN not found",
-				       GUID_string(tmp_ctx, &la->identifier->guid));
-		talloc_free(tmp_ctx);
+				       GUID_string(mem_ctx, &la->identifier->guid));
 		return LDB_ERR_NO_SUCH_OBJECT;
 	}
-	msg = res->msgs[0];
+
+	*source_msg = res->msgs[0];
+
+	/* the value blob for the attribute holds the target object DN */
+	status = dsdb_dn_la_from_blob(ldb, attr, schema, mem_ctx, la->value.blob, target_dsdb_dn);
+	if (!W_ERROR_IS_OK(status)) {
+		ldb_asprintf_errstring(ldb, "Failed to parsed linked attribute blob for %s on %s - %s\n",
+				       attr->lDAPDisplayName,
+				       ldb_dn_get_linearized(res->msgs[0]->dn),
+				       win_errstr(status));
+		return LDB_ERR_OPERATIONS_ERROR;
+	}
+
+	*ret_attr = attr;
+
+	return LDB_SUCCESS;
+}
+
+/**
+ * Verifies that the source and target objects are known for each linked
+ * attribute in the current transaction.
+ */
+static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar)
+{
+	int ret = LDB_SUCCESS;
+	struct la_entry *la;
+	struct ldb_module *module = ar->module;
+	TALLOC_CTX *tmp_ctx = talloc_new(ar);
+	struct replmd_private *replmd_private =
+		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
+
+	for (la = DLIST_TAIL(replmd_private->la_list); la; la = DLIST_PREV(la)) {
+		struct ldb_message *src_msg;
+		const struct dsdb_attribute *attr;
+		struct dsdb_dn *tgt_dsdb_dn;
+		enum deletion_state del_state = OBJECT_NOT_DELETED;
+		struct GUID guid = GUID_zero();
+
+		ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
+						      &src_msg, &tgt_dsdb_dn);
+
+		if (ret != LDB_SUCCESS) {
+			break;
+		}
+
+		ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
+						 src_msg->dn, &guid, &del_state);
+
+		/*
+		 * When we fail to find the target object, the error code we pass
+		 * back here is really important. It flags back to the callers to
+		 * retry this request with DRSUAPI_DRS_GET_TGT
+		 */
+		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_RECYCLED_TARGET);
+		}
+
+		if (ret != LDB_SUCCESS) {
+			break;
+		}
+	}
+
+	talloc_free(tmp_ctx);
+	return ret;
+}
+
+/*
+  process one linked attribute structure
+ */
+static int replmd_process_linked_attribute(struct ldb_module *module,
+					   struct replmd_private *replmd_private,
+					   struct la_entry *la_entry,
+					   struct ldb_request *parent)
+{
+	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
+	struct ldb_context *ldb = ldb_module_get_ctx(module);
+	struct ldb_message *msg;
+	TALLOC_CTX *tmp_ctx = talloc_new(la_entry);
+	const struct dsdb_schema *schema = dsdb_get_schema(ldb, tmp_ctx);
+	int ret;
+	const struct dsdb_attribute *attr;
+	struct dsdb_dn *dsdb_dn;
+	uint64_t seq_num = 0;
+	struct ldb_message_element *old_el;
+	time_t t = time(NULL);
+	struct parsed_dn *pdn_list, *pdn, *next;
+	struct GUID guid = GUID_zero();
+	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
+
+	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
+	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
+
+	/*
+	 * get the attribute being modified, the search result for the source object,
+	 * and the target object's DN details
+	 */
+	ret = replmd_extract_la_entry_details(module, la_entry, tmp_ctx, &attr,
+					      &msg, &dsdb_dn);
+
+	if (ret != LDB_SUCCESS) {
+		talloc_free(tmp_ctx);
+		return ret;
+	}
 
 	/*
 	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
@@ -6924,7 +7031,6 @@ linked_attributes[0]:
 	 * any existing link, that should have happened when the
 	 * object deletion was replicated or initiated.
 	 */
-
 	replmd_deletion_state(module, msg, &deletion_state, NULL);
 
 	if (deletion_state >= OBJECT_RECYCLED) {
@@ -6953,14 +7059,6 @@ linked_attributes[0]:
 		return ret;
 	}
 
-	status = dsdb_dn_la_from_blob(ldb, attr, schema, tmp_ctx, la->value.blob, &dsdb_dn);
-	if (!W_ERROR_IS_OK(status)) {
-		ldb_asprintf_errstring(ldb, "Failed to parsed linked attribute blob for %s on %s - %s\n",
-				       old_el->name, ldb_dn_get_linearized(msg->dn), win_errstr(status));
-		talloc_free(tmp_ctx);
-		return LDB_ERR_OPERATIONS_ERROR;
-	}
-
 	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
 					 &guid, &target_deletion_state);
 
-- 
2.7.4


From 47a25c1ca938a0e1ab756567dd27efcbb02c2a7e Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 15 Jun 2017 16:52:33 +1200
Subject: [PATCH 05/36] drepl: Support GET_TGT on periodic replication client

- Update IDL comments to include Microsoft reference doc
- Add support for sending v10 GetNCChanges request (needed for the
  GET_TGT flag, which is in the new 'more_flags' field)
- Update to also set the GET_TGT flag in the same place we were setting
  GET_ANC (I split this logic out into a separate function).
- The state struct now needs to hold a 'more_flags' field as well (this
  flag is different to the GET_ANC replica flag)

Note that using the GET_TGT when replicating from a Windows DC could be
highly inefficient. Because Samba keeps the GET_TGT flag set throughout
the replication cycle, it will basically receive a repeated object from
Windows for every single linked attribute that it receives.

I believe Windows behaviour only expects the client to set the GET_TGT
flag when it actually needs to (i.e. when it receives a target object it
doesn't know about), rather than throughout the replication cycle.
However, this approach won't work with Samba-to-Samba replication,
because when the server receives the GET_TGT flag it restarts the
replication cycle from scratch. So if we only set the GET_TGT flag when
the client encountered an unknown target then Samba-to-Samba could
potentially get into an endless replication loop.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 librpc/idl/drsuapi.idl                |  4 +-
 source4/dsdb/repl/drepl_out_helpers.c | 97 +++++++++++++++++++++++++++++------
 source4/dsdb/repl/drepl_out_pull.c    |  1 +
 source4/dsdb/repl/drepl_service.h     |  2 +
 4 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/librpc/idl/drsuapi.idl b/librpc/idl/drsuapi.idl
index 2eb1d65..51ef567 100644
--- a/librpc/idl/drsuapi.idl
+++ b/librpc/idl/drsuapi.idl
@@ -70,7 +70,9 @@ interface drsuapi
         } drsuapi_DrsUpdate;
 
 	/*****************/
-        /* Function 0x00 */
+        /* Function 0x00 drsuapi_DsBind() */
+
+        /* MS-DRSR 5.39 DRS_EXTENSIONS_INT */
         typedef [bitmap32bit] bitmap {
 		DRSUAPI_SUPPORTED_EXTENSION_BASE			= 0x00000001,
 		DRSUAPI_SUPPORTED_EXTENSION_ASYNC_REPLICATION		= 0x00000002,
diff --git a/source4/dsdb/repl/drepl_out_helpers.c b/source4/dsdb/repl/drepl_out_helpers.c
index ad4801a..6e24e7f 100644
--- a/source4/dsdb/repl/drepl_out_helpers.c
+++ b/source4/dsdb/repl/drepl_out_helpers.c
@@ -555,7 +555,28 @@ static void dreplsrv_op_pull_source_get_changes_trigger(struct tevent_req *req)
 	}
 
 	r->in.bind_handle	= &drsuapi->bind_handle;
-	if (drsuapi->remote_info28.supported_extensions & DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8) {
+
+	if (drsuapi->remote_info28.supported_extensions & DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V10) {
+		r->in.level				= 10;
+		r->in.req->req10.destination_dsa_guid	= service->ntds_guid;
+		r->in.req->req10.source_dsa_invocation_id= rf1->source_dsa_invocation_id;
+		r->in.req->req10.naming_context		= &partition->nc;
+		r->in.req->req10.highwatermark		= highwatermark;
+		r->in.req->req10.uptodateness_vector	= uptodateness_vector;
+		r->in.req->req10.replica_flags		= replica_flags;
+		r->in.req->req10.max_object_count	= 133;
+		r->in.req->req10.max_ndr_size		= 1336811;
+		r->in.req->req10.extended_op		= state->op->extended_op;
+		r->in.req->req10.fsmo_info		= state->op->fsmo_info;
+		r->in.req->req10.partial_attribute_set	= pas;
+		r->in.req->req10.partial_attribute_set_ex= NULL;
+		r->in.req->req10.mapping_ctr.num_mappings= mappings == NULL ? 0 : mappings->num_mappings;
+		r->in.req->req10.mapping_ctr.mappings	= mappings == NULL ? NULL : mappings->mappings;
+
+		/* the only difference to v8 is the more_flags */
+		r->in.req->req10.more_flags = state->op->more_flags;
+
+	} else if (drsuapi->remote_info28.supported_extensions & DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8) {
 		r->in.level				= 8;
 		r->in.req->req8.destination_dsa_guid	= service->ntds_guid;
 		r->in.req->req8.source_dsa_invocation_id= rf1->source_dsa_invocation_id;
@@ -693,6 +714,53 @@ static void dreplsrv_op_pull_source_get_changes_done(struct tevent_req *subreq)
 	dreplsrv_op_pull_source_apply_changes_trigger(req, r, ctr_level, ctr1, ctr6);
 }
 
+/**
+ * If processing a chunk of replication data fails, check if it is due to a
+ * problem that can be fixed by setting extra flags in the GetNCChanges request,
+ * i.e. GET_ANC or GET_TGT.
+ * @returns NT_STATUS_OK if the request was retried, and an error code if not
+ */
+static NTSTATUS dreplsrv_op_pull_retry_with_flags(struct tevent_req *req,
+						  WERROR error_code)
+{
+	struct dreplsrv_op_pull_source_state *state;
+	NTSTATUS nt_status = NT_STATUS_OK;
+
+	state = tevent_req_data(req, struct dreplsrv_op_pull_source_state);
+
+	/*
+	 * Check if we failed to apply the records due to a missing parent or
+	 * target object. If so, try again and ask for any mising parent/target
+	 * objects to be included this time.
+	 */
+	if (W_ERROR_EQUAL(error_code, WERR_DS_DRA_RECYCLED_TARGET)) {
+
+		if (state->op->more_flags & DRSUAPI_DRS_GET_TGT) {
+			DEBUG(1,("Missing target object despite setting DRSUAPI_DRS_GET_TGT flag\n"));
+			nt_status = NT_STATUS_INVALID_NETWORK_RESPONSE;
+		} else {
+			state->op->more_flags |= DRSUAPI_DRS_GET_TGT;
+			DEBUG(1,("Missing target object when we didn't set the DRSUAPI_DRS_GET_TGT flag, retrying\n"));
+			dreplsrv_op_pull_source_get_changes_trigger(req);
+		}
+	} else if (W_ERROR_EQUAL(error_code, WERR_DS_DRA_MISSING_PARENT)) {
+
+		if (state->op->options & DRSUAPI_DRS_GET_ANC) {
+			DEBUG(1,("Missing parent object despite setting DRSUAPI_DRS_GET_ANC flag\n"));
+			nt_status = NT_STATUS_INVALID_NETWORK_RESPONSE;
+		} else {
+			state->op->options |= DRSUAPI_DRS_GET_ANC;
+			DEBUG(4,("Missing parent object when we didn't set the DRSUAPI_DRS_GET_ANC flag, retrying\n"));
+			dreplsrv_op_pull_source_get_changes_trigger(req);
+		}
+	} else {
+		nt_status = werror_to_ntstatus(WERR_BAD_NET_RESP);
+	}
+
+	return nt_status;
+}
+
+
 static void dreplsrv_update_refs_trigger(struct tevent_req *req);
 
 static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req,
@@ -937,25 +1005,20 @@ static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req
 	if (!W_ERROR_IS_OK(status)) {
 
 		/*
-		 * If we failed to apply the records due to a missing
-		 * parent, try again after asking for the parent
-		 * records first.  Because we don't update the
-		 * highwatermark, we start this part of the cycle
-		 * again.
+		 * Check if this error can be fixed by resending the GetNCChanges
+		 * request with extra flags set (i.e. GET_ANC/GET_TGT)
 		 */
-		if (((state->op->options & DRSUAPI_DRS_GET_ANC) == 0)
-		    && W_ERROR_EQUAL(status, WERR_DS_DRA_MISSING_PARENT)) {
-			state->op->options |= DRSUAPI_DRS_GET_ANC;
-			DEBUG(4,("Missing parent object when we didn't set the DRSUAPI_DRS_GET_ANC flag, retrying\n"));
-			dreplsrv_op_pull_source_get_changes_trigger(req);
+		nt_status = dreplsrv_op_pull_retry_with_flags(req, status);
+
+		if (NT_STATUS_IS_OK(nt_status)) {
+
+			/*
+			 * We resent the request. Don't update the highwatermark,
+			 * we'll start this part of the cycle again.
+			 */
 			return;
-		} else if (((state->op->options & DRSUAPI_DRS_GET_ANC))
-			   && W_ERROR_EQUAL(status, WERR_DS_DRA_MISSING_PARENT)) {
-			DEBUG(1,("Missing parent object despite setting DRSUAPI_DRS_GET_ANC flag\n"));
-			nt_status = NT_STATUS_INVALID_NETWORK_RESPONSE;
-		} else {
-			nt_status = werror_to_ntstatus(WERR_BAD_NET_RESP);
 		}
+
 		DEBUG(0,("Failed to commit objects: %s/%s\n",
 			  win_errstr(status), nt_errstr(nt_status)));
 		tevent_req_nterror(req, nt_status);
diff --git a/source4/dsdb/repl/drepl_out_pull.c b/source4/dsdb/repl/drepl_out_pull.c
index 8b8ecd9..62a403c 100644
--- a/source4/dsdb/repl/drepl_out_pull.c
+++ b/source4/dsdb/repl/drepl_out_pull.c
@@ -126,6 +126,7 @@ WERROR dreplsrv_schedule_partition_pull_source(struct dreplsrv_service *s,
 	op->callback    = callback;
 	op->cb_data	= cb_data;
 	op->schedule_time = time(NULL);
+	op->more_flags	= 0;
 
 	DLIST_ADD_END(s->ops.pending, op);
 
diff --git a/source4/dsdb/repl/drepl_service.h b/source4/dsdb/repl/drepl_service.h
index 4b0e43f..8c44e19 100644
--- a/source4/dsdb/repl/drepl_service.h
+++ b/source4/dsdb/repl/drepl_service.h
@@ -130,6 +130,8 @@ struct dreplsrv_out_operation {
 	enum drsuapi_DsExtendedError extended_ret;
 	dreplsrv_extended_callback_t callback;
 	void *cb_data;
+	/* more replication flags - used by DsReplicaSync GET_TGT */
+	uint32_t more_flags;
 };
 
 struct dreplsrv_notify_operation {
-- 
2.7.4


From c1c5f5d608ff85809b3ff58294ec171460e8b0d1 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 16 Jun 2017 09:49:16 +1200
Subject: [PATCH 06/36] replmd: Set GET_ANC if Windows sends a link with
 unknown source object

Windows replication can send the linked attribute before it sends the
source object. The MS-DRSR spec says that in this case the client should
resend the GetNCChanges request with the GET_ANC flag set. In my testing
this resolves the problem - Windows will include the source object for the
linked attribute in the same replication chunk.

This problem doesn't happen with Samba-to-Samba replication, because the
source object for the linked attribute is guaranteed to have already been
sent.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index c04358f..fde9b72 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6961,6 +6961,17 @@ static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar)
 		ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
 						      &src_msg, &tgt_dsdb_dn);
 
+		/*
+		 * When we fail to find the source object, the error code we pass
+		 * back here is really important. It flags back to the callers to
+		 * retry this request with DRSUAPI_DRS_GET_ANC. This case should
+		 * never happen if we're replicating from a Samba DC, but it is
+		 * needed to talk to a Windows DC
+		 */
+		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_MISSING_PARENT);
+		}
+
 		if (ret != LDB_SUCCESS) {
 			break;
 		}
-- 
2.7.4


From 3d6de943e27c75e307386499956f1983bce817a4 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 16 Jun 2017 12:54:32 +1200
Subject: [PATCH 07/36] drs_utils: Add GET_TGT support to 'samba-tool drs
 replicate --local'

Update drs_Replicate.replicate() so it handles being passed the GET_TGT
flag (more_flags). To do this, we need to always use a v10 GetNCChanges
request (v8 and v10 are essentially the same except for the more_flags).

If the replicate_chunk() call into the C bindings throws an error, check
to see whether the error could be fixed by setting the GET_TGT flag, and
re-send the request if so.

Unfortunately because WERR_DS_DRA_RECYCLED_TARGET isn't documented with
the other AD error codes, I've left it hardcoded for now (Microsoft
should be fixing up their Docs).

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 python/samba/drs_utils.py | 101 ++++++++++++++++++++++++++++++----------------
 1 file changed, 66 insertions(+), 35 deletions(-)

diff --git a/python/samba/drs_utils.py b/python/samba/drs_utils.py
index b9ed059..bed8817 100644
--- a/python/samba/drs_utils.py
+++ b/python/samba/drs_utils.py
@@ -21,6 +21,8 @@ from samba.dcerpc import drsuapi, misc, drsblobs
 from samba.net import Net
 from samba.ndr import ndr_unpack
 from samba import dsdb
+from samba import werror
+from samba import WERRORError
 import samba, ldb
 
 
@@ -198,18 +200,37 @@ class drs_Replicate(object):
             raise RuntimeError("Must not set GUID 00000000-0000-0000-0000-000000000000 as invocation_id")
         self.replication_state = self.net.replicate_init(self.samdb, lp, self.drs, invocation_id)
 
+    def _should_retry_with_get_tgt(self, error_code, req):
+
+        # If the error indicates we fail to resolve a target object for a
+        # linked attribute, then we should retry the request with GET_TGT
+        # (if we support it and haven't already tried that)
+
+        # TODO fix up the below line when we next update werror_err_table.txt
+        # and pull in the new error-code
+        # return (error_code == werror.WERR_DS_DRA_RECYCLED_TARGET and
+        return (error_code == 0x21bf and
+                (req.more_flags & drsuapi.DRSUAPI_DRS_GET_TGT) == 0 and
+                self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V10)
+
     def replicate(self, dn, source_dsa_invocation_id, destination_dsa_guid,
                   schema=False, exop=drsuapi.DRSUAPI_EXOP_NONE, rodc=False,
-                  replica_flags=None, full_sync=True, sync_forced=False):
+                  replica_flags=None, full_sync=True, sync_forced=False, more_flags=0):
         '''replicate a single DN'''
 
         # setup for a GetNCChanges call
-        req8 = drsuapi.DsGetNCChangesRequest8()
+        if self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V10:
+            req = drsuapi.DsGetNCChangesRequest10()
+            req.more_flags = more_flags
+            req_level = 10
+        else:
+            req_level = 8
+            req = drsuapi.DsGetNCChangesRequest8()
 
-        req8.destination_dsa_guid = destination_dsa_guid
-        req8.source_dsa_invocation_id = source_dsa_invocation_id
-        req8.naming_context = drsuapi.DsReplicaObjectIdentifier()
-        req8.naming_context.dn = dn
+        req.destination_dsa_guid = destination_dsa_guid
+        req.source_dsa_invocation_id = source_dsa_invocation_id
+        req.naming_context = drsuapi.DsReplicaObjectIdentifier()
+        req.naming_context.dn = dn
 
         # Default to a full replication if we don't find an upToDatenessVector
         udv = None
@@ -244,49 +265,46 @@ class drs_Replicate(object):
             udv.cursors = cursors_v1
             udv.count = len(cursors_v1)
 
-        req8.highwatermark = hwm
-        req8.uptodateness_vector = udv
+        req.highwatermark = hwm
+        req.uptodateness_vector = udv
 
         if replica_flags is not None:
-            req8.replica_flags = replica_flags
+            req.replica_flags = replica_flags
         elif exop == drsuapi.DRSUAPI_EXOP_REPL_SECRET:
-            req8.replica_flags = 0
+            req.replica_flags = 0
         else:
-            req8.replica_flags = (drsuapi.DRSUAPI_DRS_INIT_SYNC |
-                                  drsuapi.DRSUAPI_DRS_PER_SYNC |
-                                  drsuapi.DRSUAPI_DRS_GET_ANC |
-                                  drsuapi.DRSUAPI_DRS_NEVER_SYNCED |
-                                  drsuapi.DRSUAPI_DRS_GET_ALL_GROUP_MEMBERSHIP)
+            req.replica_flags = (drsuapi.DRSUAPI_DRS_INIT_SYNC |
+                                 drsuapi.DRSUAPI_DRS_PER_SYNC |
+                                 drsuapi.DRSUAPI_DRS_GET_ANC |
+                                 drsuapi.DRSUAPI_DRS_NEVER_SYNCED |
+                                 drsuapi.DRSUAPI_DRS_GET_ALL_GROUP_MEMBERSHIP)
             if rodc:
-                req8.replica_flags |= (
-                    drsuapi.DRSUAPI_DRS_SPECIAL_SECRET_PROCESSING)
+                req.replica_flags |= (
+                     drsuapi.DRSUAPI_DRS_SPECIAL_SECRET_PROCESSING)
             else:
-                req8.replica_flags |= drsuapi.DRSUAPI_DRS_WRIT_REP
+                req.replica_flags |= drsuapi.DRSUAPI_DRS_WRIT_REP
 
         if sync_forced:
-            req8.replica_flags |= drsuapi.DRSUAPI_DRS_SYNC_FORCED
+            req.replica_flags |= drsuapi.DRSUAPI_DRS_SYNC_FORCED
 
-        req8.max_object_count = 402
-        req8.max_ndr_size = 402116
-        req8.extended_op = exop
-        req8.fsmo_info = 0
-        req8.partial_attribute_set = None
-        req8.partial_attribute_set_ex = None
-        req8.mapping_ctr.num_mappings = 0
-        req8.mapping_ctr.mappings = None
+        req.max_object_count = 402
+        req.max_ndr_size = 402116
+        req.extended_op = exop
+        req.fsmo_info = 0
+        req.partial_attribute_set = None
+        req.partial_attribute_set_ex = None
+        req.mapping_ctr.num_mappings = 0
+        req.mapping_ctr.mappings = None
 
         if not schema and rodc:
-            req8.partial_attribute_set = drs_get_rodc_partial_attribute_set(self.samdb)
+            req.partial_attribute_set = drs_get_rodc_partial_attribute_set(self.samdb)
 
-        if self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8:
-            req_level = 8
-            req = req8
-        else:
+        if not self.supported_extensions & drsuapi.DRSUAPI_SUPPORTED_EXTENSION_GETCHGREQ_V8:
             req_level = 5
             req5 = drsuapi.DsGetNCChangesRequest5()
             for a in dir(req5):
                 if a[0] != '_':
-                    setattr(req5, a, getattr(req8, a))
+                    setattr(req5, a, getattr(req, a))
             req = req5
 
         num_objects = 0
@@ -295,8 +313,21 @@ class drs_Replicate(object):
             (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, req_level, req)
             if ctr.first_object is None and ctr.object_count != 0:
                 raise RuntimeError("DsGetNCChanges: NULL first_object with object_count=%u" % (ctr.object_count))
-            self.net.replicate_chunk(self.replication_state, level, ctr,
-                schema=schema, req_level=req_level, req=req)
+
+            try:
+                self.net.replicate_chunk(self.replication_state, level, ctr,
+                    schema=schema, req_level=req_level, req=req)
+            except WERRORError as e:
+                # Check if retrying with the GET_TGT flag set might resolve this error
+                if self._should_retry_with_get_tgt(e[0], req):
+
+                    print("Missing target object - retrying with DRS_GET_TGT")
+                    req.more_flags |= drsuapi.DRSUAPI_DRS_GET_TGT
+
+                    # try sending the request again
+                    continue
+                else:
+                    raise e
 
             num_objects += ctr.object_count
 
-- 
2.7.4


From db015b1630463c750f9a34e654c037a3f6181d08 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 22 Jun 2017 10:43:31 +1200
Subject: [PATCH 08/36] replmd: Move where we store linked attributes

There was a bug in my previous patch where the code would verify
*all* links in the list, rather than just the ones that are new. And it
would do this for every replication chunk it received, regardless of
whether there were actually any links in that chunk.

The problem is by the time we want to verify the attributes, we don't
actually know which attributes are new. We can fix this by moving where
we store the linked attributes from the start of processing the
replication chunk to the end of processing the chunk. We can then verify
the new linked attributes at the same time we store them.

Longer-term we may want to try to apply the linked attribute at this
point. This would save looking up the source/target objects twice, but
it makes things a bit more complicated (attributes will usually apply at
this point *most* of the time, but not *all* the time).

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 193 +++++++++++++-----------
 1 file changed, 103 insertions(+), 90 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index fde9b72..0a53d1d 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -114,7 +114,8 @@ static int replmd_check_upgrade_links(struct ldb_context *ldb,
 				      struct parsed_dn *dns, uint32_t count,
 				      struct ldb_message_element *el,
 				      const char *ldap_oid);
-static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar);
+static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
+					  struct la_entry *la);
 
 enum urgent_situation {
 	REPL_URGENT_ON_CREATE = 1,
@@ -6042,6 +6043,70 @@ static int replmd_replicated_apply_search_callback(struct ldb_request *req,
 	return LDB_SUCCESS;
 }
 
+/**
+ * Stores the linked attributes received in the replication chunk - these get
+ * applied at the end of the transaction. We also check that each linked
+ * attribute is valid, i.e. source and target objects are known.
+ */
+static int replmd_store_linked_attributes(struct replmd_replicated_request *ar)
+{
+	int ret = LDB_SUCCESS;
+	uint32_t i;
+	bool incomplete_subset;
+	struct ldb_module *module = ar->module;
+	struct replmd_private *replmd_private =
+		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
+	struct ldb_context *ldb;
+
+	ldb = ldb_module_get_ctx(module);
+	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
+
+	DEBUG(4,("linked_attributes_count=%u\n", ar->objs->linked_attributes_count));
+
+	/* save away the linked attributes for the end of the transaction */
+	for (i = 0; i < ar->objs->linked_attributes_count; i++) {
+		struct la_entry *la_entry;
+
+		if (replmd_private->la_ctx == NULL) {
+			replmd_private->la_ctx = talloc_new(replmd_private);
+		}
+		la_entry = talloc(replmd_private->la_ctx, struct la_entry);
+		if (la_entry == NULL) {
+			ldb_oom(ldb);
+			return LDB_ERR_OPERATIONS_ERROR;
+		}
+		la_entry->la = talloc(la_entry, struct drsuapi_DsReplicaLinkedAttribute);
+		if (la_entry->la == NULL) {
+			talloc_free(la_entry);
+			ldb_oom(ldb);
+			return LDB_ERR_OPERATIONS_ERROR;
+		}
+		*la_entry->la = ar->objs->linked_attributes[i];
+
+		/*
+		 * we may not be able to resolve link targets properly when
+		 * dealing with subsets of objects, e.g. the source is a
+		 * critical object and the target isn't
+		 */
+		la_entry->incomplete_replica = incomplete_subset;
+
+		/* we need to steal the non-scalars so they stay
+		   around until the end of the transaction */
+		talloc_steal(la_entry->la, la_entry->la->identifier);
+		talloc_steal(la_entry->la, la_entry->la->value.blob);
+
+		ret = replmd_verify_linked_attribute(ar, la_entry);
+
+		if (ret != LDB_SUCCESS) {
+			break;
+		}
+
+		DLIST_ADD(replmd_private->la_list, la_entry);
+	}
+
+	return ret;
+}
+
 static int replmd_replicated_uptodate_vector(struct replmd_replicated_request *ar);
 
 static int replmd_replicated_apply_next(struct replmd_replicated_request *ar)
@@ -6060,10 +6125,10 @@ static int replmd_replicated_apply_next(struct replmd_replicated_request *ar)
 	if (ar->index_current >= ar->objs->num_objects) {
 
 		/*
-		 * Now that we've applied all the objects, check that the new
-		 * linked attributes only reference objects we know about
+		 * Now that we've applied all the objects, check the new linked
+		 * attributes and store them (we apply them in .prepare_commit)
 		 */
-		ret = replmd_verify_linked_attributes(ar);
+		ret = replmd_store_linked_attributes(ar);
 
 		if (ret != LDB_SUCCESS) {
 			return ret;
@@ -6547,10 +6612,6 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 	struct replmd_replicated_request *ar;
 	struct ldb_control **ctrls;
 	int ret;
-	uint32_t i;
-	bool incomplete_subset;
-	struct replmd_private *replmd_private =
-		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
 
 	ldb = ldb_module_get_ctx(module);
 
@@ -6606,45 +6667,6 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 
 	ar->controls = req->controls;
 	req->controls = ctrls;
-	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
-
-	DEBUG(4,("linked_attributes_count=%u\n", objs->linked_attributes_count));
-
-	/* save away the linked attributes for the end of the
-	   transaction */
-	for (i=0; i<ar->objs->linked_attributes_count; i++) {
-		struct la_entry *la_entry;
-
-		if (replmd_private->la_ctx == NULL) {
-			replmd_private->la_ctx = talloc_new(replmd_private);
-		}
-		la_entry = talloc(replmd_private->la_ctx, struct la_entry);
-		if (la_entry == NULL) {
-			ldb_oom(ldb);
-			return LDB_ERR_OPERATIONS_ERROR;
-		}
-		la_entry->la = talloc(la_entry, struct drsuapi_DsReplicaLinkedAttribute);
-		if (la_entry->la == NULL) {
-			talloc_free(la_entry);
-			ldb_oom(ldb);
-			return LDB_ERR_OPERATIONS_ERROR;
-		}
-		*la_entry->la = ar->objs->linked_attributes[i];
-
-		/*
-		 * we may not be able to resolve link targets properly when
-		 * dealing with subsets of objects, e.g. the source is a
-		 * critical object and the target isn't
-		 */
-		la_entry->incomplete_replica = incomplete_subset;
-
-		/* we need to steal the non-scalars so they stay
-		   around until the end of the transaction */
-		talloc_steal(la_entry->la, la_entry->la->identifier);
-		talloc_steal(la_entry->la, la_entry->la->value.blob);
-
-		DLIST_ADD(replmd_private->la_list, la_entry);
-	}
 
 	return replmd_replicated_apply_next(ar);
 }
@@ -6939,58 +6961,49 @@ linked_attributes[0]:
 }
 
 /**
- * Verifies that the source and target objects are known for each linked
- * attribute in the current transaction.
+ * Verifies the source and target objects are known for a linked attribute
  */
-static int replmd_verify_linked_attributes(struct replmd_replicated_request *ar)
+static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
+					  struct la_entry *la)
 {
 	int ret = LDB_SUCCESS;
-	struct la_entry *la;
+	TALLOC_CTX *tmp_ctx = talloc_new(la);
 	struct ldb_module *module = ar->module;
-	TALLOC_CTX *tmp_ctx = talloc_new(ar);
-	struct replmd_private *replmd_private =
-		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
-
-	for (la = DLIST_TAIL(replmd_private->la_list); la; la = DLIST_PREV(la)) {
-		struct ldb_message *src_msg;
-		const struct dsdb_attribute *attr;
-		struct dsdb_dn *tgt_dsdb_dn;
-		enum deletion_state del_state = OBJECT_NOT_DELETED;
-		struct GUID guid = GUID_zero();
-
-		ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
-						      &src_msg, &tgt_dsdb_dn);
+	struct ldb_message *src_msg;
+	const struct dsdb_attribute *attr;
+	struct dsdb_dn *tgt_dsdb_dn;
+	enum deletion_state del_state = OBJECT_NOT_DELETED;
+	struct GUID guid = GUID_zero();
 
-		/*
-		 * When we fail to find the source object, the error code we pass
-		 * back here is really important. It flags back to the callers to
-		 * retry this request with DRSUAPI_DRS_GET_ANC. This case should
-		 * never happen if we're replicating from a Samba DC, but it is
-		 * needed to talk to a Windows DC
-		 */
-		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
-			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_MISSING_PARENT);
-		}
+	ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
+					      &src_msg, &tgt_dsdb_dn);
 
-		if (ret != LDB_SUCCESS) {
-			break;
-		}
+	/*
+	 * When we fail to find the source object, the error code we pass
+	 * back here is really important. It flags back to the callers to
+	 * retry this request with DRSUAPI_DRS_GET_ANC. This case should
+	 * never happen if we're replicating from a Samba DC, but it is
+	 * needed to talk to a Windows DC
+	 */
+	if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+		ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_MISSING_PARENT);
+	}
 
-		ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
-						 src_msg->dn, &guid, &del_state);
+	if (ret != LDB_SUCCESS) {
+		talloc_free(tmp_ctx);
+		return ret;
+	}
 
-		/*
-		 * When we fail to find the target object, the error code we pass
-		 * back here is really important. It flags back to the callers to
-		 * retry this request with DRSUAPI_DRS_GET_TGT
-		 */
-		if (ret == LDB_ERR_NO_SUCH_OBJECT) {
-			ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_RECYCLED_TARGET);
-		}
+	ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
+					 src_msg->dn, &guid, &del_state);
 
-		if (ret != LDB_SUCCESS) {
-			break;
-		}
+	/*
+	 * When we fail to find the target object, the error code we pass
+	 * back here is really important. It flags back to the callers to
+	 * retry this request with DRSUAPI_DRS_GET_TGT
+	 */
+	if (ret == LDB_ERR_NO_SUCH_OBJECT) {
+		ret = replmd_replicated_request_werror(ar, WERR_DS_DRA_RECYCLED_TARGET);
 	}
 
 	talloc_free(tmp_ctx);
-- 
2.7.4


From 4bb22a56755bc4ce3b1733ad7dcbc581d7f06771 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 19 Jul 2017 16:18:20 +1200
Subject: [PATCH 09/36] replmd: Avoid dropping links if link target is deleted

The server-side can potentially send the linked attribute before the
target-object. This happens on Microsoft, and will happen on Samba once
server-side GET_TGT support is added. In these cases there is a hole
where the Samba client can silently drop the linked attribute.

If the old copy of the target object was deleted/recycled, then the
client can receive the new linked attribute before it realizes the target
has now been reincarnated. It silently ignores the linked attribute,
thinking its receiving out of date information, when really it's the
client's copy of the target object that's out of date.

In this case we want to retry with the GET_TGT flag set, which will
force the updated version of the target object to be sent along with the
linked attribute. This deleted/recycled target case is the main reason
that Windows added the GET_TGT flag.

If the server sends all the links at the end, instead of along with the
source object, then this case can still be hit. If so, it will cause the
server to restart the replication from the beginning again. This is
probably preferential to silently dropping links.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 52 ++++++++++++++++---------
 1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 0a53d1d..8e2c64b 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6713,14 +6713,16 @@ static bool replmd_objects_have_same_nc(struct ldb_context *ldb,
 
 /**
  * Checks that the target object for a linked attribute exists.
- * If it exists, its GUID and deletion-state are returned
+ * @param guid returns the target object's GUID (is returned)if it exists)
+ * @param ignore_link set to true if the linked attribute should be ignored
+ * (i.e. the target doesn't exist, but that it's OK to skip the link)
  */
 static int replmd_check_target_exists(struct ldb_module *module,
 				      struct dsdb_dn *dsdb_dn,
 				      struct la_entry *la_entry,
 				      struct ldb_dn *source_dn,
 				      struct GUID *guid,
-				      enum deletion_state *target_deletion_state)
+				      bool *ignore_link)
 {
 	struct drsuapi_DsReplicaLinkedAttribute *la = la_entry->la;
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
@@ -6729,10 +6731,10 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	const char *attrs[] = { "isDeleted", "isRecycled", NULL };
 	NTSTATUS ntstatus;
 	int ret;
+	enum deletion_state target_deletion_state = OBJECT_REMOVED;
 	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE) ? true : false;
 
-	*target_deletion_state = OBJECT_REMOVED;
-
+	*ignore_link = false;
 	ntstatus = dsdb_get_extended_dn_guid(dsdb_dn->dn, guid, "GUID");
 
 	if (!NT_STATUS_IS_OK(ntstatus) && !active) {
@@ -6808,6 +6810,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 				 ": Failed to find target %s linked from %s\n",
 				 ldb_dn_get_linearized(dsdb_dn->dn),
 				 ldb_dn_get_linearized(source_dn)));
+			*ignore_link = true;
 
 		} else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn,
 						       dsdb_dn->dn)) {
@@ -6827,6 +6830,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
 				 ldb_dn_get_linearized(source_dn),
 				 ldb_dn_get_linearized(dsdb_dn->dn)));
+			*ignore_link = true;
 		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
@@ -6839,7 +6843,23 @@ static int replmd_check_target_exists(struct ldb_module *module,
 
 		/* Get the object's state (i.e. Not Deleted, Tombstone, etc) */
 		replmd_deletion_state(module, target_msg,
-				      target_deletion_state, NULL);
+				      &target_deletion_state, NULL);
+
+		/*
+		 * Check for deleted objects as per MS-DRSR 4.1.10.6.14
+		 * ProcessLinkValue(). Link updates should not be sent for
+		 * recycled and tombstone objects (deleting the links should
+		 * happen when we delete the object). This probably means our
+		 * copy of the target object isn't up to date.
+		 */
+		if (target_deletion_state >= OBJECT_RECYCLED) {
+			ldb_asprintf_errstring(ldb,
+					       "Deleted target %s GUID %s linked from %s\n",
+					       ldb_dn_get_linearized(dsdb_dn->dn),
+					       GUID_string(tmp_ctx, guid),
+					       ldb_dn_get_linearized(source_dn));
+			ret = LDB_ERR_NO_SUCH_OBJECT;
+		}
 	}
 
 	talloc_free(tmp_ctx);
@@ -6972,8 +6992,8 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
 	struct ldb_message *src_msg;
 	const struct dsdb_attribute *attr;
 	struct dsdb_dn *tgt_dsdb_dn;
-	enum deletion_state del_state = OBJECT_NOT_DELETED;
 	struct GUID guid = GUID_zero();
+	bool dummy;
 
 	ret = replmd_extract_la_entry_details(module, la, tmp_ctx, &attr,
 					      &src_msg, &tgt_dsdb_dn);
@@ -6995,7 +7015,7 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
 	}
 
 	ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
-					 src_msg->dn, &guid, &del_state);
+					 src_msg->dn, &guid, &dummy);
 
 	/*
 	 * When we fail to find the target object, the error code we pass
@@ -7032,9 +7052,8 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	struct parsed_dn *pdn_list, *pdn, *next;
 	struct GUID guid = GUID_zero();
 	bool active = (la->flags & DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE)?true:false;
-
+	bool ignore_link;
 	enum deletion_state deletion_state = OBJECT_NOT_DELETED;
-	enum deletion_state target_deletion_state = OBJECT_NOT_DELETED;
 
 	/*
 	 * get the attribute being modified, the search result for the source object,
@@ -7049,7 +7068,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	/*
-	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
+	 * Check for deleted objects per MS-DRSR 4.1.10.6.14
 	 * ProcessLinkValue, because link updates are not applied to
 	 * recycled and tombstone objects.  We don't have to delete
 	 * any existing link, that should have happened when the
@@ -7084,7 +7103,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
-					 &guid, &target_deletion_state);
+					 &guid, &ignore_link);
 
 	if (ret != LDB_SUCCESS) {
 		talloc_free(tmp_ctx);
@@ -7092,15 +7111,12 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	/*
-	 * Check for deleted objects per MS-DRSR 4.1.10.6.13
-	 * ProcessLinkValue, because link updates are not applied to
-	 * recycled and tombstone objects.  We don't have to delete
-	 * any existing link, that should have happened when the
-	 * object deletion was replicated or initiated.
+	 * there are some cases where the target object doesn't exist, but it's
+	 * OK to ignore the linked attribute
 	 */
-	if (target_deletion_state >= OBJECT_RECYCLED) {
+	if (ignore_link) {
 		talloc_free(tmp_ctx);
-		return LDB_SUCCESS;
+		return ret;
 	}
 
 	/* see if this link already exists */
-- 
2.7.4


From 7542c8dc81040943d927b2e078e83bf0928ec860 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 20 Jul 2017 11:14:27 +1200
Subject: [PATCH 10/36] replmd: Don't fail cycle if we get link for deleted
 object with GET_TGT

We are going to end up supporting 2 different server schemes:
A. the old/default behaviour of sending all the linked attributes last,
   at the end of the replication cycle.
B. the new/Microsoft way of sending the linked attributes interleaved
   with the source/target objects.

Normally if we're talking to a server using the old scheme-A, we won't
ever use the GET_TGT flag. However, there are a couple of cases where
it can happen:
- A link to a new object was added during the replication cycle.
- An object was deleted while the replication was in progress (and
the linked attribute got queued before the object was deleted).

Talking to an Samba DC running the old scheme will just cause it to
start the replication cycle from scratch again, which is fairly
harmless. However, there is a chance that the same thing can happen
again, in which case the replication cycle will fail (because GET_TGT
was already set).

Even if we're using the new scheme (B), we could still potentially hit
this case, as we can still queue up linked attributes between requests
(group memberships can be larger than what can fit into a single
replication chunk).

If GET_TGT is set in the GetNcChanges request, then the local copy of
the target object should always be up-to-date when we process the linked
attribute. So if we still think the target object is deleted/recycled at
this point, then it's safe to ignore the linked attribute (because we
know our local copy is up-to-date). This logic matches the MS spec logic
in ProcessLinkValue().

Not failing the replication cycle may be beneficial if we're trying to
do a full-sync of a large database. Otherwise it might be time-consuming
and frustrating to repeat the sync unnecessarily.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/repl/drepl_out_helpers.c           |  4 ++
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 51 +++++++++++++++----------
 source4/dsdb/samdb/samdb.h                      |  1 +
 source4/libnet/libnet_vampire.c                 |  4 ++
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/source4/dsdb/repl/drepl_out_helpers.c b/source4/dsdb/repl/drepl_out_helpers.c
index 6e24e7f..6630804 100644
--- a/source4/dsdb/repl/drepl_out_helpers.c
+++ b/source4/dsdb/repl/drepl_out_helpers.c
@@ -868,6 +868,10 @@ static void dreplsrv_op_pull_source_apply_changes_trigger(struct tevent_req *req
 		dsdb_repl_flags |= DSDB_REPL_FLAG_OBJECT_SUBSET;
 	}
 
+	if (state->op->more_flags & DRSUAPI_DRS_GET_TGT) {
+		dsdb_repl_flags |= DSDB_REPL_FLAG_TARGETS_UPTODATE;
+	}
+
 	if (state->op->extended_op != DRSUAPI_EXOP_NONE) {
 		ret = dsdb_find_nc_root(service->samdb, partition,
 					partition->dn, &nc_root);
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 8e2c64b..228622c 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -74,7 +74,7 @@ struct replmd_private {
 struct la_entry {
 	struct la_entry *next, *prev;
 	struct drsuapi_DsReplicaLinkedAttribute *la;
-	bool incomplete_replica;
+	uint32_t dsdb_repl_flags;
 };
 
 struct replmd_replicated_request {
@@ -6052,14 +6052,12 @@ static int replmd_store_linked_attributes(struct replmd_replicated_request *ar)
 {
 	int ret = LDB_SUCCESS;
 	uint32_t i;
-	bool incomplete_subset;
 	struct ldb_module *module = ar->module;
 	struct replmd_private *replmd_private =
 		talloc_get_type(ldb_module_get_private(module), struct replmd_private);
 	struct ldb_context *ldb;
 
 	ldb = ldb_module_get_ctx(module);
-	incomplete_subset = (ar->objs->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET);
 
 	DEBUG(4,("linked_attributes_count=%u\n", ar->objs->linked_attributes_count));
 
@@ -6082,13 +6080,7 @@ static int replmd_store_linked_attributes(struct replmd_replicated_request *ar)
 			return LDB_ERR_OPERATIONS_ERROR;
 		}
 		*la_entry->la = ar->objs->linked_attributes[i];
-
-		/*
-		 * we may not be able to resolve link targets properly when
-		 * dealing with subsets of objects, e.g. the source is a
-		 * critical object and the target isn't
-		 */
-		la_entry->incomplete_replica = incomplete_subset;
+		la_entry->dsdb_repl_flags = ar->objs->dsdb_repl_flags;
 
 		/* we need to steal the non-scalars so they stay
 		   around until the end of the transaction */
@@ -6793,17 +6785,19 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	if (target_res->count == 0) {
 
 		/*
+		 * we may not be able to resolve link targets properly when
+		 * dealing with subsets of objects, e.g. the source is a
+		 * critical object and the target isn't
+		 *
 		 * TODO:
 		 * When we implement Trusted Domains we need to consider
 		 * whether they get treated as an incomplete replica here or not
 		 */
-		if (la_entry->incomplete_replica) {
+		if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) {
 
 			/*
-			 * If we're only replicating a subset of objects (e.g.
-			 * critical-only, single-object), then an unknown target
-			 * is probably not a critical problem. We don't increase
-			 * the highwater-mark so subsequent replications should
+			 * We don't increase the highwater-mark in the object
+			 * subset cases, so subsequent replications should
 			 * resolve any missing links
 			 */
 			DEBUG(2,(__location__
@@ -6853,12 +6847,27 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		 * copy of the target object isn't up to date.
 		 */
 		if (target_deletion_state >= OBJECT_RECYCLED) {
-			ldb_asprintf_errstring(ldb,
-					       "Deleted target %s GUID %s linked from %s\n",
-					       ldb_dn_get_linearized(dsdb_dn->dn),
-					       GUID_string(tmp_ctx, guid),
-					       ldb_dn_get_linearized(source_dn));
-			ret = LDB_ERR_NO_SUCH_OBJECT;
+
+			if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) {
+
+				/*
+				 * target should already be uptodate so there's no
+				 * point retrying - it's probably just bad timing
+				 */
+				*ignore_link = true;
+				DEBUG(0, ("%s is deleted but up to date. "
+					  "Ignoring link from %s\n",
+					  ldb_dn_get_linearized(dsdb_dn->dn),
+					  ldb_dn_get_linearized(source_dn)));
+
+			} else {
+				ldb_asprintf_errstring(ldb,
+						       "Deleted target %s GUID %s linked from %s",
+						       ldb_dn_get_linearized(dsdb_dn->dn),
+						       GUID_string(tmp_ctx, guid),
+						       ldb_dn_get_linearized(source_dn));
+				ret = LDB_ERR_NO_SUCH_OBJECT;
+			}
 		}
 	}
 
diff --git a/source4/dsdb/samdb/samdb.h b/source4/dsdb/samdb/samdb.h
index 2abaf4a..01eb1f3 100644
--- a/source4/dsdb/samdb/samdb.h
+++ b/source4/dsdb/samdb/samdb.h
@@ -65,6 +65,7 @@ struct dsdb_control_current_partition {
 #define DSDB_REPL_FLAG_ADD_NCNAME	   4
 #define DSDB_REPL_FLAG_EXPECT_NO_SECRETS   8
 #define DSDB_REPL_FLAG_OBJECT_SUBSET       16
+#define DSDB_REPL_FLAG_TARGETS_UPTODATE    32
 
 #define DSDB_CONTROL_REPLICATED_UPDATE_OID "1.3.6.1.4.1.7165.4.3.3"
 
diff --git a/source4/libnet/libnet_vampire.c b/source4/libnet/libnet_vampire.c
index d89256b..aa0ec91 100644
--- a/source4/libnet/libnet_vampire.c
+++ b/source4/libnet/libnet_vampire.c
@@ -647,6 +647,10 @@ WERROR libnet_vampire_cb_store_chunk(void *private_data,
 			is_exop = true;
 		}
 		req_replica_flags = c->req10->replica_flags;
+
+		if (c->req10->more_flags & DRSUAPI_DRS_GET_TGT) {
+			dsdb_repl_flags |= DSDB_REPL_FLAG_TARGETS_UPTODATE;
+		}
 		break;
 	default:
 		return WERR_INVALID_PARAMETER;
-- 
2.7.4


From 13b1c873b2fc5fed601cbf25823d47ffae99daa8 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Mon, 24 Jul 2017 16:20:58 +1200
Subject: [PATCH 11/36] replmd: Try to add forward-link for unknown
 cross-partition links

Previously Samba would just drop cross-partition links where the link
target object is unknown. Instead, what we want to do is try to add the
forward link for the GUID specified. We can't add the backlink because
we don't know the target, however, dbcheck should be able to fix any
missing backlinks.

The new behaviour should now mean dbcheck will detect the problem and be
able to fix it. It's still not ideal, but it's better than dropping the
link completely.

I've updated the log so that it has higher severity and tells the user
what they need to do to fix it.

These changes now mean that the selftests now detect an error - instead
of completely dropping the serverReference, we now have a missing
backlink. I've updated the selftests to fix up any missing
serverReference backlinks before running dbcheck.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 14 +++++++-------
 testprogs/blackbox/dbcheck.sh                   |  8 ++++++++
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 228622c..d5234d5 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -374,8 +374,9 @@ static int replmd_process_backlink(struct ldb_module *module, struct la_backlink
 	ret = dsdb_module_dn_by_guid(module, frame, &bl->target_guid, &target_dn, parent);
 	if (ret != LDB_SUCCESS) {
 		struct GUID_txt_buf guid_str;
-		DEBUG(2,(__location__ ": WARNING: Failed to find target DN for linked attribute with GUID %s\n",
-			 GUID_buf_string(&bl->target_guid, &guid_str)));
+		DBG_WARNING("Failed to find target DN for linked attribute with GUID %s\n",
+			    GUID_buf_string(&bl->target_guid, &guid_str));
+		DBG_WARNING("Please run 'samba-tool dbcheck' to resolve any missing backlinks.\n");
 		talloc_free(frame);
 		return LDB_SUCCESS;
 	}
@@ -6816,15 +6817,14 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		} else {
 
 			/*
-			 * TODO:
-			 * We don't handle cross-partition links well here (we
-			 * could potentially lose them), but don't fail the
-			 * replication.
+			 * The target of the cross-partition link is missing.
+			 * Continue and try to at least add the forward-link.
+			 * This isn't great, but if we can add a partial link
+			 * then it's better than nothing.
 			 */
 			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
 				 ldb_dn_get_linearized(source_dn),
 				 ldb_dn_get_linearized(dsdb_dn->dn)));
-			*ignore_link = true;
 		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
diff --git a/testprogs/blackbox/dbcheck.sh b/testprogs/blackbox/dbcheck.sh
index 0f979ab..387ce70 100755
--- a/testprogs/blackbox/dbcheck.sh
+++ b/testprogs/blackbox/dbcheck.sh
@@ -27,6 +27,13 @@ dbcheck_fix_stale_links() {
 	$BINDIR/samba-tool dbcheck --quiet --fix --yes remove_plausible_deleted_DN_links --attrs="member msDS-NC-Replica-Locations msDS-NC-RO-Replica-Locations" --cross-ncs $ARGS
 }
 
+# This list of attributes can be freely extended
+dbcheck_fix_crosspartition_backlinks() {
+	# we may not know the target yet when we receive a cross-partition link,
+	# which can result in a missing backlink
+	$BINDIR/samba-tool dbcheck --quiet --fix --yes fix_all_missing_backlinks --attrs="serverReference" --cross-ncs $ARGS
+}
+
 # This test shows that this does not do anything to a current
 # provision (that would be a bug)
 dbcheck_reset_well_known_acls() {
@@ -47,6 +54,7 @@ force_modules() {
 
 dbcheck_fix_one_way_links
 dbcheck_fix_stale_links
+dbcheck_fix_crosspartition_backlinks
 testit "dbcheck" dbcheck
 testit "reindex" reindex
 testit "fixed_attrs" fixed_attrs
-- 
2.7.4


From a39ab36ed0dcfd26382f5ffe18c09051e6f7368e Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 6 Jun 2017 18:06:22 +1200
Subject: [PATCH 12/36] getncchanges.py: Add a new test for replication

This adds a new test to check that if objects are modified during a
replication, then those objects don't wind up missing from the
replication data.

Note that when this scenario occurs, samba returns the objects in a
different order to Windows. This test doesn't care what order the
replicated objects get returned in, so long as they all have been
received by the end of the test.

As part of this, I've refactored _check_replication() in drs_base.py so
it can be reused in new tests. In these cases, the objects are split up
over multiple different chunks. So asserting that the objects are returned
in a specific order makes it difficult to run the same test on both Samba
and Windows.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/selftest/tests.py                  |  10 ++
 source4/torture/drs/python/drs_base.py     |  70 ++++++++++----
 source4/torture/drs/python/getncchanges.py | 147 +++++++++++++++++++++++++++++
 3 files changed, 208 insertions(+), 19 deletions(-)
 create mode 100644 source4/torture/drs/python/getncchanges.py

diff --git a/source4/selftest/tests.py b/source4/selftest/tests.py
index 3f3d216..32094e0 100755
--- a/source4/selftest/tests.py
+++ b/source4/selftest/tests.py
@@ -851,6 +851,16 @@ for env in ['vampire_dc', 'promoted_dc', 'vampire_2000_dc']:
                            environ={'DC1': "$DC_SERVER", 'DC2': '$%s_SERVER' % env.upper()},
                            extra_args=['-U$DOMAIN/$DC_USERNAME%$DC_PASSWORD'])
 
+# A side-effect of the getncchanges tests is that they will create hundreds of
+# tombstone objects, so run them last to avoid interferring with (and slowing
+# down) the other DRS tests
+for env in ['vampire_dc', 'promoted_dc']:
+    planoldpythontestsuite(env, "getncchanges",
+			   extra_path=[os.path.join(samba4srcdir, 'torture/drs/python')],
+			   name="samba4.drs.getncchanges.python(%s)" % env,
+			   environ={'DC1': "$DC_SERVER", 'DC2': '$%s_SERVER' % env.upper()},
+			   extra_args=['-U$DOMAIN/$DC_USERNAME%$DC_PASSWORD'])
+
 for env in ['ad_dc_ntvfs']:
     planoldpythontestsuite(env, "repl_rodc",
                            extra_path=[os.path.join(samba4srcdir, 'torture/drs/python')],
diff --git a/source4/torture/drs/python/drs_base.py b/source4/torture/drs/python/drs_base.py
index 9e258bb..b378326 100644
--- a/source4/torture/drs/python/drs_base.py
+++ b/source4/torture/drs/python/drs_base.py
@@ -194,17 +194,15 @@ class DrsBaseTestCase(SambaToolCmdTest):
         id.dn = str(res[0].dn)
         return id
 
-    def _check_replication(self, expected_dns, replica_flags, expected_links=[],
-                           drs_error=drsuapi.DRSUAPI_EXOP_ERR_NONE, drs=None, drs_handle=None,
-                           highwatermark=None, uptodateness_vector=None,
-                           more_flags=0, more_data=False,
-                           dn_ordered=True, links_ordered=True,
-                           max_objects=133, exop=0,
-                           dest_dsa=drsuapi.DRSUAPI_DS_BIND_GUID_W2K3,
-                           source_dsa=None, invocation_id=None, nc_dn_str=None,
-                           nc_object_count=0, nc_linked_attributes_count=0):
+    def _get_replication(self, replica_flags,
+                          drs_error=drsuapi.DRSUAPI_EXOP_ERR_NONE, drs=None, drs_handle=None,
+                          highwatermark=None, uptodateness_vector=None,
+                          more_flags=0, max_objects=133, exop=0,
+                          dest_dsa=drsuapi.DRSUAPI_DS_BIND_GUID_W2K3,
+                          source_dsa=None, invocation_id=None, nc_dn_str=None):
         """
-        Makes sure that replication returns the specific error given.
+        Builds a DsGetNCChanges request based on the information provided
+        and returns the response received from the DC.
         """
         if source_dsa is None:
             source_dsa = self.ldb_dc1.get_ntds_GUID()
@@ -248,12 +246,51 @@ class DrsBaseTestCase(SambaToolCmdTest):
         self.assertEqual(level, 6, "expected level 6 response!")
         self.assertEqual(ctr.source_dsa_guid, misc.GUID(source_dsa))
         self.assertEqual(ctr.source_dsa_invocation_id, misc.GUID(invocation_id))
-        ctr6 = ctr
-        self.assertEqual(ctr6.extended_ret, drs_error)
+        self.assertEqual(ctr.extended_ret, drs_error)
+
+        return ctr
+
+    def _check_replication(self, expected_dns, replica_flags, expected_links=[],
+                           drs_error=drsuapi.DRSUAPI_EXOP_ERR_NONE, drs=None, drs_handle=None,
+                           highwatermark=None, uptodateness_vector=None,
+                           more_flags=0, more_data=False,
+                           dn_ordered=True, links_ordered=True,
+                           max_objects=133, exop=0,
+                           dest_dsa=drsuapi.DRSUAPI_DS_BIND_GUID_W2K3,
+                           source_dsa=None, invocation_id=None, nc_dn_str=None,
+                           nc_object_count=0, nc_linked_attributes_count=0):
+        """
+        Makes sure that replication returns the specific error given.
+        """
+
+        # send a DsGetNCChanges to the DC
+        ctr6 = self._get_replication(replica_flags,
+                                     drs_error, drs, drs_handle,
+                                     highwatermark, uptodateness_vector,
+                                     more_flags, max_objects, exop, dest_dsa,
+                                     source_dsa, invocation_id, nc_dn_str)
+
+        # check the response is what we expect
         self._check_ctr6(ctr6, expected_dns, expected_links,
-                         nc_object_count=nc_object_count)
+                         nc_object_count=nc_object_count, more_data=more_data,
+                         dn_ordered=dn_ordered)
         return (ctr6.new_highwatermark, ctr6.uptodateness_vector)
 
+
+    def _get_ctr6_dn_list(self, ctr6):
+        """
+        Returns the DNs contained in a DsGetNCChanges response.
+        """
+        dn_list = []
+        next_object = ctr6.first_object
+        for i in range(0, ctr6.object_count):
+            dn_list.append(next_object.object.identifier.dn)
+            next_object = next_object.next_object
+        self.assertEqual(next_object, None)
+
+        return dn_list
+
+
     def _check_ctr6(self, ctr6, expected_dns=[], expected_links=[],
                     dn_ordered=True, links_ordered=True,
                     more_data=False, nc_object_count=0,
@@ -268,12 +305,7 @@ class DrsBaseTestCase(SambaToolCmdTest):
         self.assertEqual(ctr6.nc_linked_attributes_count, nc_linked_attributes_count)
         self.assertEqual(ctr6.drs_error[0], drs_error)
 
-        ctr6_dns = []
-        next_object = ctr6.first_object
-        for i in range(0, ctr6.object_count):
-            ctr6_dns.append(next_object.object.identifier.dn)
-            next_object = next_object.next_object
-        self.assertEqual(next_object, None)
+        ctr6_dns = self._get_ctr6_dn_list(ctr6)
 
         i = 0
         for dn in expected_dns:
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
new file mode 100644
index 0000000..d1d6b2b
--- /dev/null
+++ b/source4/torture/drs/python/getncchanges.py
@@ -0,0 +1,147 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+#
+# Tests various schema replication scenarios
+#
+# Copyright (C) Catalyst.Net Ltd. 2017
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+#
+# Usage:
+#  export DC1=dc1_dns_name
+#  export DC2=dc2_dns_name
+#  export SUBUNITRUN=$samba4srcdir/scripting/bin/subunitrun
+#  PYTHONPATH="$PYTHONPATH:$samba4srcdir/torture/drs/python" $SUBUNITRUN getncchanges -U"$DOMAIN/$DC_USERNAME"%"$DC_PASSWORD"
+#
+
+import drs_base
+import samba.tests
+import ldb
+from ldb import SCOPE_BASE
+
+from samba.dcerpc import drsuapi
+
+class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
+    def setUp(self):
+        super(DrsReplicaSyncIntegrityTestCase, self).setUp()
+        self.base_dn = self.ldb_dc1.get_default_basedn()
+        self.ou = "OU=uptodateness_test,%s" % self.base_dn
+        self.ldb_dc1.add({
+            "dn": self.ou,
+            "objectclass": "organizationalUnit"})
+        (self.drs, self.drs_handle) = self._ds_bind(self.dnsname_dc1)
+        (self.default_hwm, self.default_utdv) = self._get_highest_hwm_utdv(self.ldb_dc1)
+        self._debug = True
+
+    def tearDown(self):
+        super(DrsReplicaSyncIntegrityTestCase, self).tearDown()
+        # tidyup groups and users
+        try:
+            self.ldb_dc1.delete(self.ou, ["tree_delete:1"])
+        except ldb.LdbError as (enum, string):
+            if enum == ldb.ERR_NO_SUCH_OBJECT:
+                pass
+
+    def add_object(self, dn):
+        """Adds an OU object"""
+        self.ldb_dc1.add({"dn": dn, "objectclass": "organizationalunit"})
+        res = self.ldb_dc1.search(base=dn, scope=SCOPE_BASE)
+        self.assertEquals(len(res), 1)
+
+    def modify_object(self, dn, attr, value):
+        """Modifies an object's USN by adding an attribute value to it"""
+        m = ldb.Message()
+        m.dn = ldb.Dn(self.ldb_dc1, dn)
+        m[attr] = ldb.MessageElement(value, ldb.FLAG_MOD_ADD, attr)
+        self.ldb_dc1.modify(m)
+
+    def create_object_range(self, start, end, prefix=""):
+        """
+        Creates a block of objects. Object names are numbered sequentially,
+        using the optional prefix supplied.
+        """
+        dn_list = []
+
+        # Create the parents first, then the children.
+        # This makes it easier to see in debug when GET_ANC takes effect
+        # because the parent/children become interleaved (by default,
+        # this approach means the objects are organized into blocks of
+        # parents and blocks of children together)
+        for x in range(start, end):
+            ou = "OU=test_ou_%s%d,%s" % (prefix, x, self.ou)
+            self.add_object(ou)
+            dn_list.append(ou)
+
+        return dn_list
+
+    def assert_expected_data(self, received_list, expected_list):
+        """
+        Asserts that we received all the DNs that we expected and
+        none are missing.
+        """
+
+        # Note that with GET_ANC Windows can end up sending the same parent
+        # object multiple times, so this might be noteworthy but doesn't
+        # warrant failing the test
+        if (len(received_list) != len(expected_list)):
+            print("Note: received %d objects but expected %d" %(len(received_list),
+                                                                len(expected_list)))
+
+        # Check that we received every object that we were expecting
+        for dn in expected_list:
+            self.assertTrue(dn in received_list, "DN '%s' missing from replication." % dn)
+
+    def test_repl_integrity(self):
+        """
+        Modify the objects being replicated while the replication is still
+        in progress and check that no object loss occurs.
+        """
+
+        # The server behaviour differs between samba and Windows. Samba returns
+        # the objects in the original order (up to the pre-modify HWM). Windows
+        # incorporates the modified objects and returns them in the new order
+        # (i.e. modified objects last), up to the post-modify HWM. The Microsoft
+        # docs state the Windows behaviour is optional.
+
+        # Create a range of objects to replicate.
+        expected_dn_list = self.create_object_range(0, 400)
+        (orig_hwm, unused) = self._get_highest_hwm_utdv(self.ldb_dc1)
+
+        # We ask for the first page of 100 objects.
+        # For this test, we don't care what order we receive the objects in,
+        # so long as by the end we've received everything
+        rxd_dn_list = []
+        ctr6 = self._get_replication(drsuapi.DRSUAPI_DRS_WRIT_REP, max_objects=100)
+        rxd_dn_list = self._get_ctr6_dn_list(ctr6)
+
+        # Modify some of the second page of objects. This should bump the highwatermark
+        for x in range(100, 200):
+            self.modify_object(expected_dn_list[x], "displayName", "OU%d" % x)
+
+        (post_modify_hwm, unused) = self._get_highest_hwm_utdv(self.ldb_dc1)
+        self.assertTrue(post_modify_hwm.highest_usn > orig_hwm.highest_usn)
+
+        # Get the remaining blocks of data
+        while ctr6.more_data:
+            ctr6 = self._get_replication(drsuapi.DRSUAPI_DRS_WRIT_REP, max_objects=100,
+                                         highwatermark=ctr6.new_highwatermark,
+                                         uptodateness_vector=ctr6.uptodateness_vector)
+            rxd_dn_list += self._get_ctr6_dn_list(ctr6)
+
+        # Check we still receive all the objects we're expecting
+        self.assert_expected_data(rxd_dn_list, expected_dn_list)
+
+
-- 
2.7.4


From 344a754673ed971b85d85c900e96548de8b4c20d Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 6 Jun 2017 18:21:40 +1200
Subject: [PATCH 13/36] getncchanges.py: Add GET_ANC replication test case

This test:
- creates blocks of parent/child objects
- modifies the parents, so the child gets received first in the
  replication (which means the client has to use GET_ANC)
- checks that we always receive the parent before the child (if not, it
  either retries with GET_ANC, or asserts if GET_ANC is already set)
- modifies the parent objects to change their USN while the
  replication is in progress
- checks that all expected objects are received by the end of the
  test

I've added a repl_get_next() function to help simulate a client's
behaviour - if it encounters an object it doesn't know the parent of,
then it retries with GET_ANC.

Also added some debug to drs_base.py that developers can turn on to make
it easier to see what objects we're actually receiving in the
responses.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/drs_base.py     |  25 ++++
 source4/torture/drs/python/getncchanges.py | 195 +++++++++++++++++++++++++++--
 2 files changed, 212 insertions(+), 8 deletions(-)

diff --git a/source4/torture/drs/python/drs_base.py b/source4/torture/drs/python/drs_base.py
index b378326..b2df181 100644
--- a/source4/torture/drs/python/drs_base.py
+++ b/source4/torture/drs/python/drs_base.py
@@ -68,6 +68,9 @@ class DrsBaseTestCase(SambaToolCmdTest):
         self.dnsname_dc1 = self.info_dc1["dnsHostName"][0]
         self.dnsname_dc2 = self.info_dc2["dnsHostName"][0]
 
+        # for debugging the test code
+        self._debug = False
+
     def tearDown(self):
         super(DrsBaseTestCase, self).tearDown()
 
@@ -194,6 +197,27 @@ class DrsBaseTestCase(SambaToolCmdTest):
         id.dn = str(res[0].dn)
         return id
 
+    def _ctr6_debug(self, ctr6):
+        """
+        Displays basic info contained in a DsGetNCChanges response.
+        Having this debug code allows us to see the difference in behaviour
+        between Samba and Windows easier. Turn on the self._debug flag to see it.
+        """
+
+        if self._debug:
+            print("------------ recvd CTR6 -------------")
+
+            next_object = ctr6.first_object
+            for i in range(0, ctr6.object_count):
+                print("Obj %d: %s %s" %(i, next_object.object.identifier.dn[:22],
+                                        next_object.object.identifier.guid))
+                next_object = next_object.next_object
+
+            print("Linked Attributes: %d" % ctr6.linked_attributes_count)
+            print("HWM:     %d" %(ctr6.new_highwatermark.highest_usn))
+            print("Tmp HWM: %d" %(ctr6.new_highwatermark.tmp_highest_usn))
+            print("More data: %d" %(ctr6.more_data))
+
     def _get_replication(self, replica_flags,
                           drs_error=drsuapi.DRSUAPI_EXOP_ERR_NONE, drs=None, drs_handle=None,
                           highwatermark=None, uptodateness_vector=None,
@@ -242,6 +266,7 @@ class DrsBaseTestCase(SambaToolCmdTest):
             uptodateness_vector_v1.cursors = cursors
             req10.uptodateness_vector = uptodateness_vector_v1
         (level, ctr) = drs.DsGetNCChanges(drs_handle, 10, req10)
+        self._ctr6_debug(ctr)
 
         self.assertEqual(level, 6, "expected level 6 response!")
         self.assertEqual(ctr.source_dsa_guid, misc.GUID(source_dsa))
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index d1d6b2b..2f914d8 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -44,7 +44,14 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
             "objectclass": "organizationalUnit"})
         (self.drs, self.drs_handle) = self._ds_bind(self.dnsname_dc1)
         (self.default_hwm, self.default_utdv) = self._get_highest_hwm_utdv(self.ldb_dc1)
-        self._debug = True
+
+        # 100 is the minimum max_objects that Microsoft seems to honour
+        # (the max honoured is 400ish), so we use that in these tests
+        self.max_objects = 100
+        self.last_ctr = None
+
+        # store whether we used GET_ANC flags in the requests
+        self.used_get_anc = False
 
     def tearDown(self):
         super(DrsReplicaSyncIntegrityTestCase, self).tearDown()
@@ -68,13 +75,23 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         m[attr] = ldb.MessageElement(value, ldb.FLAG_MOD_ADD, attr)
         self.ldb_dc1.modify(m)
 
-    def create_object_range(self, start, end, prefix=""):
+    def create_object_range(self, start, end, prefix="",
+                            children=None, parent_list=None):
         """
         Creates a block of objects. Object names are numbered sequentially,
-        using the optional prefix supplied.
+        using the optional prefix supplied. If the children parameter is
+        supplied it will create a parent-child hierarchy and return the
+        top-level parents separately.
         """
         dn_list = []
 
+        # Use dummy/empty lists if we're not creating a parent/child hierarchy
+        if children is None:
+            children = []
+
+        if parent_list is None:
+            parent_list = []
+
         # Create the parents first, then the children.
         # This makes it easier to see in debug when GET_ANC takes effect
         # because the parent/children become interleaved (by default,
@@ -85,6 +102,16 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
             self.add_object(ou)
             dn_list.append(ou)
 
+            # keep track of the top-level parents (if needed)
+            parent_list.append(ou)
+
+        # create the block of children (if needed)
+        for x in range(start, end):
+            for child in children:
+                ou = "OU=test_ou_child%s%d,%s" % (child, x, parent_list[x])
+                self.add_object(ou)
+                dn_list.append(ou)
+
         return dn_list
 
     def assert_expected_data(self, received_list, expected_list):
@@ -124,7 +151,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # For this test, we don't care what order we receive the objects in,
         # so long as by the end we've received everything
         rxd_dn_list = []
-        ctr6 = self._get_replication(drsuapi.DRSUAPI_DRS_WRIT_REP, max_objects=100)
+        ctr6 = self.repl_get_next(rxd_dn_list)
         rxd_dn_list = self._get_ctr6_dn_list(ctr6)
 
         # Modify some of the second page of objects. This should bump the highwatermark
@@ -135,13 +162,165 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         self.assertTrue(post_modify_hwm.highest_usn > orig_hwm.highest_usn)
 
         # Get the remaining blocks of data
-        while ctr6.more_data:
-            ctr6 = self._get_replication(drsuapi.DRSUAPI_DRS_WRIT_REP, max_objects=100,
-                                         highwatermark=ctr6.new_highwatermark,
-                                         uptodateness_vector=ctr6.uptodateness_vector)
+        while not self.replication_complete():
+            ctr6 = self.repl_get_next(rxd_dn_list)
             rxd_dn_list += self._get_ctr6_dn_list(ctr6)
 
         # Check we still receive all the objects we're expecting
         self.assert_expected_data(rxd_dn_list, expected_dn_list)
 
+    def is_parent_known(self, dn, known_dn_list):
+        """
+        Returns True if the parent of the dn specified is in known_dn_list
+        """
+
+        # we can sometimes get system objects like the RID Manager returned.
+        # Ignore anything that is not under the test OU we created
+        if self.ou not in dn:
+            return True
+
+        # Remove the child portion from the name to get the parent's DN
+        name_substrings = dn.split(",")
+        del name_substrings[0]
+
+        parent_dn = ",".join(name_substrings)
+
+        # check either this object is a parent (it's parent is the top-level
+        # test object), or its parent has been seen previously
+        return parent_dn == self.ou or parent_dn in known_dn_list
+
+    def repl_get_next(self, initial_objects, get_anc=False):
+        """
+        Requests the next block of replication data. This tries to simulate
+        client behaviour - if we receive a replicated object that we don't know
+        the parent of, then re-request the block with the GET_ANC flag set.
+        """
+
+        # we're just trying to mimic regular client behaviour here, so just
+        # use the highwatermark in the last response we received
+        if self.last_ctr:
+            highwatermark = self.last_ctr.new_highwatermark
+            uptodateness_vector = self.last_ctr.uptodateness_vector
+        else:
+            # this is the initial replication, so we're starting from the start
+            highwatermark = None
+            uptodateness_vector = None
+
+        # we'll add new objects as we discover them, so take a copy to modify
+        known_objects = initial_objects[:]
+
+        # Ask for the next block of replication data
+        replica_flags = drsuapi.DRSUAPI_DRS_WRIT_REP
+
+        if get_anc:
+            replica_flags = drsuapi.DRSUAPI_DRS_WRIT_REP | drsuapi.DRSUAPI_DRS_GET_ANC
+            self.used_get_anc = True
+
+        ctr6 = self._get_replication(replica_flags,
+                                     max_objects=self.max_objects,
+                                     highwatermark=highwatermark,
+                                     uptodateness_vector=uptodateness_vector)
+
+        # check that we know the parent for every object received
+        rxd_dn_list = self._get_ctr6_dn_list(ctr6)
+
+        for i in range(0, len(rxd_dn_list)):
+
+            dn = rxd_dn_list[i]
+
+            if self.is_parent_known(dn, known_objects):
+
+                # the new DN is now known so add it to the list.
+                # It may be the parent of another child in this block
+                known_objects.append(dn)
+            else:
+                # If we've already set the GET_ANC flag then it should mean
+                # we receive the parents before the child
+                self.assertFalse(get_anc, "Unknown parent for object %s" % dn)
+
+                print("Unknown parent for %s - try GET_ANC" % dn)
+
+                # try the same thing again with the GET_ANC flag set this time
+                return self.repl_get_next(get_anc=True)
+
+        # store the last successful result so we know what HWM to request next
+        self.last_ctr = ctr6
+
+        return ctr6
+
+    def replication_complete(self):
+        """Returns True if the current/last replication cycle is complete"""
+
+        if self.last_ctr is None or self.last_ctr.more_data:
+            return False
+        else:
+            return True
+
+    def test_repl_integrity_get_anc(self):
+        """
+        Modify the parent objects being replicated while the replication is still
+        in progress (using GET_ANC) and check that no object loss occurs.
+        """
+
+        # Note that GET_ANC behaviour varies between Windows and Samba.
+        # On Samba GET_ANC results in the replication restarting from the very
+        # beginning. After that, Samba remembers GET_ANC and also sends the
+        # parents in subsequent requests (regardless of whether GET_ANC is
+        # specified in the later request).
+        # Windows only sends the parents if GET_ANC was specified in the last
+        # request. It will also resend a parent, even if it's already sent the
+        # parent in a previous response (whereas Samba doesn't).
+
+        # Create a small block of 50 parents, each with 2 children (A and B)
+        # This is so that we receive some children in the first block, so we
+        # can resend with GET_ANC before we learn too many parents
+        parent_dn_list = []
+        expected_dn_list = self.create_object_range(0, 50, prefix="parent",
+                                                    children=("A", "B"),
+                                                    parent_list=parent_dn_list)
+
+        # create the remaining parents and children
+        expected_dn_list += self.create_object_range(50, 150, prefix="parent",
+                                                     children=("A", "B"),
+                                                     parent_list=parent_dn_list)
+
+        # We've now got objects in the following order:
+        # [50 parents][100 children][100 parents][200 children]
+
+        # Modify the first parent so that it's now ordered last by USN
+        # This means we set the GET_ANC flag pretty much straight away
+        # because we receive the first child before the first parent
+        self.modify_object(parent_dn_list[0], "displayName", "OU0")
+
+        # modify a later block of parents so they also get reordered
+        for x in range(50, 100):
+            self.modify_object(parent_dn_list[x], "displayName", "OU%d" % x)
+
+        # Get the first block of objects - this should resend the request with
+        # GET_ANC set because we won't know about the first child's parent.
+        # On samba GET_ANC essentially starts the sync from scratch again, so
+        # we get this over with early before we learn too many parents
+        rxd_dn_list = []
+        ctr6 = self.repl_get_next(rxd_dn_list)
+        rxd_dn_list = self._get_ctr6_dn_list(ctr6)
+
+        # modify the last chunk of parents. They should now have a USN higher
+        # than the highwater-mark for the replication cycle
+        for x in range(100, 150):
+            self.modify_object(parent_dn_list[x], "displayName", "OU%d" % x)
+
+        # Get the remaining blocks of data - this will resend the request with
+        # GET_ANC if it encounters an object it doesn't have the parent for.
+        while not self.replication_complete():
+            ctr6 = self.repl_get_next(rxd_dn_list)
+            rxd_dn_list += self._get_ctr6_dn_list(ctr6)
+
+        # The way the test objects have been created should force
+        # self.repl_get_next() to use the GET_ANC flag. If this doesn't
+        # actually happen, then the test isn't doing its job properly
+        self.assertTrue(self.used_get_anc,
+                        "Test didn't use the GET_ANC flag as expected")
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(rxd_dn_list, expected_dn_list)
 
-- 
2.7.4


From c687b72ee15585dbd5b8d3d5255a5bba5fa852d6 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 12 Jul 2017 10:16:00 +1200
Subject: [PATCH 14/36] getncchanges.py: Add test for GET_ANC and linked
 attributes

Add a basic test that when we use GET_ANC and the parents have linked
attributes, then we receive all the expected links and all the expected
objects by the end of the test.

This extends the test code to track what linked attributes get received
and check whether they match what's present on the DC.

Also made some minor cleanups to store the received objects/links each
time we successfully receive a GETNCChanges response (this saves the
test case having to repeat this code every time).

Note that although this test involves linked attributes, it shouldn't
exercise the GET_TGT case at all.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/drs_base.py     |  46 ++++++---
 source4/torture/drs/python/getncchanges.py | 150 ++++++++++++++++++++++++-----
 2 files changed, 157 insertions(+), 39 deletions(-)

diff --git a/source4/torture/drs/python/drs_base.py b/source4/torture/drs/python/drs_base.py
index b2df181..b19c51a 100644
--- a/source4/torture/drs/python/drs_base.py
+++ b/source4/torture/drs/python/drs_base.py
@@ -197,6 +197,27 @@ class DrsBaseTestCase(SambaToolCmdTest):
         id.dn = str(res[0].dn)
         return id
 
+    def _get_ctr6_links(self, ctr6):
+        """
+        Unpacks the linked attributes from a DsGetNCChanges response
+        and returns them as a list.
+        """
+        ctr6_links = []
+        for lidx in range(0, ctr6.linked_attributes_count):
+            l = ctr6.linked_attributes[lidx]
+            try:
+                target = ndr_unpack(drsuapi.DsReplicaObjectIdentifier3,
+                                    l.value.blob)
+            except:
+                target = ndr_unpack(drsuapi.DsReplicaObjectIdentifier3Binary,
+                                    l.value.blob)
+            al = AbstractLink(l.attid, l.flags,
+                              l.identifier.guid,
+                              target.guid, target.dn)
+            ctr6_links.append(al)
+
+        return ctr6_links
+
     def _ctr6_debug(self, ctr6):
         """
         Displays basic info contained in a DsGetNCChanges response.
@@ -214,6 +235,11 @@ class DrsBaseTestCase(SambaToolCmdTest):
                 next_object = next_object.next_object
 
             print("Linked Attributes: %d" % ctr6.linked_attributes_count)
+            ctr6_links = self._get_ctr6_links(ctr6)
+            for link in ctr6_links:
+                print("Link Tgt %s... <-- Src %s"
+                      %(link.targetDN[:22], link.identifier))
+
             print("HWM:     %d" %(ctr6.new_highwatermark.highest_usn))
             print("Tmp HWM: %d" %(ctr6.new_highwatermark.tmp_highest_usn))
             print("More data: %d" %(ctr6.more_data))
@@ -343,21 +369,9 @@ class DrsBaseTestCase(SambaToolCmdTest):
             else:
                 self.assertTrue(dn in ctr6_dns, "Couldn't find DN '%s' anywhere in ctr6 response." % dn)
 
-        ctr6_links = []
+        # Extract the links from the response
+        ctr6_links = self._get_ctr6_links(ctr6)
         expected_links.sort()
-        lidx = 0
-        for lidx in range(0, ctr6.linked_attributes_count):
-            l = ctr6.linked_attributes[lidx]
-            try:
-                target = ndr_unpack(drsuapi.DsReplicaObjectIdentifier3,
-                                    l.value.blob)
-            except:
-                target = ndr_unpack(drsuapi.DsReplicaObjectIdentifier3Binary,
-                                    l.value.blob)
-            al = AbstractLink(l.attid, l.flags,
-                              l.identifier.guid,
-                              target.guid)
-            ctr6_links.append(al)
 
         lidx = 0
         for el in expected_links:
@@ -438,13 +452,15 @@ class DrsBaseTestCase(SambaToolCmdTest):
 
 
 class AbstractLink:
-    def __init__(self, attid, flags, identifier, targetGUID):
+    def __init__(self, attid, flags, identifier, targetGUID,
+                 targetDN=""):
         self.attid = attid
         self.flags = flags
         self.identifier = str(identifier)
         self.selfGUID_blob = ndr_pack(identifier)
         self.targetGUID = str(targetGUID)
         self.targetGUID_blob = ndr_pack(targetGUID)
+        self.targetDN = targetDN
 
     def __repr__(self):
         return "AbstractLink(0x%08x, 0x%08x, %s, %s)" % (
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 2f914d8..7d48133 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -45,6 +45,9 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         (self.drs, self.drs_handle) = self._ds_bind(self.dnsname_dc1)
         (self.default_hwm, self.default_utdv) = self._get_highest_hwm_utdv(self.ldb_dc1)
 
+        self.rxd_dn_list = []
+        self.rxd_links = []
+
         # 100 is the minimum max_objects that Microsoft seems to honour
         # (the max honoured is 400ish), so we use that in these tests
         self.max_objects = 100
@@ -114,11 +117,12 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         return dn_list
 
-    def assert_expected_data(self, received_list, expected_list):
+    def assert_expected_data(self, expected_list):
         """
         Asserts that we received all the DNs that we expected and
         none are missing.
         """
+        received_list = self.rxd_dn_list
 
         # Note that with GET_ANC Windows can end up sending the same parent
         # object multiple times, so this might be noteworthy but doesn't
@@ -150,9 +154,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # We ask for the first page of 100 objects.
         # For this test, we don't care what order we receive the objects in,
         # so long as by the end we've received everything
-        rxd_dn_list = []
-        ctr6 = self.repl_get_next(rxd_dn_list)
-        rxd_dn_list = self._get_ctr6_dn_list(ctr6)
+        self.repl_get_next()
 
         # Modify some of the second page of objects. This should bump the highwatermark
         for x in range(100, 200):
@@ -163,11 +165,10 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         # Get the remaining blocks of data
         while not self.replication_complete():
-            ctr6 = self.repl_get_next(rxd_dn_list)
-            rxd_dn_list += self._get_ctr6_dn_list(ctr6)
+            self.repl_get_next()
 
         # Check we still receive all the objects we're expecting
-        self.assert_expected_data(rxd_dn_list, expected_dn_list)
+        self.assert_expected_data(expected_dn_list)
 
     def is_parent_known(self, dn, known_dn_list):
         """
@@ -189,12 +190,8 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # test object), or its parent has been seen previously
         return parent_dn == self.ou or parent_dn in known_dn_list
 
-    def repl_get_next(self, initial_objects, get_anc=False):
-        """
-        Requests the next block of replication data. This tries to simulate
-        client behaviour - if we receive a replicated object that we don't know
-        the parent of, then re-request the block with the GET_ANC flag set.
-        """
+    def _repl_send_request(self, get_anc=False):
+        """Sends a GetNCChanges request for the next block of replication data."""
 
         # we're just trying to mimic regular client behaviour here, so just
         # use the highwatermark in the last response we received
@@ -202,13 +199,10 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
             highwatermark = self.last_ctr.new_highwatermark
             uptodateness_vector = self.last_ctr.uptodateness_vector
         else:
-            # this is the initial replication, so we're starting from the start
+            # this is the first replication chunk
             highwatermark = None
             uptodateness_vector = None
 
-        # we'll add new objects as we discover them, so take a copy to modify
-        known_objects = initial_objects[:]
-
         # Ask for the next block of replication data
         replica_flags = drsuapi.DRSUAPI_DRS_WRIT_REP
 
@@ -216,14 +210,30 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
             replica_flags = drsuapi.DRSUAPI_DRS_WRIT_REP | drsuapi.DRSUAPI_DRS_GET_ANC
             self.used_get_anc = True
 
-        ctr6 = self._get_replication(replica_flags,
+        # return the response from the DC
+        return self._get_replication(replica_flags,
                                      max_objects=self.max_objects,
                                      highwatermark=highwatermark,
                                      uptodateness_vector=uptodateness_vector)
 
+    def repl_get_next(self, get_anc=False):
+        """
+        Requests the next block of replication data. This tries to simulate
+        client behaviour - if we receive a replicated object that we don't know
+        the parent of, then re-request the block with the GET_ANC flag set.
+        """
+
+        # send a request to the DC and get the response
+        ctr6 = self._repl_send_request(get_anc=get_anc)
+
         # check that we know the parent for every object received
         rxd_dn_list = self._get_ctr6_dn_list(ctr6)
 
+        # we'll add new objects as we discover them, so take a copy of the
+        # ones we already know about, so we can modify the list safely
+        known_objects = self.rxd_dn_list[:]
+
+        # check that we know the parent for every object received
         for i in range(0, len(rxd_dn_list)):
 
             dn = rxd_dn_list[i]
@@ -246,6 +256,10 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # store the last successful result so we know what HWM to request next
         self.last_ctr = ctr6
 
+        # store the objects and links we received
+        self.rxd_dn_list += self._get_ctr6_dn_list(ctr6)
+        self.rxd_links += self._get_ctr6_links(ctr6)
+
         return ctr6
 
     def replication_complete(self):
@@ -300,9 +314,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # GET_ANC set because we won't know about the first child's parent.
         # On samba GET_ANC essentially starts the sync from scratch again, so
         # we get this over with early before we learn too many parents
-        rxd_dn_list = []
-        ctr6 = self.repl_get_next(rxd_dn_list)
-        rxd_dn_list = self._get_ctr6_dn_list(ctr6)
+        self.repl_get_next()
 
         # modify the last chunk of parents. They should now have a USN higher
         # than the highwater-mark for the replication cycle
@@ -312,8 +324,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # Get the remaining blocks of data - this will resend the request with
         # GET_ANC if it encounters an object it doesn't have the parent for.
         while not self.replication_complete():
-            ctr6 = self.repl_get_next(rxd_dn_list)
-            rxd_dn_list += self._get_ctr6_dn_list(ctr6)
+            self.repl_get_next()
 
         # The way the test objects have been created should force
         # self.repl_get_next() to use the GET_ANC flag. If this doesn't
@@ -322,5 +333,96 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
                         "Test didn't use the GET_ANC flag as expected")
 
         # Check we get all the objects we're expecting
-        self.assert_expected_data(rxd_dn_list, expected_dn_list)
+        self.assert_expected_data(expected_dn_list)
+
+    def assert_expected_links(self, objects_with_links, link_attr="managedBy"):
+        """
+        Asserts that a GetNCChanges response contains any expected links
+        for the objects it contains.
+        """
+        received_links = self.rxd_links
+
+        num_expected = len(objects_with_links)
+
+        self.assertTrue(len(received_links) == num_expected,
+                        "Received %d links but expected %d"
+                        %(len(received_links), num_expected))
+
+        for dn in objects_with_links:
+            self.assert_object_has_link(dn, link_attr, received_links)
+
+    def assert_object_has_link(self, dn, link_attr, received_links):
+        """
+        Queries the object in the DB and asserts there is a link in the
+        GetNCChanges response that matches.
+        """
+
+        # Look up the link attribute in the DB
+        # The extended_dn option will dump the GUID info for the link
+        # attribute (as a hex blob)
+        res = self.ldb_dc1.search(ldb.Dn(self.ldb_dc1, dn), attrs=[link_attr],
+                                  controls=['extended_dn:1:0'], scope=ldb.SCOPE_BASE)
+
+        # We didn't find the expected link attribute in the DB for the object.
+        # Something has gone wrong somewhere...
+        self.assertTrue(link_attr in res[0], "%s in DB doesn't have attribute %s"
+                        %(dn, link_attr))
+
+        # find the received link in the list and assert that the target and
+        # source GUIDs match what's in the DB
+        for val in res[0][link_attr]:
+            # Work out the expected source and target GUIDs for the DB link
+            target_dn = ldb.Dn(self.ldb_dc1, val)
+            targetGUID_blob = target_dn.get_extended_component("GUID")
+            sourceGUID_blob = res[0].dn.get_extended_component("GUID")
+
+            found = False
+
+            for link in received_links:
+                if link.selfGUID_blob == sourceGUID_blob and \
+                   link.targetGUID_blob == targetGUID_blob:
+
+                    found = True
+
+                    if self._debug:
+                        print("Link %s --> %s" %(dn[:25], link.targetDN[:25]))
+                    break
+
+            self.assertTrue(found, "Did not receive expected link for DN %s" % dn)
+
+    def test_repl_get_anc_link_attr(self):
+        """
+        A basic GET_ANC test where the parents have linked attributes
+        """
+
+        # Create a block of 100 parents and 100 children
+        parent_dn_list = []
+        expected_dn_list = self.create_object_range(0, 100, prefix="parent",
+                                                    children=("A"),
+                                                    parent_list=parent_dn_list)
+
+        # Add links from the parents to the children
+        for x in range(0, 100):
+            self.modify_object(parent_dn_list[x], "managedBy", expected_dn_list[x + 100])
+
+        # add some filler objects at the end. This allows us to easily see
+        # which chunk the links get sent in
+        expected_dn_list += self.create_object_range(0, 100, prefix="filler")
+
+        # We've now got objects in the following order:
+        # [100 x children][100 x parents][100 x filler]
+
+        # Get the replication data - because the block of children come first,
+        # this should retry the request with GET_ANC
+        while not self.replication_complete():
+            self.repl_get_next()
+
+        self.assertTrue(self.used_get_anc,
+                        "Test didn't use the GET_ANC flag as expected")
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(expected_dn_list)
+
+        # Check we received links for all the parents
+        self.assert_expected_links(parent_dn_list)
 
-- 
2.7.4


From ae16f063bc06144e509040cc0dfd9c04f5c74882 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 2 Jun 2017 14:42:34 +1200
Subject: [PATCH 15/36] getncchanges.c: Rename anc_cache to obj_cache

When we add GET_TGT support we will reuse the ancestor cache and it
should work the same way - if we've already sent an object because it
was needed for resolving a child object or a link target, then there's
no point sending it again.

This just renames anc_cache --> obj_cache.

Also removed a seemingly redundant check - dcesrv_drsuapi_obj_cache_exists()
doesn't return WERR_SUCCESS so there's no point checking for it.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 44 +++++++++++++++++++------------
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 096162d..0487544 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -44,13 +44,14 @@
 
 /* state of a partially completed getncchanges call */
 struct drsuapi_getncchanges_state {
-	struct db_context *anc_cache;
+	struct db_context *obj_cache;
 	struct GUID *guids;
 	uint32_t num_records;
 	uint32_t num_processed;
 	struct ldb_dn *ncRoot_dn;
 	struct GUID ncRoot_guid;
 	bool is_schema_nc;
+	bool is_get_anc;
 	uint64_t min_usn;
 	uint64_t max_usn;
 	struct drsuapi_DsReplicaHighWaterMark last_hwm;
@@ -1932,7 +1933,12 @@ static void dcesrv_drsuapi_update_highwatermark(const struct ldb_message *msg,
 	hwm->reserved_usn = 0;
 }
 
-static WERROR dcesrv_drsuapi_anc_cache_add(struct db_context *anc_cache,
+/**
+ * Adds an object's GUID to the cache of objects already sent.
+ * This avoids us sending the same object multiple times when
+ * the GetNCChanges request uses a flag like GET_ANC.
+ */
+static WERROR dcesrv_drsuapi_obj_cache_add(struct db_context *obj_cache,
 					   const struct GUID *guid)
 {
 	enum ndr_err_code ndr_err;
@@ -1957,7 +1963,7 @@ static WERROR dcesrv_drsuapi_anc_cache_add(struct db_context *anc_cache,
 		return WERR_DS_DRA_INTERNAL_ERROR;
 	}
 
-	status = dbwrap_store(anc_cache, key, val, TDB_REPLACE);
+	status = dbwrap_store(obj_cache, key, val, TDB_REPLACE);
 	if (!NT_STATUS_IS_OK(status)) {
 		return WERR_DS_DRA_INTERNAL_ERROR;
 	}
@@ -1965,7 +1971,11 @@ static WERROR dcesrv_drsuapi_anc_cache_add(struct db_context *anc_cache,
 	return WERR_OK;
 }
 
-static WERROR dcesrv_drsuapi_anc_cache_exists(struct db_context *anc_cache,
+/**
+ * Checks if the object with the GUID specified already exists in the
+ * object cache, i.e. it's already been sent in a GetNCChanges response.
+ */
+static WERROR dcesrv_drsuapi_obj_cache_exists(struct db_context *obj_cache,
 					      const struct GUID *guid)
 {
 	enum ndr_err_code ndr_err;
@@ -1986,7 +1996,7 @@ static WERROR dcesrv_drsuapi_anc_cache_exists(struct db_context *anc_cache,
 		return WERR_DS_DRA_INTERNAL_ERROR;
 	}
 
-	exists = dbwrap_exists(anc_cache, key);
+	exists = dbwrap_exists(obj_cache, key);
 	if (!exists) {
 		return WERR_OBJECT_NOT_FOUND;
 	}
@@ -2461,10 +2471,11 @@ allowed:
 		if (req10->extended_op != DRSUAPI_EXOP_NONE) {
 			/* Do nothing */
 		} else if (req10->replica_flags & DRSUAPI_DRS_GET_ANC) {
-			getnc_state->anc_cache = db_open_rbt(getnc_state);
-			if (getnc_state->anc_cache == NULL) {
+			getnc_state->obj_cache = db_open_rbt(getnc_state);
+			if (getnc_state->obj_cache == NULL) {
 				return WERR_NOT_ENOUGH_MEMORY;
 			}
+			getnc_state->is_get_anc = true;
 		}
 	}
 
@@ -2614,8 +2625,8 @@ allowed:
 		 * an object, we don't need to do anything more,
 		 * as we've already added the links.
 		 */
-		if (getnc_state->anc_cache != NULL) {
-			werr = dcesrv_drsuapi_anc_cache_exists(getnc_state->anc_cache,
+		if (getnc_state->obj_cache != NULL) {
+			werr = dcesrv_drsuapi_obj_cache_exists(getnc_state->obj_cache,
 							       &getnc_state->guids[i]);
 			if (W_ERROR_EQUAL(werr, WERR_OBJECT_NAME_EXISTS)) {
 				dcesrv_drsuapi_update_highwatermark(msg,
@@ -2671,14 +2682,16 @@ allowed:
 
 		new_objs = obj;
 
-		if (getnc_state->anc_cache != NULL) {
-			werr = dcesrv_drsuapi_anc_cache_add(getnc_state->anc_cache,
+		if (getnc_state->obj_cache != NULL) {
+			werr = dcesrv_drsuapi_obj_cache_add(getnc_state->obj_cache,
 							    &getnc_state->guids[i]);
 			if (!W_ERROR_IS_OK(werr)) {
 				return werr;
 			}
 
-			next_anc_guid = obj->parent_object_guid;
+			if (getnc_state->is_get_anc) {
+				next_anc_guid = obj->parent_object_guid;
+			}
 		}
 
 		while (next_anc_guid != NULL) {
@@ -2687,7 +2700,7 @@ allowed:
 			struct ldb_result *anc_res = NULL;
 			struct ldb_dn *anc_dn = NULL;
 
-			werr = dcesrv_drsuapi_anc_cache_exists(getnc_state->anc_cache,
+			werr = dcesrv_drsuapi_obj_cache_exists(getnc_state->obj_cache,
 							       next_anc_guid);
 			if (W_ERROR_EQUAL(werr, WERR_OBJECT_NAME_EXISTS)) {
 				/*
@@ -2695,9 +2708,6 @@ allowed:
 				 */
 				break;
 			}
-			if (W_ERROR_IS_OK(werr)) {
-				return WERR_INTERNAL_ERROR;
-			}
 			if (!W_ERROR_EQUAL(werr, WERR_OBJECT_NOT_FOUND)) {
 				return werr;
 			}
@@ -2774,7 +2784,7 @@ allowed:
 			 * Regardless of if we actually use it or not,
 			 * we add it to the cache so we don't look at it again
 			 */
-			werr = dcesrv_drsuapi_anc_cache_add(getnc_state->anc_cache,
+			werr = dcesrv_drsuapi_obj_cache_add(getnc_state->obj_cache,
 							    next_anc_guid);
 			if (!W_ERROR_IS_OK(werr)) {
 				return werr;
-- 
2.7.4


From 407aa9167dc025bf709540044e9ed14cd1a1fa0f Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 6 Jun 2017 15:03:33 +1200
Subject: [PATCH 16/36] getncchanges.c: Split sorting linked attributes into
 separate function

Longer-term we want to split up the links so that they're sent over
multiple GetNCChanges response messages. So it makes sense to split this
code out into its own function. In the short-term, this removes some of
the complexity from dcesrv_drsuapi_DsGetNCChanges() so that the function
is not quite so big.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 137 ++++++++++++++++++------------
 1 file changed, 84 insertions(+), 53 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 0487544..a9eafa0 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -66,7 +66,7 @@ struct drsuapi_getncchanges_state {
 
 /* We must keep the GUIDs in NDR form for sorting */
 struct la_for_sorting {
-	struct drsuapi_DsReplicaLinkedAttribute *link;
+	const struct drsuapi_DsReplicaLinkedAttribute *link;
 	uint8_t target_guid[16];
         uint8_t source_guid[16];
 };
@@ -2004,6 +2004,82 @@ static WERROR dcesrv_drsuapi_obj_cache_exists(struct db_context *obj_cache,
 	return WERR_OBJECT_NAME_EXISTS;
 }
 
+/**
+ * Copies the la_list specified into a sorted array, ready to be sent in a
+ * GetNCChanges response.
+ */
+static WERROR getncchanges_get_sorted_array(const struct drsuapi_DsReplicaLinkedAttribute *la_list,
+					    const uint32_t link_count,
+					    struct ldb_context *sam_ctx,
+					    TALLOC_CTX *mem_ctx,
+					    const struct dsdb_schema *schema,
+					    struct la_for_sorting **ret_array)
+{
+	int j;
+	struct la_for_sorting *guid_array;
+	WERROR werr = WERR_OK;
+
+	*ret_array = NULL;
+	guid_array = talloc_array(mem_ctx, struct la_for_sorting, link_count);
+	if (guid_array == NULL) {
+		DEBUG(0, ("Out of memory allocating %u linked attributes for sorting", link_count));
+		return WERR_NOT_ENOUGH_MEMORY;
+	}
+
+	for (j = 0; j < link_count; j++) {
+
+		/* we need to get the target GUIDs to compare */
+		struct dsdb_dn *dn;
+		const struct drsuapi_DsReplicaLinkedAttribute *la = &la_list[j];
+		const struct dsdb_attribute *schema_attrib;
+		const struct ldb_val *target_guid;
+		DATA_BLOB source_guid;
+		TALLOC_CTX *frame = talloc_stackframe();
+		NTSTATUS status;
+
+		schema_attrib = dsdb_attribute_by_attributeID_id(schema, la->attid);
+
+		werr = dsdb_dn_la_from_blob(sam_ctx, schema_attrib, schema, frame, la->value.blob, &dn);
+		if (!W_ERROR_IS_OK(werr)) {
+			DEBUG(0,(__location__ ": Bad la blob in sort\n"));
+			TALLOC_FREE(frame);
+			return werr;
+		}
+
+		/* Extract the target GUID in NDR form */
+		target_guid = ldb_dn_get_extended_component(dn->dn, "GUID");
+		if (target_guid == NULL
+				|| target_guid->length != sizeof(guid_array[0].target_guid)) {
+			status = NT_STATUS_OBJECT_NAME_NOT_FOUND;
+		} else {
+			/* Repack the source GUID as NDR for sorting */
+			status = GUID_to_ndr_blob(&la->identifier->guid,
+						  frame,
+						  &source_guid);
+		}
+
+		if (!NT_STATUS_IS_OK(status)
+				|| source_guid.length != sizeof(guid_array[0].source_guid)) {
+			DEBUG(0,(__location__ ": Bad la guid in sort\n"));
+			TALLOC_FREE(frame);
+			return ntstatus_to_werror(status);
+		}
+
+		guid_array[j].link = &la_list[j];
+		memcpy(guid_array[j].target_guid, target_guid->data,
+		       sizeof(guid_array[j].target_guid));
+		memcpy(guid_array[j].source_guid, source_guid.data,
+		       sizeof(guid_array[j].source_guid));
+		TALLOC_FREE(frame);
+	}
+
+	LDB_TYPESAFE_QSORT(guid_array, link_count, NULL, linked_attribute_compare);
+
+	*ret_array = guid_array;
+
+	return werr;
+}
+
 /* 
   drsuapi_DsGetNCChanges
 
@@ -2886,59 +2962,14 @@ allowed:
 	} else {
 		/* sort the whole array the first time */
 		if (getnc_state->la_sorted == NULL) {
-			int j;
-			struct la_for_sorting *guid_array = talloc_array(getnc_state, struct la_for_sorting, getnc_state->la_count);
-			if (guid_array == NULL) {
-				DEBUG(0, ("Out of memory allocating %u linked attributes for sorting", getnc_state->la_count));
-				return WERR_NOT_ENOUGH_MEMORY;
-			}
-			for (j = 0; j < getnc_state->la_count; j++) {
-				/* we need to get the target GUIDs to compare */
-				struct dsdb_dn *dn;
-				const struct drsuapi_DsReplicaLinkedAttribute *la = &getnc_state->la_list[j];
-				const struct dsdb_attribute *schema_attrib;
-				const struct ldb_val *target_guid;
-				DATA_BLOB source_guid;
-				TALLOC_CTX *frame = talloc_stackframe();
-
-				schema_attrib = dsdb_attribute_by_attributeID_id(schema, la->attid);
-
-				werr = dsdb_dn_la_from_blob(sam_ctx, schema_attrib, schema, frame, la->value.blob, &dn);
-				if (!W_ERROR_IS_OK(werr)) {
-					DEBUG(0,(__location__ ": Bad la blob in sort\n"));
-					TALLOC_FREE(frame);
-					return werr;
-				}
-
-				/* Extract the target GUID in NDR form */
-				target_guid = ldb_dn_get_extended_component(dn->dn, "GUID");
-				if (target_guid == NULL
-				    || target_guid->length != sizeof(guid_array[0].target_guid)) {
-					status = NT_STATUS_OBJECT_NAME_NOT_FOUND;
-				} else {
-					/* Repack the source GUID as NDR for sorting */
-					status = GUID_to_ndr_blob(&la->identifier->guid,
-								  frame,
-								  &source_guid);
-				}
-
-				if (!NT_STATUS_IS_OK(status)
-				    || source_guid.length != sizeof(guid_array[0].source_guid)) {
-					DEBUG(0,(__location__ ": Bad la guid in sort\n"));
-					TALLOC_FREE(frame);
-					return ntstatus_to_werror(status);
-				}
-
-				guid_array[j].link = &getnc_state->la_list[j];
-				memcpy(guid_array[j].target_guid, target_guid->data,
-				       sizeof(guid_array[j].target_guid));
-				memcpy(guid_array[j].source_guid, source_guid.data,
-				       sizeof(guid_array[j].source_guid));
-				TALLOC_FREE(frame);
+			werr = getncchanges_get_sorted_array(getnc_state->la_list,
+							     getnc_state->la_count,
+							     sam_ctx, getnc_state,
+							     schema,
+							     &getnc_state->la_sorted);
+			if (!W_ERROR_IS_OK(werr)) {
+				return werr;
 			}
-
-			LDB_TYPESAFE_QSORT(guid_array, getnc_state->la_count, NULL, linked_attribute_compare);
-			getnc_state->la_sorted = guid_array;
 		}
 
 		link_count = getnc_state->la_count - getnc_state->la_idx;
-- 
2.7.4


From 872630f5b1ecc5d76935746ce26ba043e640750c Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 2 Jun 2017 10:47:46 +1200
Subject: [PATCH 17/36] getncchanges.c: Split GET_ANC block out into its own
 function

When we add GET_TGT support, it's going to need to reuse all this code
(i.e. to add any ancestors of the link target). This also trims down
the rather large dcesrv_drsuapi_DsGetNCChanges() function a bit.

Note also fixed a compiler warning in the WERR_DS_DRA_INCONSISTENT_DIT
error block which may have caused issues previously (statement was
terminated by a ',' rather than a ';').

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 286 +++++++++++++++++-------------
 1 file changed, 166 insertions(+), 120 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index a9eafa0..14fd733 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -2080,6 +2080,161 @@ static WERROR getncchanges_get_sorted_array(const struct drsuapi_DsReplicaLinked
 	return werr;
 }
 
+
+/**
+ * Adds any ancestor/parent objects of the child_obj specified.
+ * This is needed when the GET_ANC flag is specified in the request.
+ * @param new_objs if parents are added, this gets updated to point to a chain
+ * of parent objects (with the parents first and the child last)
+ */
+static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemEx *child_obj,
+					 struct ldb_dn *child_dn,
+					 TALLOC_CTX *mem_ctx,
+					 struct ldb_context *sam_ctx,
+					 struct drsuapi_getncchanges_state *getnc_state,
+					 struct dsdb_schema *schema,
+					 DATA_BLOB *session_key,
+					 struct drsuapi_DsGetNCChangesRequest10 *req10,
+					 uint32_t *local_pas,
+					 struct ldb_dn *machine_dn,
+					 struct drsuapi_DsReplicaObjectListItemEx **new_objs)
+{
+	int ret;
+	const struct GUID *next_anc_guid = NULL;
+	WERROR werr = WERR_OK;
+	static const char * const msg_attrs[] = {
+					    "*",
+					    "nTSecurityDescriptor",
+					    "parentGUID",
+					    "replPropertyMetaData",
+					    DSDB_SECRET_ATTRIBUTES,
+					    NULL };
+
+	next_anc_guid = child_obj->parent_object_guid;
+
+	while (next_anc_guid != NULL) {
+		struct drsuapi_DsReplicaObjectListItemEx *anc_obj = NULL;
+		struct ldb_message *anc_msg = NULL;
+		struct ldb_result *anc_res = NULL;
+		struct ldb_dn *anc_dn = NULL;
+
+		/*
+		 * Don't send an object twice. (If we've sent the object, then
+		 * we've also sent all its parents as well)
+		 */
+		werr = dcesrv_drsuapi_obj_cache_exists(getnc_state->obj_cache,
+						       next_anc_guid);
+		if (W_ERROR_EQUAL(werr, WERR_OBJECT_NAME_EXISTS)) {
+			return WERR_OK;
+		}
+		if (!W_ERROR_EQUAL(werr, WERR_OBJECT_NOT_FOUND)) {
+			return werr;
+		}
+
+		anc_obj = talloc_zero(mem_ctx,
+				      struct drsuapi_DsReplicaObjectListItemEx);
+		if (anc_obj == NULL) {
+			return WERR_NOT_ENOUGH_MEMORY;
+		}
+
+		anc_dn = ldb_dn_new_fmt(anc_obj, sam_ctx, "<GUID=%s>",
+					GUID_string(anc_obj, next_anc_guid));
+		if (anc_dn == NULL) {
+			return WERR_NOT_ENOUGH_MEMORY;
+		}
+
+		ret = drsuapi_search_with_extended_dn(sam_ctx, anc_obj,
+						      &anc_res, anc_dn,
+						      LDB_SCOPE_BASE,
+						      msg_attrs, NULL);
+		if (ret != LDB_SUCCESS) {
+			const char *anc_str = NULL;
+			const char *obj_str = NULL;
+
+			anc_str = ldb_dn_get_extended_linearized(anc_obj,
+								 anc_dn,
+								 1);
+			obj_str = ldb_dn_get_extended_linearized(anc_obj,
+								 child_dn,
+								 1);
+
+			DBG_ERR("getncchanges: failed to fetch ANC "
+				"DN %s for DN %s - %s\n",
+				anc_str, obj_str, ldb_errstring(sam_ctx));
+			return WERR_DS_DRA_INCONSISTENT_DIT;
+		}
+
+		anc_msg = anc_res->msgs[0];
+
+		werr = get_nc_changes_build_object(anc_obj, anc_msg,
+						   sam_ctx,
+						   getnc_state->ncRoot_dn,
+						   getnc_state->is_schema_nc,
+						   schema, session_key,
+						   getnc_state->min_usn,
+						   req10->replica_flags,
+						   req10->partial_attribute_set,
+						   req10->uptodateness_vector,
+						   req10->extended_op,
+						   false, /* force_object_return */
+						   local_pas,
+						   machine_dn,
+						   next_anc_guid);
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
+
+		werr = get_nc_changes_add_links(sam_ctx, getnc_state,
+						getnc_state->ncRoot_dn,
+						getnc_state->is_schema_nc,
+						schema, getnc_state->min_usn,
+						req10->replica_flags,
+						anc_msg,
+						&getnc_state->la_list,
+						&getnc_state->la_count,
+						req10->uptodateness_vector);
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
+
+		/*
+		 * Regardless of whether we actually use it or not,
+		 * we add it to the cache so we don't look at it again
+		 */
+		werr = dcesrv_drsuapi_obj_cache_add(getnc_state->obj_cache,
+						    next_anc_guid);
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
+
+		/*
+		 * Any ancestors which are below the highwatermark
+		 * or uptodateness_vector shouldn't be added,
+		 * but we still look further up the
+		 * tree for ones which have been changed recently.
+		 */
+		if (anc_obj->meta_data_ctr != NULL) {
+
+			/*
+			 * prepend the parent to the list so that the client-side
+			 * adds the parent object before it adds the children
+			 */
+			anc_obj->next_object = *new_objs;
+			*new_objs = anc_obj;
+		}
+
+		anc_msg = NULL;
+		TALLOC_FREE(anc_res);
+		TALLOC_FREE(anc_dn);
+
+		/*
+		 * We may need to resolve more parents...
+		 */
+		next_anc_guid = anc_obj->parent_object_guid;
+	}
+	return werr;
+}
+
 /* 
   drsuapi_DsGetNCChanges
 
@@ -2655,7 +2810,6 @@ allowed:
 					    NULL };
 		struct ldb_result *msg_res;
 		struct ldb_dn *msg_dn;
-		const struct GUID *next_anc_guid = NULL;
 
 		obj = talloc_zero(mem_ctx, struct drsuapi_DsReplicaObjectListItemEx);
 		W_ERROR_HAVE_NO_MEMORY(obj);
@@ -2764,130 +2918,22 @@ allowed:
 			if (!W_ERROR_IS_OK(werr)) {
 				return werr;
 			}
-
-			if (getnc_state->is_get_anc) {
-				next_anc_guid = obj->parent_object_guid;
-			}
 		}
 
-		while (next_anc_guid != NULL) {
-			struct drsuapi_DsReplicaObjectListItemEx *anc_obj = NULL;
-			struct ldb_message *anc_msg = NULL;
-			struct ldb_result *anc_res = NULL;
-			struct ldb_dn *anc_dn = NULL;
-
-			werr = dcesrv_drsuapi_obj_cache_exists(getnc_state->obj_cache,
-							       next_anc_guid);
-			if (W_ERROR_EQUAL(werr, WERR_OBJECT_NAME_EXISTS)) {
-				/*
-				 * We don't need to send it twice.
-				 */
-				break;
-			}
-			if (!W_ERROR_EQUAL(werr, WERR_OBJECT_NOT_FOUND)) {
-				return werr;
-			}
-			werr = WERR_OK;
-
-			anc_obj = talloc_zero(mem_ctx,
-					struct drsuapi_DsReplicaObjectListItemEx);
-			if (anc_obj == NULL) {
-				return WERR_NOT_ENOUGH_MEMORY;
-			}
-
-			anc_dn = ldb_dn_new_fmt(anc_obj, sam_ctx, "<GUID=%s>",
-					GUID_string(anc_obj, next_anc_guid));
-			if (anc_dn == NULL) {
-				return WERR_NOT_ENOUGH_MEMORY;
-			}
-
-			ret = drsuapi_search_with_extended_dn(sam_ctx, anc_obj,
-							      &anc_res, anc_dn,
-							      LDB_SCOPE_BASE,
-							      msg_attrs, NULL);
-			if (ret != LDB_SUCCESS) {
-				const char *anc_str = NULL;
-				const char *obj_str = NULL;
-
-				anc_str = ldb_dn_get_extended_linearized(anc_obj,
-									 anc_dn,
-									 1);
-				obj_str = ldb_dn_get_extended_linearized(anc_obj,
-									 msg->dn,
-									 1),
-
-				DBG_ERR("getncchanges: failed to fetch ANC "
-					"DN %s for DN %s - %s\n",
-					anc_str, obj_str,
-					ldb_errstring(sam_ctx));
-				return WERR_DS_DRA_INCONSISTENT_DIT;
-			}
-
-			anc_msg = anc_res->msgs[0];
-
-			werr = get_nc_changes_build_object(anc_obj, anc_msg,
-							   sam_ctx,
-							   getnc_state->ncRoot_dn,
-							   getnc_state->is_schema_nc,
-							   schema, &session_key,
-							   getnc_state->min_usn,
-							   req10->replica_flags,
-							   req10->partial_attribute_set,
-							   req10->uptodateness_vector,
-							   req10->extended_op,
-							   false, /* force_object_return */
-							   local_pas,
-							   machine_dn,
-							   next_anc_guid);
-			if (!W_ERROR_IS_OK(werr)) {
-				return werr;
-			}
-
-			werr = get_nc_changes_add_links(sam_ctx, getnc_state,
-							getnc_state->ncRoot_dn,
-							getnc_state->is_schema_nc,
-							schema, getnc_state->min_usn,
-							req10->replica_flags,
-							anc_msg,
-							&getnc_state->la_list,
-							&getnc_state->la_count,
-							req10->uptodateness_vector);
-			if (!W_ERROR_IS_OK(werr)) {
-				return werr;
-			}
-
-			/*
-			 * Regardless of if we actually use it or not,
-			 * we add it to the cache so we don't look at it again
-			 */
-			werr = dcesrv_drsuapi_obj_cache_add(getnc_state->obj_cache,
-							    next_anc_guid);
+		/*
+		 * For GET_ANC, prepend any parents that the client needs
+		 * to know about before it can add this object
+		 */
+		if (getnc_state->is_get_anc) {
+			werr = getncchanges_add_ancestors(obj, msg->dn, mem_ctx,
+							  sam_ctx, getnc_state,
+							  schema, &session_key,
+							  req10, local_pas,
+							  machine_dn,
+							  &new_objs);
 			if (!W_ERROR_IS_OK(werr)) {
 				return werr;
 			}
-
-			/*
-			 * Any ancestors which are below the highwatermark
-			 * or uptodateness_vector shouldn't be added,
-			 * but we still look further up the
-			 * tree for ones which have been changed recently.
-			 */
-			if (anc_obj->meta_data_ctr != NULL) {
-				/*
-				 * prepend it to the list
-				 */
-				anc_obj->next_object = new_objs;
-				new_objs = anc_obj;
-			}
-
-			anc_msg = NULL;
-			TALLOC_FREE(anc_res);
-			TALLOC_FREE(anc_dn);
-
-			/*
-			 * We may need to resolve more...
-			 */
-			next_anc_guid = anc_obj->parent_object_guid;
 		}
 
 		*currentObject = new_objs;
-- 
2.7.4


From f81b2e6b93fdb53cb6f36eb744b06253e67b4b2b Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 6 Jun 2017 17:46:13 +1200
Subject: [PATCH 18/36] getncchanges.c: Add ancestor links when the object
 normally gets sent

Currently we add links each time we send an object, but we don't
actually send these until the end of the replication cycle. There
doesn't seem to be much point in this. Rework the code so that the links
get added for an object when that object would normally be sent (the
same place we update the HWM). This should make it easier to send the
links in roughly the same chunk as its object.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 69 +++++++++++++------------------
 1 file changed, 29 insertions(+), 40 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 14fd733..0d68d39 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -2184,19 +2184,6 @@ static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemE
 			return werr;
 		}
 
-		werr = get_nc_changes_add_links(sam_ctx, getnc_state,
-						getnc_state->ncRoot_dn,
-						getnc_state->is_schema_nc,
-						schema, getnc_state->min_usn,
-						req10->replica_flags,
-						anc_msg,
-						&getnc_state->la_list,
-						&getnc_state->la_count,
-						req10->uptodateness_vector);
-		if (!W_ERROR_IS_OK(werr)) {
-			return werr;
-		}
-
 		/*
 		 * Regardless of whether we actually use it or not,
 		 * we add it to the cache so we don't look at it again
@@ -2810,6 +2797,7 @@ allowed:
 					    NULL };
 		struct ldb_result *msg_res;
 		struct ldb_dn *msg_dn;
+		bool obj_already_sent = false;
 
 		obj = talloc_zero(mem_ctx, struct drsuapi_DsReplicaObjectListItemEx);
 		W_ERROR_HAVE_NO_MEMORY(obj);
@@ -2851,40 +2839,41 @@ allowed:
 		msg = msg_res->msgs[0];
 
 		/*
-		 * If it has already been added as an ancestor of
-		 * an object, we don't need to do anything more,
-		 * as we've already added the links.
+		 * Check if we've already sent the object as an ancestor of
+		 * another object. If so, we don't need to send it again
 		 */
 		if (getnc_state->obj_cache != NULL) {
 			werr = dcesrv_drsuapi_obj_cache_exists(getnc_state->obj_cache,
 							       &getnc_state->guids[i]);
 			if (W_ERROR_EQUAL(werr, WERR_OBJECT_NAME_EXISTS)) {
-				dcesrv_drsuapi_update_highwatermark(msg,
-						getnc_state->max_usn,
-						&r->out.ctr->ctr6.new_highwatermark);
-				/* no attributes to send */
-				talloc_free(obj);
-				continue;
+				obj_already_sent = true;
 			}
 		}
 
-		max_wait_reached = (time(NULL) - start > max_wait);
-
-		werr = get_nc_changes_build_object(obj, msg,
-						   sam_ctx, getnc_state->ncRoot_dn,
-						   getnc_state->is_schema_nc,
-						   schema, &session_key, getnc_state->min_usn,
-						   req10->replica_flags,
-						   req10->partial_attribute_set,
-						   req10->uptodateness_vector,
-						   req10->extended_op,
-						   max_wait_reached,
-						   local_pas, machine_dn,
-						   &getnc_state->guids[i]);
-		if (!W_ERROR_IS_OK(werr)) {
-			return werr;
+		if (!obj_already_sent) {
+			max_wait_reached = (time(NULL) - start > max_wait);
+
+			werr = get_nc_changes_build_object(obj, msg,
+							   sam_ctx, getnc_state->ncRoot_dn,
+							   getnc_state->is_schema_nc,
+							   schema, &session_key, getnc_state->min_usn,
+							   req10->replica_flags,
+							   req10->partial_attribute_set,
+							   req10->uptodateness_vector,
+							   req10->extended_op,
+							   max_wait_reached,
+							   local_pas, machine_dn,
+							   &getnc_state->guids[i]);
+			if (!W_ERROR_IS_OK(werr)) {
+				return werr;
+			}
 		}
 
+		/*
+		 * We've reached the USN where this object naturally occurs.
+		 * Regardless of whether we've already sent the object (as an
+		 * ancestor), we add its links and update the HWM at this point
+		 */
 		werr = get_nc_changes_add_links(sam_ctx, getnc_state,
 						getnc_state->ncRoot_dn,
 						getnc_state->is_schema_nc,
@@ -2902,11 +2891,11 @@ allowed:
 					getnc_state->max_usn,
 					&r->out.ctr->ctr6.new_highwatermark);
 
-		if (obj->meta_data_ctr == NULL) {
+		if (obj_already_sent || obj->meta_data_ctr == NULL) {
 			DEBUG(8,(__location__ ": getncchanges skipping send of object %s\n",
 				 ldb_dn_get_linearized(msg->dn)));
-			/* no attributes to send */
-			talloc_free(obj);
+			/* nothing to send */
+			TALLOC_FREE(obj);
 			continue;
 		}
 
-- 
2.7.4


From aba6fd7aaa40467e056115e3630c6f3a89de99b7 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Mon, 10 Jul 2017 12:09:28 +1200
Subject: [PATCH 19/36] getncchanges.c: Refactor how objects get added to the
 response

Adding GET_TGT support is going to make things more complicated, and I
think we are going to struggle to do this without refactoring things a
bit.

This patch:
- adds a helper struct to store state related to a single GetNCChanges
  chunk. I plan to add to this with things like max_links, max_objects,
  etc, which will cutdown on the number of variables/parameters we pass
  around.
- I found the double-pointer logic where we add objects to the response
  confusing - hopefully this refactor simplifies things slightly, and it
  allows us to reuse the code for the GET_TGT case.
- Replaces some hard-coded 16s with a GUID_SIZE define
- Also removes a really old TODO that was added in 2009 (before Samba
  supported linked_attributes in getNCChanges())

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 77 ++++++++++++++++++++++++-------
 1 file changed, 61 insertions(+), 16 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 0d68d39..a8262c4 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -42,7 +42,12 @@
 #include "lib/dbwrap/dbwrap_rbt.h"
 #include "librpc/gen_ndr/ndr_misc.h"
 
-/* state of a partially completed getncchanges call */
+#define DRS_GUID_SIZE       16
+
+/*
+ * state of a partially-completed replication cycle. This state persists
+ * over multiple calls to dcesrv_drsuapi_DsGetNCChanges()
+ */
 struct drsuapi_getncchanges_state {
 	struct db_context *obj_cache;
 	struct GUID *guids;
@@ -67,8 +72,19 @@ struct drsuapi_getncchanges_state {
 /* We must keep the GUIDs in NDR form for sorting */
 struct la_for_sorting {
 	const struct drsuapi_DsReplicaLinkedAttribute *link;
-	uint8_t target_guid[16];
-        uint8_t source_guid[16];
+	uint8_t target_guid[DRS_GUID_SIZE];
+	uint8_t source_guid[DRS_GUID_SIZE];
+};
+
+/*
+ * stores the state for a chunk of replication data. This state information
+ * only exists for a single call to dcesrv_drsuapi_DsGetNCChanges()
+ */
+struct getncchanges_repl_chunk {
+	struct drsuapi_DsGetNCChangesCtr6 *ctr6;
+
+	/* the last object written to the response */
+	struct drsuapi_DsReplicaObjectListItemEx *last_object;
 };
 
 static int drsuapi_DsReplicaHighWaterMark_cmp(const struct drsuapi_DsReplicaHighWaterMark *h1,
@@ -1942,7 +1958,7 @@ static WERROR dcesrv_drsuapi_obj_cache_add(struct db_context *obj_cache,
 					   const struct GUID *guid)
 {
 	enum ndr_err_code ndr_err;
-	uint8_t guid_buf[16] = { 0, };
+	uint8_t guid_buf[DRS_GUID_SIZE] = { 0, };
 	DATA_BLOB b = {
 		.data = guid_buf,
 		.length = sizeof(guid_buf),
@@ -1979,7 +1995,7 @@ static WERROR dcesrv_drsuapi_obj_cache_exists(struct db_context *obj_cache,
 					      const struct GUID *guid)
 {
 	enum ndr_err_code ndr_err;
-	uint8_t guid_buf[16] = { 0, };
+	uint8_t guid_buf[DRS_GUID_SIZE] = { 0, };
 	DATA_BLOB b = {
 		.data = guid_buf,
 		.length = sizeof(guid_buf),
@@ -2222,6 +2238,37 @@ static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemE
 	return werr;
 }
 
+/**
+ * Adds a list of new objects into the getNCChanges response message
+ */
+static void getncchanges_add_objs_to_resp(struct getncchanges_repl_chunk *repl_chunk,
+					  struct drsuapi_DsReplicaObjectListItemEx *obj_list)
+{
+	struct drsuapi_DsReplicaObjectListItemEx *obj;
+
+	/*
+	 * We track the last object added to the response message, so just add
+	 * the new object-list onto the end
+	 */
+	if (repl_chunk->last_object == NULL) {
+		repl_chunk->ctr6->first_object = obj_list;
+	} else {
+		repl_chunk->last_object->next_object = obj_list;
+	}
+
+	for (obj = obj_list; obj != NULL; obj = obj->next_object) {
+		repl_chunk->ctr6->object_count += 1;
+
+		/*
+		 * Remember the last object in the response - we'll use this to
+		 * link the next object(s) processed onto the existing list
+		 */
+		if (obj->next_object == NULL) {
+			repl_chunk->last_object = obj;
+		}
+	}
+}
+
 /* 
   drsuapi_DsGetNCChanges
 
@@ -2235,7 +2282,7 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	uint32_t i, k;
 	struct dsdb_schema *schema;
 	struct drsuapi_DsReplicaOIDMapping_Ctr *ctr;
-	struct drsuapi_DsReplicaObjectListItemEx **currentObject;
+	struct getncchanges_repl_chunk repl_chunk = { 0 };
 	NTSTATUS status;
 	DATA_BLOB session_key;
 	WERROR werr;
@@ -2276,7 +2323,7 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	invocation_id = *(samdb_ntds_invocation_id(sam_ctx));
 
 	*r->out.level_out = 6;
-	/* TODO: linked attributes*/
+
 	r->out.ctr->ctr6.linked_attributes_count = 0;
 	r->out.ctr->ctr6.linked_attributes = discard_const_p(struct drsuapi_DsReplicaLinkedAttribute, &no_linked_attr);
 
@@ -2729,7 +2776,8 @@ allowed:
 	r->out.ctr->ctr6.old_highwatermark = req10->highwatermark;
 	r->out.ctr->ctr6.new_highwatermark = req10->highwatermark;
 
-	currentObject = &r->out.ctr->ctr6.first_object;
+	repl_chunk.ctr6 = &r->out.ctr->ctr6;
+	repl_chunk.last_object = NULL;
 
 	max_objects = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL, "drs", "max object sync", 1000);
 	/*
@@ -2925,14 +2973,11 @@ allowed:
 			}
 		}
 
-		*currentObject = new_objs;
-		while (new_objs != NULL) {
-			r->out.ctr->ctr6.object_count += 1;
-			if (new_objs->next_object == NULL) {
-				currentObject = &new_objs->next_object;
-			}
-			new_objs = new_objs->next_object;
-		}
+		/*
+		 * Add the object (and any parents it might have) into the
+		 * response message
+		 */
+		getncchanges_add_objs_to_resp(&repl_chunk, new_objs);
 
 		DEBUG(8,(__location__ ": replicating object %s\n", ldb_dn_get_linearized(msg->dn)));
 
-- 
2.7.4


From 50dcfb5ea62c31328954af867ab57faf5a075264 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Mon, 10 Jul 2017 14:16:13 +1200
Subject: [PATCH 20/36] getncchanges.c: Split out code to get an object for a
 response

GET_TGT is going to use the exact same code, so split this out into a
separate function, rather than duplicating it.

The GET_ANC case also uses almost identical code, but it differs in a
couple of minor aspects. I've left this as is for now, as I'm not sure
if this is by accident or by design.

Other minor changes:
- Reduced the parameters to get_nc_changes_build_object(). Sixteen
  parameters seemed excessive, so pass it the structs containing the
  necessary fields instead.
- Also noticed that the ncRoot_dn parameter was unused in a couple of
  functions.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 196 ++++++++++++++++++------------
 1 file changed, 117 insertions(+), 79 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index a8262c4..1aa4a51 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -454,15 +454,10 @@ static WERROR get_nc_changes_filter_attrs(struct drsuapi_DsReplicaObjectListItem
 static WERROR get_nc_changes_build_object(struct drsuapi_DsReplicaObjectListItemEx *obj,
 					  const struct ldb_message *msg,
 					  struct ldb_context *sam_ctx,
-					  struct ldb_dn *ncRoot_dn,
-					  bool   is_schema_nc,
+					  struct drsuapi_getncchanges_state *getnc_state,
 					  struct dsdb_schema *schema,
 					  DATA_BLOB *session_key,
-					  uint64_t highest_usn,
-					  uint32_t replica_flags,
-					  struct drsuapi_DsPartialAttributeSet *partial_attribute_set,
-					  struct drsuapi_DsReplicaCursorCtrEx *uptodateness_vector,
-					  enum drsuapi_DsExtendedOperation extended_op,
+					  struct drsuapi_DsGetNCChangesRequest10 *req10,
 					  bool force_object_return,
 					  uint32_t *local_pas,
 					  struct ldb_dn *machine_dn,
@@ -483,6 +478,14 @@ static WERROR get_nc_changes_build_object(struct drsuapi_DsReplicaObjectListItem
 	struct ldb_result *res = NULL;
 	WERROR werr;
 	int ret;
+	uint32_t replica_flags = req10->replica_flags;
+	struct drsuapi_DsPartialAttributeSet *partial_attribute_set =
+			req10->partial_attribute_set;
+	struct drsuapi_DsReplicaCursorCtrEx *uptodateness_vector =
+			req10->uptodateness_vector;
+	enum drsuapi_DsExtendedOperation extended_op = req10->extended_op;
+	bool is_schema_nc = getnc_state->is_schema_nc;
+	uint64_t highest_usn = getnc_state->min_usn;
 
 	/* make dsdb sytanx context for conversions */
 	dsdb_syntax_ctx_init(&syntax_ctx, sam_ctx, schema);
@@ -856,7 +859,6 @@ static WERROR get_nc_changes_add_la(TALLOC_CTX *mem_ctx,
  */
 static WERROR get_nc_changes_add_links(struct ldb_context *sam_ctx,
 				       TALLOC_CTX *mem_ctx,
-				       struct ldb_dn *ncRoot_dn,
 				       bool is_schema_nc,
 				       struct dsdb_schema *schema,
 				       uint64_t highest_usn,
@@ -2184,14 +2186,9 @@ static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemE
 
 		werr = get_nc_changes_build_object(anc_obj, anc_msg,
 						   sam_ctx,
-						   getnc_state->ncRoot_dn,
-						   getnc_state->is_schema_nc,
+						   getnc_state,
 						   schema, session_key,
-						   getnc_state->min_usn,
-						   req10->replica_flags,
-						   req10->partial_attribute_set,
-						   req10->uptodateness_vector,
-						   req10->extended_op,
+						   req10,
 						   false, /* force_object_return */
 						   local_pas,
 						   machine_dn,
@@ -2269,6 +2266,78 @@ static void getncchanges_add_objs_to_resp(struct getncchanges_repl_chunk *repl_c
 	}
 }
 
+/**
+ * Gets the object to send, packed into an RPC struct ready to send. This also
+ * adds the object to the object cache, and adds any ancestors (if needed).
+ * @param msg - DB search result for the object to add
+ * @param guid - GUID of the object to add
+ * @param ret_obj_list - returns the object ready to be sent (in a list, along
+ * with any ancestors that might be needed). NULL if nothing to send.
+ */
+static WERROR getncchanges_get_obj_to_send(const struct ldb_message *msg,
+					   TALLOC_CTX *mem_ctx,
+					   struct ldb_context *sam_ctx,
+					   struct drsuapi_getncchanges_state *getnc_state,
+					   struct dsdb_schema *schema,
+					   DATA_BLOB *session_key,
+					   struct drsuapi_DsGetNCChangesRequest10 *req10,
+					   bool force_object_return,
+					   uint32_t *local_pas,
+					   struct ldb_dn *machine_dn,
+					   const struct GUID *guid,
+					   struct drsuapi_DsReplicaObjectListItemEx **ret_obj_list)
+{
+	struct drsuapi_DsReplicaObjectListItemEx *obj;
+	WERROR werr;
+
+	obj = talloc_zero(mem_ctx, struct drsuapi_DsReplicaObjectListItemEx);
+	W_ERROR_HAVE_NO_MEMORY(obj);
+
+	werr = get_nc_changes_build_object(obj, msg, sam_ctx, getnc_state,
+					   schema, session_key, req10,
+					   force_object_return,
+					   local_pas, machine_dn, guid);
+	if (!W_ERROR_IS_OK(werr)) {
+		return werr;
+	}
+
+	/*
+	 * The object may get filtered out by the UTDV's USN and not actually
+	 * sent, in which case there's nothing more to do here
+	 */
+	if (obj->meta_data_ctr == NULL) {
+		TALLOC_FREE(obj);
+		*ret_obj_list = NULL;
+		return WERR_OK;
+	}
+
+	if (getnc_state->obj_cache != NULL) {
+		werr = dcesrv_drsuapi_obj_cache_add(getnc_state->obj_cache,
+						    guid);
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
+	}
+
+	*ret_obj_list = obj;
+
+	/*
+	 * If required, also add any ancestors that the client may need to know
+	 * about before it can resolve this object. These get prepended to the
+	 * ret_obj_list so the client adds them first.
+	 */
+	if (getnc_state->is_get_anc) {
+		werr = getncchanges_add_ancestors(obj, msg->dn, mem_ctx,
+						  sam_ctx, getnc_state,
+						  schema, session_key,
+						  req10, local_pas,
+						  machine_dn, ret_obj_list);
+	}
+
+	return werr;
+}
+
+
 /* 
   drsuapi_DsGetNCChanges
 
@@ -2834,7 +2903,6 @@ allowed:
 		     && !max_wait_reached;
 	    i++) {
 		struct drsuapi_DsReplicaObjectListItemEx *new_objs = NULL;
-		struct drsuapi_DsReplicaObjectListItemEx *obj;
 		struct ldb_message *msg;
 		static const char * const msg_attrs[] = {
 					    "*",
@@ -2846,14 +2914,14 @@ allowed:
 		struct ldb_result *msg_res;
 		struct ldb_dn *msg_dn;
 		bool obj_already_sent = false;
+		TALLOC_CTX *tmp_ctx;
 
-		obj = talloc_zero(mem_ctx, struct drsuapi_DsReplicaObjectListItemEx);
-		W_ERROR_HAVE_NO_MEMORY(obj);
+		tmp_ctx = talloc_new(mem_ctx);
 
-		msg_dn = ldb_dn_new_fmt(obj, sam_ctx, "<GUID=%s>", GUID_string(obj, &getnc_state->guids[i]));
+		msg_dn = ldb_dn_new_fmt(tmp_ctx, sam_ctx, "<GUID=%s>",
+					GUID_string(tmp_ctx, &getnc_state->guids[i]));
 		W_ERROR_HAVE_NO_MEMORY(msg_dn);
 
-
 		/*
 		 * by re-searching here we avoid having a lot of full
 		 * records in memory between calls to getncchanges.
@@ -2862,15 +2930,16 @@ allowed:
 		 * (tombstone expunge) between the first and second
 		 * check.
 		 */
-		ret = drsuapi_search_with_extended_dn(sam_ctx, obj, &msg_res,
+		ret = drsuapi_search_with_extended_dn(sam_ctx, tmp_ctx, &msg_res,
 						      msg_dn,
 						      LDB_SCOPE_BASE, msg_attrs, NULL);
 		if (ret != LDB_SUCCESS) {
 			if (ret != LDB_ERR_NO_SUCH_OBJECT) {
 				DEBUG(1,("getncchanges: failed to fetch DN %s - %s\n",
-					 ldb_dn_get_extended_linearized(obj, msg_dn, 1), ldb_errstring(sam_ctx)));
+					 ldb_dn_get_extended_linearized(tmp_ctx, msg_dn, 1),
+					 ldb_errstring(sam_ctx)));
 			}
-			talloc_free(obj);
+			TALLOC_FREE(tmp_ctx);
 			continue;
 		}
 
@@ -2878,9 +2947,9 @@ allowed:
 			DEBUG(1,("getncchanges: got LDB_SUCCESS but failed"
 				 "to get any results in fetch of DN "
 				 "%s (race with tombstone expunge?)\n",
-				 ldb_dn_get_extended_linearized(obj,
+				 ldb_dn_get_extended_linearized(tmp_ctx,
 								msg_dn, 1)));
-			talloc_free(obj);
+			TALLOC_FREE(tmp_ctx);
 			continue;
 		}
 
@@ -2901,17 +2970,17 @@ allowed:
 		if (!obj_already_sent) {
 			max_wait_reached = (time(NULL) - start > max_wait);
 
-			werr = get_nc_changes_build_object(obj, msg,
-							   sam_ctx, getnc_state->ncRoot_dn,
-							   getnc_state->is_schema_nc,
-							   schema, &session_key, getnc_state->min_usn,
-							   req10->replica_flags,
-							   req10->partial_attribute_set,
-							   req10->uptodateness_vector,
-							   req10->extended_op,
-							   max_wait_reached,
-							   local_pas, machine_dn,
-							   &getnc_state->guids[i]);
+			/*
+			 * Construct an object, ready to send (this will include
+			 * the object's ancestors as well, if needed)
+			 */
+			werr = getncchanges_get_obj_to_send(msg, mem_ctx, sam_ctx,
+							    getnc_state, schema,
+							    &session_key, req10,
+							    max_wait_reached,
+							    local_pas, machine_dn,
+							    &getnc_state->guids[i],
+							    &new_objs);
 			if (!W_ERROR_IS_OK(werr)) {
 				return werr;
 			}
@@ -2923,7 +2992,6 @@ allowed:
 		 * ancestor), we add its links and update the HWM at this point
 		 */
 		werr = get_nc_changes_add_links(sam_ctx, getnc_state,
-						getnc_state->ncRoot_dn,
 						getnc_state->is_schema_nc,
 						schema, getnc_state->min_usn,
 						req10->replica_flags,
@@ -2939,53 +3007,23 @@ allowed:
 					getnc_state->max_usn,
 					&r->out.ctr->ctr6.new_highwatermark);
 
-		if (obj_already_sent || obj->meta_data_ctr == NULL) {
-			DEBUG(8,(__location__ ": getncchanges skipping send of object %s\n",
-				 ldb_dn_get_linearized(msg->dn)));
-			/* nothing to send */
-			TALLOC_FREE(obj);
-			continue;
-		}
-
-		new_objs = obj;
+		if (new_objs != NULL) {
 
-		if (getnc_state->obj_cache != NULL) {
-			werr = dcesrv_drsuapi_obj_cache_add(getnc_state->obj_cache,
-							    &getnc_state->guids[i]);
-			if (!W_ERROR_IS_OK(werr)) {
-				return werr;
-			}
-		}
+			/*
+			 * Add the object (and, if GET_ANC, any parents it may
+			 * have) into the current chunk of replication data
+			 */
+			getncchanges_add_objs_to_resp(&repl_chunk, new_objs);
 
-		/*
-		 * For GET_ANC, prepend any parents that the client needs
-		 * to know about before it can add this object
-		 */
-		if (getnc_state->is_get_anc) {
-			werr = getncchanges_add_ancestors(obj, msg->dn, mem_ctx,
-							  sam_ctx, getnc_state,
-							  schema, &session_key,
-							  req10, local_pas,
-							  machine_dn,
-							  &new_objs);
-			if (!W_ERROR_IS_OK(werr)) {
-				return werr;
-			}
+			talloc_free(getnc_state->last_dn);
+			getnc_state->last_dn = talloc_move(getnc_state, &msg->dn);
 		}
 
-		/*
-		 * Add the object (and any parents it might have) into the
-		 * response message
-		 */
-		getncchanges_add_objs_to_resp(&repl_chunk, new_objs);
-
-		DEBUG(8,(__location__ ": replicating object %s\n", ldb_dn_get_linearized(msg->dn)));
-
-		talloc_free(getnc_state->last_dn);
-		getnc_state->last_dn = talloc_move(getnc_state, &msg->dn);
+		DEBUG(8,(__location__ ": %s object %s\n",
+			 new_objs ? "replicating" : "skipping send of",
+			 ldb_dn_get_linearized(msg->dn)));
 
-		talloc_free(msg_res);
-		talloc_free(msg_dn);
+		TALLOC_FREE(tmp_ctx);
 	}
 
 	getnc_state->num_processed = i;
-- 
2.7.4


From 6c25aca8b67a5c5f8d06f9a2614c1f736f4aebf5 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 7 Jun 2017 10:46:47 +1200
Subject: [PATCH 21/36] getncchanges.c: Send linked attributes in each chunk

Instead of sending all the linked attributes at the end, add a
configurable option to send the links in each replication chunk.

The benefits of this approach are:
- it can reduce memory overhead, as we don't have to keep all the links
in memory over the entire replication cycle.
- the client should never end up knowing about objects but not their
links. (Although we're not sure that this has actually resulted in
replication problems, i.e. missing links).

Note that until we support GET_TGT, this approach can mean we now send
a link where the client doesn't know about the target object, causing
the client to siliently drop that linked attribute. Hence, this option
is switched off by default.

Implementation-wise, this code works fairly the same as before. Instead
of sorting the entire getnc_state->la_sorted array at the end and then
splitting it up over chunks, we now split the links up over chunks and
then sort them when we copy them into the message. This should be OK, as
I believe the MS-DRSR Doc says the links in the message should be sorted
(rather than sorting *all* the links overall). Windows behaviour seems
to chunk the links based on USN and then sort them.

getnc_state->la_idx now tracks which links in getnc_state->la_list[]
have already been sent (instead of tracking getnc_state->la_sorted).
This means the la_sorted array no longer needs to be stored in
getnc_state and we can free the array's memory once we've copied the
links into the message. Unfortunately, the link_given/link_total debug
no longer reports the correct information, so I've moved these into
getncchanges_state struct (and now free the struct a bit later so it's
safe to reference in the debug).

The vampire_dc testenv has been updated to use this new behaviour.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 selftest/target/Samba4.pm                 |  6 +-
 source4/rpc_server/drsuapi/getncchanges.c | 94 ++++++++++++++++++++++---------
 2 files changed, 72 insertions(+), 28 deletions(-)

diff --git a/selftest/target/Samba4.pm b/selftest/target/Samba4.pm
index 205e281..778b6fa 100755
--- a/selftest/target/Samba4.pm
+++ b/selftest/target/Samba4.pm
@@ -1284,9 +1284,12 @@ sub provision_vampire_dc($$$)
 	my ($self, $prefix, $dcvars, $fl) = @_;
 	print "PROVISIONING VAMPIRE DC @ FL $fl...\n";
 	my $name = "localvampiredc";
+	my $extra_conf = "";
 
 	if ($fl == "2000") {
-	    $name = "vampire2000dc";
+		$name = "vampire2000dc";
+	} else {
+		$extra_conf = "drs: immediate link sync = yes";
 	}
 
 	# We do this so that we don't run the provision.  That's the job of 'net vampire'.
@@ -1306,6 +1309,7 @@ sub provision_vampire_dc($$$)
 	server max protocol = SMB2
 
         ntlm auth = mschapv2-and-ntlmv2-only
+	$extra_conf
 
 [sysvol]
 	path = $ctx->{statedir}/sysvol
diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 1aa4a51..fdea514 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -65,8 +65,11 @@ struct drsuapi_getncchanges_state {
 	struct drsuapi_DsReplicaCursor2CtrEx *final_udv;
 	struct drsuapi_DsReplicaLinkedAttribute *la_list;
 	uint32_t la_count;
-	struct la_for_sorting *la_sorted;
 	uint32_t la_idx;
+
+	/* these are just used for debugging the replication's progress */
+	uint32_t links_given;
+	uint32_t total_links;
 };
 
 /* We must keep the GUIDs in NDR form for sorting */
@@ -2363,8 +2366,6 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	uint32_t max_objects;
 	uint32_t max_links;
 	uint32_t link_count = 0;
-	uint32_t link_total = 0;
-	uint32_t link_given = 0;
 	struct ldb_dn *search_dn = NULL;
 	bool am_rodc;
 	enum security_user_level security_level;
@@ -2383,6 +2384,7 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	bool full = true;
 	uint32_t *local_pas = NULL;
 	struct ldb_dn *machine_dn = NULL; /* Only used for REPL SECRET EXOP */
+	bool immediate_link_sync;
 
 	DCESRV_PULL_HANDLE_WERR(h, r->in.bind_handle, DRSUAPI_BIND_HANDLE);
 	b_state = h->data;
@@ -2867,6 +2869,9 @@ allowed:
 	 */
 	max_links = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL, "drs", "max link sync", 1500);
 
+	immediate_link_sync = lpcfg_parm_bool(dce_call->conn->dce_ctx->lp_ctx, NULL,
+					      "drs", "immediate link sync", false);
+
 	/*
 	 * Maximum time that we can spend in a getncchanges
 	 * in order to avoid timeout of the other part.
@@ -2915,6 +2920,7 @@ allowed:
 		struct ldb_dn *msg_dn;
 		bool obj_already_sent = false;
 		TALLOC_CTX *tmp_ctx;
+		uint32_t old_la_index;
 
 		tmp_ctx = talloc_new(mem_ctx);
 
@@ -2986,6 +2992,8 @@ allowed:
 			}
 		}
 
+		old_la_index = getnc_state->la_count;
+
 		/*
 		 * We've reached the USN where this object naturally occurs.
 		 * Regardless of whether we've already sent the object (as an
@@ -3023,11 +3031,17 @@ allowed:
 			 new_objs ? "replicating" : "skipping send of",
 			 ldb_dn_get_linearized(msg->dn)));
 
+		getnc_state->total_links += (getnc_state->la_count - old_la_index);
+
 		TALLOC_FREE(tmp_ctx);
 	}
 
 	getnc_state->num_processed = i;
 
+	if (i < getnc_state->num_records) {
+		r->out.ctr->ctr6.more_data = true;
+	}
+
 	/* the client can us to call UpdateRefs on its behalf to
 	   re-establish monitoring of the NC */
 	if ((req10->replica_flags & (DRSUAPI_DRS_ADD_REF | DRSUAPI_DRS_REF_GCSPN)) &&
@@ -3073,25 +3087,32 @@ allowed:
 		max_links -= r->out.ctr->ctr6.object_count;
 	}
 
-	link_total = getnc_state->la_count;
-
-	if (i < getnc_state->num_records) {
-		r->out.ctr->ctr6.more_data = true;
-	} else {
-		/* sort the whole array the first time */
-		if (getnc_state->la_sorted == NULL) {
-			werr = getncchanges_get_sorted_array(getnc_state->la_list,
-							     getnc_state->la_count,
-							     sam_ctx, getnc_state,
-							     schema,
-							     &getnc_state->la_sorted);
-			if (!W_ERROR_IS_OK(werr)) {
-				return werr;
-			}
-		}
-
+	/*
+	 * Work out how many links we can send in this chunk. The default is to
+	 * send all the links last, but there is a config option to send them
+	 * immediately, in the same chunk as their source object
+	 */
+	if (!r->out.ctr->ctr6.more_data || immediate_link_sync) {
 		link_count = getnc_state->la_count - getnc_state->la_idx;
 		link_count = MIN(max_links, link_count);
+	}
+
+	/* If we've got linked attributes to send, add them now */
+	if (link_count > 0) {
+		struct la_for_sorting *la_sorted;
+
+		/*
+		 * Grab a chunk of linked attributes off the list and put them
+		 * in sorted array, ready to send
+		 */
+		werr = getncchanges_get_sorted_array(&getnc_state->la_list[getnc_state->la_idx],
+						     link_count,
+						     sam_ctx, getnc_state,
+						     schema,
+						     &la_sorted);
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
 
 		r->out.ctr->ctr6.linked_attributes_count = link_count;
 		r->out.ctr->ctr6.linked_attributes = talloc_array(r->out.ctr, struct drsuapi_DsReplicaLinkedAttribute, link_count);
@@ -3101,16 +3122,29 @@ allowed:
 		}
 
 		for (k = 0; k < link_count; k++) {
-			r->out.ctr->ctr6.linked_attributes[k]
-				= *getnc_state->la_sorted[getnc_state->la_idx + k].link;
+			r->out.ctr->ctr6.linked_attributes[k] = *la_sorted[k].link;
 		}
 
 		getnc_state->la_idx += link_count;
-		link_given = getnc_state->la_idx;
+		getnc_state->links_given += link_count;
 
 		if (getnc_state->la_idx < getnc_state->la_count) {
 			r->out.ctr->ctr6.more_data = true;
+		} else {
+
+			/*
+			 * We've now sent all the links seen so far, so we can
+			 * reset la_list back to an empty list again. Note that
+			 * the steal means the linked attribute memory gets
+			 * freed after this RPC message is sent on the wire.
+			 */
+			talloc_steal(mem_ctx, getnc_state->la_list);
+			getnc_state->la_list = NULL;
+			getnc_state->la_idx = 0;
+			getnc_state->la_count = 0;
 		}
+
+		TALLOC_FREE(la_sorted);
 	}
 
 	if (req10->replica_flags & DRSUAPI_DRS_GET_NC_SIZE) {
@@ -3126,17 +3160,23 @@ allowed:
 		 * of links we found so far during the cycle.
 		 */
 		r->out.ctr->ctr6.nc_object_count = getnc_state->num_records;
-		r->out.ctr->ctr6.nc_linked_attributes_count = getnc_state->la_count;
+		r->out.ctr->ctr6.nc_linked_attributes_count = getnc_state->total_links;
 	}
 
 	if (!r->out.ctr->ctr6.more_data) {
-		talloc_steal(mem_ctx, getnc_state->la_list);
 
+		/* this is the last response in the replication cycle */
 		r->out.ctr->ctr6.new_highwatermark = getnc_state->final_hwm;
 		r->out.ctr->ctr6.uptodateness_vector = talloc_move(mem_ctx,
 							&getnc_state->final_udv);
 
-		talloc_free(getnc_state);
+		/*
+		 * Free the state info stored for the replication cycle. Note
+		 * that the RPC message we're sending contains links stored in
+		 * getnc_state. mem_ctx is local to this RPC call, so the memory
+		 * will get freed after the RPC message is sent on the wire.
+		 */
+		talloc_steal(mem_ctx, getnc_state);
 		b_state->getncchanges_state = NULL;
 	} else {
 		ret = drsuapi_DsReplicaHighWaterMark_cmp(&r->out.ctr->ctr6.old_highwatermark,
@@ -3172,7 +3212,7 @@ allowed:
 	       r->out.ctr->ctr6.object_count,
 	       i, r->out.ctr->ctr6.more_data?getnc_state->num_records:i,
 	       r->out.ctr->ctr6.linked_attributes_count,
-	       link_given, link_total,
+	       getnc_state->links_given, getnc_state->total_links,
 	       dom_sid_string(mem_ctx, user_sid)));
 
 #if 0
-- 
2.7.4


From f08cdbb3995ed2a2dd6ef1c1786fcace110511ee Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 12 Jul 2017 13:58:57 +1200
Subject: [PATCH 22/36] getncchanges.c: Add basic GET_TGT support

This adds basic DRS_GET_TGT support. If the GET_TGT flag is specified
then the server will use the object cache to store the objects it sends
back. If the target object for a linked attribute is not in the cache
(i.e. it has not been sent already), then it is added to the response
message.

Note that large numbers of linked attributes will not be handled well
yet - the server could potentially try to send more than will fit in a
single repsonse message.

Also note that the client can sometimes set the GET_TGT flag even if the
server is still sending the links last. In this case, we know the client
supports GET_TGT so it's safe to send the links interleaved with the
source objects (the alternative of fetching the target objects but not
sending the links until last doesn't really make any sense).

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/common/util.c                      |  40 ++++++
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c |  44 +-----
 source4/rpc_server/drsuapi/getncchanges.c       | 178 +++++++++++++++++++++++-
 3 files changed, 216 insertions(+), 46 deletions(-)

diff --git a/source4/dsdb/common/util.c b/source4/dsdb/common/util.c
index 7a12c71..0d90729 100644
--- a/source4/dsdb/common/util.c
+++ b/source4/dsdb/common/util.c
@@ -5550,3 +5550,43 @@ int dsdb_user_obj_set_primary_group_id(struct ldb_context *ldb, struct ldb_messa
 
 	return LDB_SUCCESS;
 }
+
+/**
+ * Returns True if the source and target DNs both have the same naming context,
+ * i.e. they're both in the same partition.
+ */
+bool dsdb_objects_have_same_nc(struct ldb_context *ldb,
+			       TALLOC_CTX *mem_ctx,
+			       struct ldb_dn *source_dn,
+			       struct ldb_dn *target_dn)
+{
+	TALLOC_CTX *tmp_ctx;
+	struct ldb_dn *source_nc;
+	struct ldb_dn *target_nc;
+	int ret;
+	bool same_nc = true;
+
+	tmp_ctx = talloc_new(mem_ctx);
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for source %s\n",
+			ldb_dn_get_linearized(source_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for target %s\n",
+			ldb_dn_get_linearized(target_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	same_nc = (ldb_dn_compare(source_nc, target_nc) == 0);
+
+	talloc_free(tmp_ctx);
+
+	return same_nc;
+}
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index d5234d5..1396f84 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6665,46 +6665,6 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 }
 
 /**
- * Returns True if the source and target DNs both have the same naming context,
- * i.e. they're both in the same partition.
- */
-static bool replmd_objects_have_same_nc(struct ldb_context *ldb,
-					TALLOC_CTX *mem_ctx,
-					struct ldb_dn *source_dn,
-					struct ldb_dn *target_dn)
-{
-	TALLOC_CTX *tmp_ctx;
-	struct ldb_dn *source_nc;
-	struct ldb_dn *target_nc;
-	int ret;
-	bool same_nc = true;
-
-	tmp_ctx = talloc_new(mem_ctx);
-
-	ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc);
-	if (ret != LDB_SUCCESS) {
-		DBG_ERR("Failed to find base DN for source %s\n",
-			ldb_dn_get_linearized(source_dn));
-		talloc_free(tmp_ctx);
-		return true;
-	}
-
-	ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc);
-	if (ret != LDB_SUCCESS) {
-		DBG_ERR("Failed to find base DN for target %s\n",
-			ldb_dn_get_linearized(target_dn));
-		talloc_free(tmp_ctx);
-		return true;
-	}
-
-	same_nc = (ldb_dn_compare(source_nc, target_nc) == 0);
-
-	talloc_free(tmp_ctx);
-
-	return same_nc;
-}
-
-/**
  * Checks that the target object for a linked attribute exists.
  * @param guid returns the target object's GUID (is returned)if it exists)
  * @param ignore_link set to true if the linked attribute should be ignored
@@ -6807,8 +6767,8 @@ static int replmd_check_target_exists(struct ldb_module *module,
 				 ldb_dn_get_linearized(source_dn)));
 			*ignore_link = true;
 
-		} else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn,
-						       dsdb_dn->dn)) {
+		} else if (dsdb_objects_have_same_nc(ldb, tmp_ctx, source_dn,
+						     dsdb_dn->dn)) {
 			ldb_asprintf_errstring(ldb, "Unknown target %s GUID %s linked from %s\n",
 					       ldb_dn_get_linearized(dsdb_dn->dn),
 					       GUID_string(tmp_ctx, guid),
diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index fdea514..2dadb98 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -57,6 +57,7 @@ struct drsuapi_getncchanges_state {
 	struct GUID ncRoot_guid;
 	bool is_schema_nc;
 	bool is_get_anc;
+	bool is_get_tgt;
 	uint64_t min_usn;
 	uint64_t max_usn;
 	struct drsuapi_DsReplicaHighWaterMark last_hwm;
@@ -2340,6 +2341,137 @@ static WERROR getncchanges_get_obj_to_send(const struct ldb_message *msg,
 	return werr;
 }
 
+/**
+ * Goes through any new linked attributes and checks that the target object
+ * will be known to the client, i.e. we've already sent it in an replication
+ * chunk. If not, then it adds the target object to the current replication
+ * chunk. This is only done when the client specifies DRS_GET_TGT.
+ */
+static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *repl_chunk,
+						struct drsuapi_getncchanges_state *getnc_state,
+						uint32_t start_la_index,
+						TALLOC_CTX *mem_ctx,
+						struct ldb_context *sam_ctx,
+						struct dsdb_schema *schema,
+						DATA_BLOB *session_key,
+						struct drsuapi_DsGetNCChangesRequest10 *req10,
+						uint32_t *local_pas,
+						struct ldb_dn *machine_dn)
+{
+	int ret;
+	uint32_t i;
+	WERROR werr = WERR_OK;
+	static const char * const msg_attrs[] = {
+					    "*",
+					    "nTSecurityDescriptor",
+					    "parentGUID",
+					    "replPropertyMetaData",
+					    DSDB_SECRET_ATTRIBUTES,
+					    NULL };
+
+	/* loop through any linked attributes to check */
+	for (i = start_la_index; i < getnc_state->la_count; i++) {
+
+		struct GUID target_guid;
+		struct drsuapi_DsReplicaObjectListItemEx *new_objs = NULL;
+		const struct drsuapi_DsReplicaLinkedAttribute *la;
+		struct ldb_message *msg;
+		struct ldb_result *msg_res;
+		struct ldb_dn *msg_dn;
+		TALLOC_CTX *tmp_ctx;
+		struct dsdb_dn *dn;
+		const struct dsdb_attribute *schema_attrib;
+		NTSTATUS status;
+
+		la = &getnc_state->la_list[i];
+		tmp_ctx = talloc_new(mem_ctx);
+
+		/* get the GUID of the linked attribute's target object */
+		schema_attrib = dsdb_attribute_by_attributeID_id(schema,
+								 la->attid);
+
+		werr = dsdb_dn_la_from_blob(sam_ctx, schema_attrib, schema,
+					    tmp_ctx, la->value.blob, &dn);
+
+		if (!W_ERROR_IS_OK(werr)) {
+			DEBUG(0,(__location__ ": Bad la blob\n"));
+			return werr;
+		}
+
+		status = dsdb_get_extended_dn_guid(dn->dn, &target_guid, "GUID");
+
+		if (!NT_STATUS_IS_OK(status)) {
+			return ntstatus_to_werror(status);
+		}
+
+		/*
+		 * if the target isn't in the cache, then the client
+		 * might not know about it, so send the target now
+		 */
+		werr = dcesrv_drsuapi_obj_cache_exists(getnc_state->obj_cache,
+						       &target_guid);
+
+		if (W_ERROR_EQUAL(werr, WERR_OBJECT_NAME_EXISTS)) {
+
+			/* target already sent, nothing to do */
+			TALLOC_FREE(tmp_ctx);
+			continue;
+		}
+
+		msg_dn = ldb_dn_new_fmt(tmp_ctx, sam_ctx, "<GUID=%s>",
+					GUID_string(tmp_ctx, &target_guid));
+		W_ERROR_HAVE_NO_MEMORY(msg_dn);
+
+		ret = drsuapi_search_with_extended_dn(sam_ctx, tmp_ctx,
+						      &msg_res, msg_dn,
+						      LDB_SCOPE_BASE,
+						      msg_attrs, NULL);
+
+		/*
+		 * If we can't resolve the link target, then the client
+		 * definitely won't be able to. Abort the replication
+		 */
+		if (ret != LDB_SUCCESS) {
+			DBG_ERR("getncchanges: failed to fetch link target DN %s - %s\n",
+				ldb_dn_get_extended_linearized(tmp_ctx, msg_dn, 1),
+				ldb_errstring(sam_ctx));
+			return WERR_DS_DRA_INCONSISTENT_DIT;
+		}
+
+		msg = msg_res->msgs[0];
+
+		/* don't fetch objects from another partition */
+		if (!dsdb_objects_have_same_nc(sam_ctx, tmp_ctx, msg->dn,
+					       getnc_state->ncRoot_dn)) {
+			TALLOC_FREE(tmp_ctx);
+			continue;
+		}
+
+		/*
+		 * Construct an object, ready to send (this will include
+		 * the object's ancestors as well, if GET_ANC is set)
+		 */
+		werr = getncchanges_get_obj_to_send(msg, mem_ctx, sam_ctx,
+						    getnc_state, schema,
+						    session_key, req10,
+						    false,
+						    local_pas, machine_dn,
+						    &target_guid,
+						    &new_objs);
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
+
+		if (new_objs != NULL) {
+			getncchanges_add_objs_to_resp(repl_chunk, new_objs);
+		}
+		TALLOC_FREE(tmp_ctx);
+
+		/* TODO could have 1000s of links. Stop if we fill up the message */
+	}
+
+	return WERR_OK;
+}
 
 /* 
   drsuapi_DsGetNCChanges
@@ -2804,17 +2936,37 @@ allowed:
 		talloc_free(search_res);
 		talloc_free(changes);
 
-		if (req10->extended_op != DRSUAPI_EXOP_NONE) {
-			/* Do nothing */
-		} else if (req10->replica_flags & DRSUAPI_DRS_GET_ANC) {
+		if (req10->extended_op == DRSUAPI_EXOP_NONE) {
+			getnc_state->is_get_anc =
+				((req10->replica_flags & DRSUAPI_DRS_GET_ANC) != 0);
+			getnc_state->is_get_tgt =
+				((req10->more_flags & DRSUAPI_DRS_GET_TGT) != 0);
+		}
+
+		/*
+		 * when using GET_ANC or GET_TGT, cache the objects that have
+		 * been already sent, to avoid sending them multiple times
+		 */
+		if (getnc_state->is_get_anc || getnc_state->is_get_tgt) {
+			DEBUG(3,("Using object cache, GET_ANC %u, GET_TGT %u\n",
+				 getnc_state->is_get_anc,
+				 getnc_state->is_get_tgt));
+
 			getnc_state->obj_cache = db_open_rbt(getnc_state);
 			if (getnc_state->obj_cache == NULL) {
 				return WERR_NOT_ENOUGH_MEMORY;
 			}
-			getnc_state->is_get_anc = true;
 		}
 	}
 
+	/*
+	 * If the client has already set GET_TGT then we know they can handle
+	 * receiving the linked attributes interleaved with the source objects
+	 */
+	if (getnc_state->is_get_tgt) {
+		repl_chunk->immediate_link_sync = true;
+	}
+
 	if (req10->uptodateness_vector) {
 		/* make sure its sorted */
 		TYPESAFE_QSORT(req10->uptodateness_vector->cursors,
@@ -3033,6 +3185,24 @@ allowed:
 
 		getnc_state->total_links += (getnc_state->la_count - old_la_index);
 
+		/*
+		 * If the GET_TGT flag was set, check any new links added to
+		 * make sure the client knows about the link target object
+		 */
+		if (getnc_state->is_get_tgt) {
+			werr = getncchanges_chunk_add_la_targets(&repl_chunk,
+								 getnc_state,
+								 old_la_index,
+								 mem_ctx, sam_ctx,
+								 schema, &session_key,
+								 req10, local_pas,
+								 machine_dn);
+
+			if (!W_ERROR_IS_OK(werr)) {
+				return werr;
+			}
+		}
+
 		TALLOC_FREE(tmp_ctx);
 	}
 
-- 
2.7.4


From 9df1bc19d44b6f77399aa9e800b73dd8b1c0a6f6 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 23 May 2017 14:37:56 +1200
Subject: [PATCH 23/36] getnc_exop.py: Fix GET_TGT behaviour in DRS tests

The existing code never passed the more_flags parameter into the
actual getNCChanges request, i.e. _getnc_req10(). This meant the
existing GET_TGT tests effectively did nothing.

Passing the flag through properly means we have to now change the tests
as the DNs returned by Windows now include any target objects in the
linked attributes.

Also added comments to the tests to help explain what they are actually
doing.

With the patches for basic GET_TGT server support, these tests now also
pass on Samba. To do so, I've had to add dn_ordered=False. I don't think
we really care about this discrepancy - Samba applies all the objects
before it applies the links, so hopefully it shouldn't matter what order
the objects are returned in.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/drs_base.py   |  3 ++-
 source4/torture/drs/python/getnc_exop.py | 20 ++++++++++++++++----
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/source4/torture/drs/python/drs_base.py b/source4/torture/drs/python/drs_base.py
index b19c51a..bae0b20 100644
--- a/source4/torture/drs/python/drs_base.py
+++ b/source4/torture/drs/python/drs_base.py
@@ -277,7 +277,8 @@ class DrsBaseTestCase(SambaToolCmdTest):
                                   nc_dn_str=nc_dn_str,
                                   exop=exop,
                                   max_objects=max_objects,
-                                  replica_flags=replica_flags)
+                                  replica_flags=replica_flags,
+                                  more_flags=more_flags)
         req10.highwatermark = highwatermark
         if uptodateness_vector is not None:
             uptodateness_vector_v1 = drsuapi.DsReplicaCursorCtrEx()
diff --git a/source4/torture/drs/python/getnc_exop.py b/source4/torture/drs/python/getnc_exop.py
index caa7826..2caa50c 100644
--- a/source4/torture/drs/python/getnc_exop.py
+++ b/source4/torture/drs/python/getnc_exop.py
@@ -410,11 +410,16 @@ class DrsReplicaSyncTestCase(drs_base.DrsBaseTestCase):
             drsuapi.DRSUAPI_DRS_GET_ANC,
             expected_links=[dc3_managedBy_ou1])
 
-        self._check_replication([dc3],
+        # GET_TGT seems to override DRS_CRITICAL_ONLY and also returns any
+        # object(s) that relate to the linked attributes (similar to GET_ANC)
+        self._check_replication([ou1, dc3],
             drsuapi.DRSUAPI_DRS_CRITICAL_ONLY,
             more_flags=drsuapi.DRSUAPI_DRS_GET_TGT,
-            expected_links=[dc3_managedBy_ou1])
+            expected_links=[dc3_managedBy_ou1], dn_ordered=False)
 
+        # Change DC3's managedBy to OU2 instead of OU1
+        # Note that the OU1 managedBy linked attribute will still exist as
+        # a tombstone object (and so will be returned in the replication still)
         m = ldb.Message()
         m.dn = ldb.Dn(self.ldb_dc1, dc3)
         m["managedBy"] = ldb.MessageElement(ou2, ldb.FLAG_MOD_REPLACE, "managedBy")
@@ -443,11 +448,16 @@ class DrsReplicaSyncTestCase(drs_base.DrsBaseTestCase):
             drsuapi.DRSUAPI_DRS_GET_ANC,
             expected_links=[dc3_managedBy_ou1,dc3_managedBy_ou2])
 
-        self._check_replication([dc3],
+        # GET_TGT will also return any DNs referenced by the linked attributes
+        # (including the Tombstone attribute)
+        self._check_replication([ou1, ou2, dc3],
             drsuapi.DRSUAPI_DRS_CRITICAL_ONLY,
             more_flags=drsuapi.DRSUAPI_DRS_GET_TGT,
-            expected_links=[dc3_managedBy_ou1,dc3_managedBy_ou2])
+            expected_links=[dc3_managedBy_ou1,dc3_managedBy_ou2], dn_ordered=False)
 
+        # Use the highwater-mark prior to changing ManagedBy - this should
+        # only return the old/Tombstone and new linked attributes (we already
+        # know all the DNs)
         self._check_replication([],
             drsuapi.DRSUAPI_DRS_WRIT_REP,
             expected_links=[dc3_managedBy_ou1,dc3_managedBy_ou2],
@@ -476,6 +486,8 @@ class DrsReplicaSyncTestCase(drs_base.DrsBaseTestCase):
             expected_links=[dc3_managedBy_ou1,dc3_managedBy_ou2],
             highwatermark=hwm7)
 
+        # Repeat the above set of tests using the uptodateness_vector
+        # instead of the highwater-mark
         self._check_replication([],
             drsuapi.DRSUAPI_DRS_WRIT_REP,
             expected_links=[dc3_managedBy_ou1,dc3_managedBy_ou2],
-- 
2.7.4


From d6cb6a7986420000b32ebcb34227c8ac6835d719 Mon Sep 17 00:00:00 2001
From: Garming Sam <garming at catalyst.net.nz>
Date: Tue, 8 Aug 2017 16:27:18 +1200
Subject: [PATCH 24/36] getncchanges.c: Refactor to track more state using
 repl_chunk

To prepare GET_TGT to deal with a large number of links better, there
is now a 'repl_chunk' struct to help keep track of all the factors
relating to the current chunk of replication data (i.e. how many
objects/links we can send and how many we've already processed). This
means we can have a consistent way of working out whether the current
chunk is full (whether that be due to objects, links, or just too much
time taken).

This patch should not alter functionality. This is just a refactor to
add the basic framework, which will be used in the next patch.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
Signed-off-by: Garming Sam <garming at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 231 ++++++++++++++++++++----------
 1 file changed, 159 insertions(+), 72 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 2dadb98..3abba89 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -85,9 +85,18 @@ struct la_for_sorting {
  * only exists for a single call to dcesrv_drsuapi_DsGetNCChanges()
  */
 struct getncchanges_repl_chunk {
-	struct drsuapi_DsGetNCChangesCtr6 *ctr6;
+	uint32_t max_objects;
+	uint32_t max_links;
+	uint32_t tgt_la_count;
+	bool immediate_link_sync;
+	time_t max_wait;
+	time_t start;
 
-	/* the last object written to the response */
+	/* stores the objects to be sent in this chunk */
+	uint32_t object_count;
+	struct drsuapi_DsReplicaObjectListItemEx *object_list;
+
+	/* the last object added to this replication chunk */
 	struct drsuapi_DsReplicaObjectListItemEx *last_object;
 };
 
@@ -2240,25 +2249,25 @@ static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemE
 }
 
 /**
- * Adds a list of new objects into the getNCChanges response message
+ * Adds a list of new objects into the current chunk of replication data to send
  */
-static void getncchanges_add_objs_to_resp(struct getncchanges_repl_chunk *repl_chunk,
-					  struct drsuapi_DsReplicaObjectListItemEx *obj_list)
+static void getncchanges_chunk_add_objects(struct getncchanges_repl_chunk *repl_chunk,
+					   struct drsuapi_DsReplicaObjectListItemEx *obj_list)
 {
 	struct drsuapi_DsReplicaObjectListItemEx *obj;
 
 	/*
-	 * We track the last object added to the response message, so just add
+	 * We track the last object added to the replication chunk, so just add
 	 * the new object-list onto the end
 	 */
-	if (repl_chunk->last_object == NULL) {
-		repl_chunk->ctr6->first_object = obj_list;
+	if (repl_chunk->object_list == NULL) {
+		repl_chunk->object_list = obj_list;
 	} else {
 		repl_chunk->last_object->next_object = obj_list;
 	}
 
 	for (obj = obj_list; obj != NULL; obj = obj->next_object) {
-		repl_chunk->ctr6->object_count += 1;
+		repl_chunk->object_count += 1;
 
 		/*
 		 * Remember the last object in the response - we'll use this to
@@ -2342,6 +2351,67 @@ static WERROR getncchanges_get_obj_to_send(const struct ldb_message *msg,
 }
 
 /**
+ * Returns the max number of links that will fit in the current replication chunk
+ */
+static uint32_t getncchanges_chunk_max_links(struct getncchanges_repl_chunk *repl_chunk)
+{
+	uint32_t max_links;
+
+	/*
+	 * This is just an approximate guess to avoid overfilling the replication
+	 * chunk. E.g. if we've already sent 1000 objects, then send 1000 fewer
+	 * links. For comparison, the max that Windows seems to send is ~2700
+	 * links and ~250 objects (although this may vary based on timeouts)
+	 */
+	if (repl_chunk->object_count >= repl_chunk->max_links) {
+
+		/* request is already full of objects - don't send any links */
+		max_links = 0;
+	} else {
+
+		/* send fewer links if we're already sending a lot of objects */
+		max_links = repl_chunk->max_links - repl_chunk->object_count;
+	}
+
+	return max_links;
+}
+
+/**
+ * Returns true if the current GetNCChanges() call has taken longer than its
+ * allotted time. This prevents the client from timing out.
+ */
+static bool getncchanges_chunk_timed_out(struct getncchanges_repl_chunk *repl_chunk)
+{
+	return (time(NULL) - repl_chunk->start > repl_chunk->max_wait);
+}
+
+/**
+ * Returns true if the current chunk of replication data has reached the
+ * max_objects
+ */
+static bool getncchanges_chunk_is_full(struct getncchanges_repl_chunk *repl_chunk,
+				       struct drsuapi_getncchanges_state *getnc_state)
+{
+	bool chunk_full = false;
+
+	/* check if the current chunk is already full with objects */
+	if (repl_chunk->object_count >= repl_chunk->max_objects) {
+		chunk_full = true;
+
+	} else if (repl_chunk->object_count > 0 &&
+		   getncchanges_chunk_timed_out(repl_chunk)) {
+
+		/*
+		 * We've exceeded our allotted time building this chunk,
+		 * and we have at least one object to send back to the client
+		 */
+		chunk_full = true;
+	}
+
+	return chunk_full;
+}
+
+/**
  * Goes through any new linked attributes and checks that the target object
  * will be known to the client, i.e. we've already sent it in an replication
  * chunk. If not, then it adds the target object to the current replication
@@ -2463,7 +2533,7 @@ static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *
 		}
 
 		if (new_objs != NULL) {
-			getncchanges_add_objs_to_resp(repl_chunk, new_objs);
+			getncchanges_chunk_add_objects(repl_chunk, new_objs);
 		}
 		TALLOC_FREE(tmp_ctx);
 
@@ -2473,6 +2543,58 @@ static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *
 	return WERR_OK;
 }
 
+/**
+ * Creates a helper struct used for building a chunk of replication data,
+ * i.e. used over a single call to dcesrv_drsuapi_DsGetNCChanges().
+ */
+static struct getncchanges_repl_chunk * getncchanges_chunk_new(TALLOC_CTX *mem_ctx,
+							       struct dcesrv_call_state *dce_call,
+							       struct drsuapi_DsGetNCChangesRequest10 *req10)
+{
+	struct getncchanges_repl_chunk *repl_chunk;
+
+	repl_chunk = talloc_zero(mem_ctx, struct getncchanges_repl_chunk);
+
+	repl_chunk->start = time(NULL);
+
+	repl_chunk->max_objects = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL,
+						 "drs", "max object sync", 1000);
+
+	/*
+	 * The client control here only applies in normal replication, not extended
+	 * operations, which return a fixed set, even if the caller
+	 * sets max_object_count == 0
+	 */
+	if (req10->extended_op == DRSUAPI_EXOP_NONE) {
+
+		/*
+		 * use this to force single objects at a time, which is useful
+		 * for working out what object is giving problems
+		 */
+		if (req10->max_object_count < repl_chunk->max_objects) {
+			repl_chunk->max_objects = req10->max_object_count;
+		}
+	}
+
+	repl_chunk->max_links =
+			lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL,
+				       "drs", "max link sync", 1500);
+
+	repl_chunk->immediate_link_sync =
+			lpcfg_parm_bool(dce_call->conn->dce_ctx->lp_ctx, NULL,
+					"drs", "immediate link sync", false);
+
+	/*
+	 * Maximum time that we can spend in a getncchanges
+	 * in order to avoid timeout of the other part.
+	 * 10 seconds by default.
+	 */
+	repl_chunk->max_wait = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx,
+					      NULL, "drs", "max work time", 10);
+
+	return repl_chunk;
+}
+
 /* 
   drsuapi_DsGetNCChanges
 
@@ -2486,7 +2608,7 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	uint32_t i, k;
 	struct dsdb_schema *schema;
 	struct drsuapi_DsReplicaOIDMapping_Ctr *ctr;
-	struct getncchanges_repl_chunk repl_chunk = { 0 };
+	struct getncchanges_repl_chunk *repl_chunk;
 	NTSTATUS status;
 	DATA_BLOB session_key;
 	WERROR werr;
@@ -2495,7 +2617,6 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	struct drsuapi_getncchanges_state *getnc_state;
 	struct drsuapi_DsGetNCChangesRequest10 *req10;
 	uint32_t options;
-	uint32_t max_objects;
 	uint32_t max_links;
 	uint32_t link_count = 0;
 	struct ldb_dn *search_dn = NULL;
@@ -2506,9 +2627,6 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	bool is_secret_request;
 	bool is_gc_pas_request;
 	struct drsuapi_changed_objects *changes;
-	time_t max_wait;
-	time_t start = time(NULL);
-	bool max_wait_reached = false;
 	bool has_get_all_changes = false;
 	struct GUID invocation_id;
 	static const struct drsuapi_DsReplicaLinkedAttribute no_linked_attr;
@@ -2516,7 +2634,6 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	bool full = true;
 	uint32_t *local_pas = NULL;
 	struct ldb_dn *machine_dn = NULL; /* Only used for REPL SECRET EXOP */
-	bool immediate_link_sync;
 
 	DCESRV_PULL_HANDLE_WERR(h, r->in.bind_handle, DRSUAPI_BIND_HANDLE);
 	b_state = h->data;
@@ -2538,13 +2655,6 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	r->out.ctr->ctr6.source_dsa_invocation_id = *(samdb_ntds_invocation_id(sam_ctx));
 	r->out.ctr->ctr6.first_object = NULL;
 
-	/* a RODC doesn't allow for any replication */
-	ret = samdb_rodc(sam_ctx, &am_rodc);
-	if (ret == LDB_SUCCESS && am_rodc) {
-		DEBUG(0,(__location__ ": DsGetNCChanges attempt on RODC\n"));
-		return WERR_DS_DRA_SOURCE_DISABLED;
-	}
-
 	/* Check request revision. 
 	 */
 	switch (r->in.level) {
@@ -2563,6 +2673,18 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 		return WERR_REVISION_MISMATCH;
 	}
 
+	repl_chunk = getncchanges_chunk_new(mem_ctx, dce_call, req10);
+
+	if (repl_chunk == NULL) {
+		return WERR_NOT_ENOUGH_MEMORY;
+	}
+
+	/* a RODC doesn't allow for any replication */
+	ret = samdb_rodc(sam_ctx, &am_rodc);
+	if (ret == LDB_SUCCESS && am_rodc) {
+		DEBUG(0,(__location__ ": DsGetNCChanges attempt on RODC\n"));
+		return WERR_DS_DRA_SOURCE_DISABLED;
+	}
 
         /* Perform access checks. */
 	/* TODO: we need to support a sync on a specific non-root
@@ -2999,38 +3121,6 @@ allowed:
 	r->out.ctr->ctr6.old_highwatermark = req10->highwatermark;
 	r->out.ctr->ctr6.new_highwatermark = req10->highwatermark;
 
-	repl_chunk.ctr6 = &r->out.ctr->ctr6;
-	repl_chunk.last_object = NULL;
-
-	max_objects = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL, "drs", "max object sync", 1000);
-	/*
-	 * The client control here only applies in normal replication, not extended
-	 * operations, which return a fixed set, even if the caller
-	 * sets max_object_count == 0
-	 */
-	if (req10->extended_op == DRSUAPI_EXOP_NONE) {
-		/* use this to force single objects at a time, which is useful
-		 * for working out what object is giving problems
-		 */
-		if (req10->max_object_count < max_objects) {
-			max_objects = req10->max_object_count;
-		}
-	}
-	/*
-	 * TODO: work out how the maximum should be calculated
-	 */
-	max_links = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL, "drs", "max link sync", 1500);
-
-	immediate_link_sync = lpcfg_parm_bool(dce_call->conn->dce_ctx->lp_ctx, NULL,
-					      "drs", "immediate link sync", false);
-
-	/*
-	 * Maximum time that we can spend in a getncchanges
-	 * in order to avoid timeout of the other part.
-	 * 10 seconds by default.
-	 */
-	max_wait = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL, "drs", "max work time", 10);
-
 	if (req10->partial_attribute_set != NULL) {
 		struct dsdb_syntax_ctx syntax_ctx;
 		uint32_t j = 0;
@@ -3056,8 +3146,7 @@ allowed:
 
 	for (i=getnc_state->num_processed;
 	     i<getnc_state->num_records &&
-		     (r->out.ctr->ctr6.object_count < max_objects)
-		     && !max_wait_reached;
+		     !getncchanges_chunk_is_full(repl_chunk, getnc_state);
 	    i++) {
 		struct drsuapi_DsReplicaObjectListItemEx *new_objs = NULL;
 		struct ldb_message *msg;
@@ -3126,7 +3215,9 @@ allowed:
 		}
 
 		if (!obj_already_sent) {
-			max_wait_reached = (time(NULL) - start > max_wait);
+			bool max_wait_reached;
+
+			max_wait_reached = getncchanges_chunk_timed_out(repl_chunk);
 
 			/*
 			 * Construct an object, ready to send (this will include
@@ -3173,7 +3264,7 @@ allowed:
 			 * Add the object (and, if GET_ANC, any parents it may
 			 * have) into the current chunk of replication data
 			 */
-			getncchanges_add_objs_to_resp(&repl_chunk, new_objs);
+			getncchanges_chunk_add_objects(repl_chunk, new_objs);
 
 			talloc_free(getnc_state->last_dn);
 			getnc_state->last_dn = talloc_move(getnc_state, &msg->dn);
@@ -3190,7 +3281,7 @@ allowed:
 		 * make sure the client knows about the link target object
 		 */
 		if (getnc_state->is_get_tgt) {
-			werr = getncchanges_chunk_add_la_targets(&repl_chunk,
+			werr = getncchanges_chunk_add_la_targets(repl_chunk,
 								 getnc_state,
 								 old_la_index,
 								 mem_ctx, sam_ctx,
@@ -3206,6 +3297,10 @@ allowed:
 		TALLOC_FREE(tmp_ctx);
 	}
 
+	/* copy the constructed object list into the response message */
+	r->out.ctr->ctr6.object_count = repl_chunk->object_count;
+	r->out.ctr->ctr6.first_object = repl_chunk->object_list;
+
 	getnc_state->num_processed = i;
 
 	if (i < getnc_state->num_records) {
@@ -3245,24 +3340,14 @@ allowed:
 		}
 	}
 
-	/*
-	 * TODO:
-	 * This is just a guess, how to calculate the
-	 * number of linked attributes to send, we need to
-	 * find out how to do this right.
-	 */
-	if (r->out.ctr->ctr6.object_count >= max_links) {
-		max_links = 0;
-	} else {
-		max_links -= r->out.ctr->ctr6.object_count;
-	}
+	max_links = getncchanges_chunk_max_links(repl_chunk);
 
 	/*
 	 * Work out how many links we can send in this chunk. The default is to
 	 * send all the links last, but there is a config option to send them
 	 * immediately, in the same chunk as their source object
 	 */
-	if (!r->out.ctr->ctr6.more_data || immediate_link_sync) {
+	if (!r->out.ctr->ctr6.more_data || repl_chunk->immediate_link_sync) {
 		link_count = getnc_state->la_count - getnc_state->la_idx;
 		link_count = MIN(max_links, link_count);
 	}
@@ -3375,6 +3460,8 @@ allowed:
 		ZERO_STRUCT(r->out.ctr->ctr6.new_highwatermark);
 	}
 
+	TALLOC_FREE(repl_chunk);
+
 	DEBUG(r->out.ctr->ctr6.more_data?4:2,
 	      ("DsGetNCChanges with uSNChanged >= %llu flags 0x%08x on %s gave %u objects (done %u/%u) %u links (done %u/%u (as %s))\n",
 	       (unsigned long long)(req10->highwatermark.highest_usn+1),
-- 
2.7.4


From 49f10c32f2bfccfa200b78fa86523498bbd8d4ab Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 11 Aug 2017 16:08:15 +1200
Subject: [PATCH 25/36] getncchanges.c: Support GET_TGT better with large
 numbers of links

A source object can potentially link to thousands of target objects.
We have to be careful not to overfill the GetNCChanges response message
with more data than it's possible to send. We also don't want the client
to timeout while we're busy checking the linked attributes. The GET_TGT
support added so far is fairly dumb - this patch extends it to better
handle larger numbers of links.

To do so, this extends the repl_chunk usage so that it also works out if
the current chunk is full of links. Now as soon as the chunk is full of
either links or objects, we stop and send it back.

These changes now mean that we need to also check:
- that all the links for the last source object in the previous chunk
  have been sent, before we move on and send the next object. This only
  takes effect when immediate_link_sync is configured. It also means
  that a chunk in the middle of the replication cycle can now contain
  only links, and no objects.
- when GET_TGT is used, we only send back the links that we've verified
  the target object for. i.e. if we stop checking links because we timed
  out, we only send back the links whose targets were checked.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 102 +++++++++++++++++++++++++++---
 1 file changed, 94 insertions(+), 8 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 3abba89..4e20032 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -2351,6 +2351,32 @@ static WERROR getncchanges_get_obj_to_send(const struct ldb_message *msg,
 }
 
 /**
+ * Returns the number of links that are waiting to be sent
+ */
+static uint32_t getncchanges_chunk_links_pending(struct getncchanges_repl_chunk *repl_chunk,
+						 struct drsuapi_getncchanges_state *getnc_state)
+{
+	uint32_t links_to_send = 0;
+
+	if (getnc_state->is_get_tgt) {
+
+		/*
+		 * when the GET_TGT flag is set, only include the linked
+		 * attributes whose target object has already been checked
+		 * (i.e. they're ready to send).
+		 */
+		if (repl_chunk->tgt_la_count > getnc_state->la_idx) {
+			links_to_send = (repl_chunk->tgt_la_count -
+					 getnc_state->la_idx);
+		}
+	} else {
+		links_to_send = getnc_state->la_count - getnc_state->la_idx;
+	}
+
+	return links_to_send;
+}
+
+/**
  * Returns the max number of links that will fit in the current replication chunk
  */
 static uint32_t getncchanges_chunk_max_links(struct getncchanges_repl_chunk *repl_chunk)
@@ -2387,12 +2413,14 @@ static bool getncchanges_chunk_timed_out(struct getncchanges_repl_chunk *repl_ch
 
 /**
  * Returns true if the current chunk of replication data has reached the
- * max_objects
+ * max_objects and/or max_links thresholds.
  */
 static bool getncchanges_chunk_is_full(struct getncchanges_repl_chunk *repl_chunk,
 				       struct drsuapi_getncchanges_state *getnc_state)
 {
 	bool chunk_full = false;
+	uint32_t links_to_send;
+	uint32_t chunk_limit;
 
 	/* check if the current chunk is already full with objects */
 	if (repl_chunk->object_count >= repl_chunk->max_objects) {
@@ -2406,6 +2434,20 @@ static bool getncchanges_chunk_is_full(struct getncchanges_repl_chunk *repl_chun
 		 * and we have at least one object to send back to the client
 		 */
 		chunk_full = true;
+
+	} else if (repl_chunk->immediate_link_sync) {
+
+		/* check if the chunk is already full with links */
+		links_to_send = getncchanges_chunk_links_pending(repl_chunk,
+								 getnc_state);
+
+		chunk_limit = getncchanges_chunk_max_links(repl_chunk);
+
+		/*
+		 * The chunk is full if we've got more links to send than will
+		 * fit in one chunk
+		 */
+		chunk_full = (chunk_limit <= links_to_send);
 	}
 
 	return chunk_full;
@@ -2430,6 +2472,9 @@ static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *
 {
 	int ret;
 	uint32_t i;
+	uint32_t max_la_index;
+	uint32_t max_links;
+	uint32_t target_count = 0;
 	WERROR werr = WERR_OK;
 	static const char * const msg_attrs[] = {
 					    "*",
@@ -2439,8 +2484,19 @@ static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *
 					    DSDB_SECRET_ATTRIBUTES,
 					    NULL };
 
+	/*
+	 * A object can potentially link to thousands of targets. Only bother
+	 * checking as many targets as will fit into the current response
+	 */
+	max_links = getncchanges_chunk_max_links(repl_chunk);
+	max_la_index = MIN(getnc_state->la_count,
+			   start_la_index + max_links);
+
 	/* loop through any linked attributes to check */
-	for (i = start_la_index; i < getnc_state->la_count; i++) {
+	for (i = start_la_index;
+	     (i < max_la_index &&
+	      !getncchanges_chunk_is_full(repl_chunk, getnc_state));
+	     i++) {
 
 		struct GUID target_guid;
 		struct drsuapi_DsReplicaObjectListItemEx *new_objs = NULL;
@@ -2456,6 +2512,13 @@ static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *
 		la = &getnc_state->la_list[i];
 		tmp_ctx = talloc_new(mem_ctx);
 
+		/*
+		 * Track what linked attribute targets we've checked. We might
+		 * not have time to check them all, so we should only send back
+		 * the ones we've actually checked.
+		 */
+		repl_chunk->tgt_la_count = i + 1;
+
 		/* get the GUID of the linked attribute's target object */
 		schema_attrib = dsdb_attribute_by_attributeID_id(schema,
 								 la->attid);
@@ -2533,11 +2596,15 @@ static WERROR getncchanges_chunk_add_la_targets(struct getncchanges_repl_chunk *
 		}
 
 		if (new_objs != NULL) {
+			target_count++;
 			getncchanges_chunk_add_objects(repl_chunk, new_objs);
 		}
 		TALLOC_FREE(tmp_ctx);
+	}
 
-		/* TODO could have 1000s of links. Stop if we fill up the message */
+	if (target_count > 0) {
+		DEBUG(3, ("GET_TGT: checked %u link-attrs, added %u target objs\n",
+			  i - start_la_index, target_count));
 	}
 
 	return WERR_OK;
@@ -2617,7 +2684,6 @@ WERROR dcesrv_drsuapi_DsGetNCChanges(struct dcesrv_call_state *dce_call, TALLOC_
 	struct drsuapi_getncchanges_state *getnc_state;
 	struct drsuapi_DsGetNCChangesRequest10 *req10;
 	uint32_t options;
-	uint32_t max_links;
 	uint32_t link_count = 0;
 	struct ldb_dn *search_dn = NULL;
 	bool am_rodc;
@@ -3144,6 +3210,25 @@ allowed:
 				   uint32_t_ptr_cmp);
 	}
 
+	/*
+	 * Check in case we're still processing the links from an object in the
+	 * previous chunk. We want to send the links (and any targets needed)
+	 * before moving on to the next object.
+	 */
+	if (getnc_state->is_get_tgt) {
+		werr = getncchanges_chunk_add_la_targets(repl_chunk,
+							 getnc_state,
+							 getnc_state->la_idx,
+							 mem_ctx, sam_ctx,
+							 schema, &session_key,
+							 req10, local_pas,
+							 machine_dn);
+
+		if (!W_ERROR_IS_OK(werr)) {
+			return werr;
+		}
+	}
+
 	for (i=getnc_state->num_processed;
 	     i<getnc_state->num_records &&
 		     !getncchanges_chunk_is_full(repl_chunk, getnc_state);
@@ -3340,16 +3425,16 @@ allowed:
 		}
 	}
 
-	max_links = getncchanges_chunk_max_links(repl_chunk);
-
 	/*
 	 * Work out how many links we can send in this chunk. The default is to
 	 * send all the links last, but there is a config option to send them
 	 * immediately, in the same chunk as their source object
 	 */
 	if (!r->out.ctr->ctr6.more_data || repl_chunk->immediate_link_sync) {
-		link_count = getnc_state->la_count - getnc_state->la_idx;
-		link_count = MIN(max_links, link_count);
+		link_count = getncchanges_chunk_links_pending(repl_chunk,
+							      getnc_state);
+		link_count = MIN(link_count,
+				 getncchanges_chunk_max_links(repl_chunk));
 	}
 
 	/* If we've got linked attributes to send, add them now */
@@ -3480,3 +3565,4 @@ allowed:
 
 	return WERR_OK;
 }
+
-- 
2.7.4


From 1339ce30adec0db3719dce0b8ae7cdd920eb8437 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 12 Jul 2017 14:23:35 +1200
Subject: [PATCH 26/36] getncchanges.py: Add some GET_TGT test cases

test_repl_get_tgt:
- Adds 2 sets of objects
- Links one set to the other
- Changes the order so the target object comes last in the
  replication (which means the client has to use GET_TGT)
- Checks that when GET_TGT is used that we have received all target
  objects we need to resolve the linked attibutes
- Checks that we expect to receive the linked attributes *before*
  the last chunk is sent (by default, Samba sends all the links at
  the end, so this fails)
- Checks that we eventually receive all expected objects, and all
  links we receive match what is expected

test_repl_get_tgt_chain:
  This adds the linked attributes in a more complicated chain. We add
  300 objects, but the links for 100 objects will point to a linked
  chain of 200 objects.
  This was mainly to determine whether or not Windows follows the
  target object (i.e. whether it sends all the links for the target
  object as well). It turns out Windows maintains its own linked
  attribute DB, so it sends the links based on USN.

Note that vampire_dc is the only testenv that will support GET_TGT
currently (because the links need to be sent with the source object
rather than at the end of the cycle). But to use the vampire_dc, we need
to change the tests to point to DC2 instead of DC1. I've left the DC
numbering as is to match the other test cases. Instead, I've added a
test_ldb_dc handle to drs_base.py - it defaults to DC1, but tests can
override it easily and still have everything work.

Also added randomness to the test OU name to fix an intermittent
autobuild LDAP_ENTRY_ALREADY_EXISTS failure I noticed.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 selftest/knownfail.d/getncchanges          |   4 +
 source4/torture/drs/python/drs_base.py     |  26 +++-
 source4/torture/drs/python/getncchanges.py | 215 +++++++++++++++++++++++++----
 3 files changed, 216 insertions(+), 29 deletions(-)
 create mode 100644 selftest/knownfail.d/getncchanges

diff --git a/selftest/knownfail.d/getncchanges b/selftest/knownfail.d/getncchanges
new file mode 100644
index 0000000..e7d5497
--- /dev/null
+++ b/selftest/knownfail.d/getncchanges
@@ -0,0 +1,4 @@
+# GET_TGT tests currently only work for testenvs that send the links at the
+# same time as the source objects. Currently this is only the vampire_dc
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt\(promoted_dc\)
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_chain\(promoted_dc\)
diff --git a/source4/torture/drs/python/drs_base.py b/source4/torture/drs/python/drs_base.py
index bae0b20..1eb4d57 100644
--- a/source4/torture/drs/python/drs_base.py
+++ b/source4/torture/drs/python/drs_base.py
@@ -57,6 +57,7 @@ class DrsBaseTestCase(SambaToolCmdTest):
         url_dc = samba.tests.env_get_var_value("DC2")
         (self.ldb_dc2, self.info_dc2) = samba.tests.connect_samdb_ex(url_dc,
                                                                      ldap_only=True)
+        self.test_ldb_dc = self.ldb_dc1
 
         # cache some of RootDSE props
         self.schema_dn = self.info_dc1["schemaNamingContext"][0]
@@ -74,8 +75,12 @@ class DrsBaseTestCase(SambaToolCmdTest):
     def tearDown(self):
         super(DrsBaseTestCase, self).tearDown()
 
+    def set_test_ldb_dc(self, ldb_dc):
+        """Sets which DC's LDB we perform operations on during the test"""
+        self.test_ldb_dc = ldb_dc
+
     def _GUID_string(self, guid):
-        return self.ldb_dc1.schema_format_value("objectGUID", guid)
+        return self.test_ldb_dc.schema_format_value("objectGUID", guid)
 
     def _ldap_schemaUpdateNow(self, sam_db):
         rec = {"dn": "",
@@ -218,6 +223,17 @@ class DrsBaseTestCase(SambaToolCmdTest):
 
         return ctr6_links
 
+    def _get_ctr6_object_guids(self, ctr6):
+        """Returns all the object GUIDs in a GetNCChanges response"""
+        guid_list = []
+
+        obj = ctr6.first_object
+        for i in range(0, ctr6.object_count):
+            guid_list.append(str(obj.object.identifier.guid))
+            obj = obj.next_object
+
+        return guid_list
+
     def _ctr6_debug(self, ctr6):
         """
         Displays basic info contained in a DsGetNCChanges response.
@@ -255,15 +271,15 @@ class DrsBaseTestCase(SambaToolCmdTest):
         and returns the response received from the DC.
         """
         if source_dsa is None:
-            source_dsa = self.ldb_dc1.get_ntds_GUID()
+            source_dsa = self.test_ldb_dc.get_ntds_GUID()
         if invocation_id is None:
-            invocation_id = self.ldb_dc1.get_invocation_id()
+            invocation_id = self.test_ldb_dc.get_invocation_id()
         if nc_dn_str is None:
-            nc_dn_str = self.ldb_dc1.domain_dn()
+            nc_dn_str = self.test_ldb_dc.domain_dn()
 
         if highwatermark is None:
             if self.default_hwm is None:
-                (highwatermark, _) = self._get_highest_hwm_utdv(self.ldb_dc1)
+                (highwatermark, _) = self._get_highest_hwm_utdv(self.test_ldb_dc)
             else:
                 highwatermark = self.default_hwm
 
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 7d48133..c87523d 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -31,52 +31,64 @@ import drs_base
 import samba.tests
 import ldb
 from ldb import SCOPE_BASE
+import random
 
 from samba.dcerpc import drsuapi
 
 class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
     def setUp(self):
         super(DrsReplicaSyncIntegrityTestCase, self).setUp()
-        self.base_dn = self.ldb_dc1.get_default_basedn()
-        self.ou = "OU=uptodateness_test,%s" % self.base_dn
-        self.ldb_dc1.add({
+
+        # Note that DC2 is the DC with the testenv-specific quirks (e.g. it's
+        # the vampire_dc), so we point this test directly at that DC
+        self.set_test_ldb_dc(self.ldb_dc2)
+        (self.drs, self.drs_handle) = self._ds_bind(self.dnsname_dc2)
+
+        # add some randomness to the test OU. (Deletion of the last test's
+        # objects can be slow to replicate out. So the OU created by a previous
+        # testenv may still exist at this point).
+        rand = random.randint(1, 10000000)
+        self.base_dn = self.test_ldb_dc.get_default_basedn()
+        self.ou = "OU=getncchanges%d_test,%s" %(rand, self.base_dn)
+        self.test_ldb_dc.add({
             "dn": self.ou,
             "objectclass": "organizationalUnit"})
-        (self.drs, self.drs_handle) = self._ds_bind(self.dnsname_dc1)
-        (self.default_hwm, self.default_utdv) = self._get_highest_hwm_utdv(self.ldb_dc1)
+        (self.default_hwm, self.default_utdv) = self._get_highest_hwm_utdv(self.test_ldb_dc)
 
         self.rxd_dn_list = []
         self.rxd_links = []
+        self.rxd_guids = []
 
         # 100 is the minimum max_objects that Microsoft seems to honour
         # (the max honoured is 400ish), so we use that in these tests
         self.max_objects = 100
         self.last_ctr = None
 
-        # store whether we used GET_ANC flags in the requests
+        # store whether we used GET_TGT/GET_ANC flags in the requests
+        self.used_get_tgt = False
         self.used_get_anc = False
 
     def tearDown(self):
         super(DrsReplicaSyncIntegrityTestCase, self).tearDown()
         # tidyup groups and users
         try:
-            self.ldb_dc1.delete(self.ou, ["tree_delete:1"])
+            self.ldb_dc2.delete(self.ou, ["tree_delete:1"])
         except ldb.LdbError as (enum, string):
             if enum == ldb.ERR_NO_SUCH_OBJECT:
                 pass
 
     def add_object(self, dn):
         """Adds an OU object"""
-        self.ldb_dc1.add({"dn": dn, "objectclass": "organizationalunit"})
-        res = self.ldb_dc1.search(base=dn, scope=SCOPE_BASE)
+        self.test_ldb_dc.add({"dn": dn, "objectclass": "organizationalunit"})
+        res = self.test_ldb_dc.search(base=dn, scope=SCOPE_BASE)
         self.assertEquals(len(res), 1)
 
     def modify_object(self, dn, attr, value):
         """Modifies an object's USN by adding an attribute value to it"""
         m = ldb.Message()
-        m.dn = ldb.Dn(self.ldb_dc1, dn)
+        m.dn = ldb.Dn(self.test_ldb_dc, dn)
         m[attr] = ldb.MessageElement(value, ldb.FLAG_MOD_ADD, attr)
-        self.ldb_dc1.modify(m)
+        self.test_ldb_dc.modify(m)
 
     def create_object_range(self, start, end, prefix="",
                             children=None, parent_list=None):
@@ -149,7 +161,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         # Create a range of objects to replicate.
         expected_dn_list = self.create_object_range(0, 400)
-        (orig_hwm, unused) = self._get_highest_hwm_utdv(self.ldb_dc1)
+        (orig_hwm, unused) = self._get_highest_hwm_utdv(self.test_ldb_dc)
 
         # We ask for the first page of 100 objects.
         # For this test, we don't care what order we receive the objects in,
@@ -160,7 +172,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         for x in range(100, 200):
             self.modify_object(expected_dn_list[x], "displayName", "OU%d" % x)
 
-        (post_modify_hwm, unused) = self._get_highest_hwm_utdv(self.ldb_dc1)
+        (post_modify_hwm, unused) = self._get_highest_hwm_utdv(self.test_ldb_dc)
         self.assertTrue(post_modify_hwm.highest_usn > orig_hwm.highest_usn)
 
         # Get the remaining blocks of data
@@ -190,7 +202,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # test object), or its parent has been seen previously
         return parent_dn == self.ou or parent_dn in known_dn_list
 
-    def _repl_send_request(self, get_anc=False):
+    def _repl_send_request(self, get_anc=False, get_tgt=False):
         """Sends a GetNCChanges request for the next block of replication data."""
 
         # we're just trying to mimic regular client behaviour here, so just
@@ -205,44 +217,56 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         # Ask for the next block of replication data
         replica_flags = drsuapi.DRSUAPI_DRS_WRIT_REP
+        more_flags = 0
 
         if get_anc:
             replica_flags = drsuapi.DRSUAPI_DRS_WRIT_REP | drsuapi.DRSUAPI_DRS_GET_ANC
             self.used_get_anc = True
 
+        if get_tgt:
+            more_flags = drsuapi.DRSUAPI_DRS_GET_TGT
+            self.used_get_tgt = True
+
         # return the response from the DC
         return self._get_replication(replica_flags,
                                      max_objects=self.max_objects,
                                      highwatermark=highwatermark,
-                                     uptodateness_vector=uptodateness_vector)
+                                     uptodateness_vector=uptodateness_vector,
+                                     more_flags=more_flags)
 
-    def repl_get_next(self, get_anc=False):
+    def repl_get_next(self, get_anc=False, get_tgt=False, assert_links=False):
         """
         Requests the next block of replication data. This tries to simulate
         client behaviour - if we receive a replicated object that we don't know
         the parent of, then re-request the block with the GET_ANC flag set.
+        If we don't know the target object for a linked attribute, then
+        re-request with GET_TGT.
         """
 
         # send a request to the DC and get the response
-        ctr6 = self._repl_send_request(get_anc=get_anc)
+        ctr6 = self._repl_send_request(get_anc=get_anc, get_tgt=get_tgt)
 
-        # check that we know the parent for every object received
+        # extract the object DNs and their GUIDs from the response
         rxd_dn_list = self._get_ctr6_dn_list(ctr6)
+        rxd_guid_list = self._get_ctr6_object_guids(ctr6)
 
         # we'll add new objects as we discover them, so take a copy of the
-        # ones we already know about, so we can modify the list safely
+        # ones we already know about, so we can modify these lists safely
         known_objects = self.rxd_dn_list[:]
+        known_guids = self.rxd_guids[:]
 
         # check that we know the parent for every object received
         for i in range(0, len(rxd_dn_list)):
 
             dn = rxd_dn_list[i]
+            guid = rxd_guid_list[i]
 
             if self.is_parent_known(dn, known_objects):
 
                 # the new DN is now known so add it to the list.
                 # It may be the parent of another child in this block
                 known_objects.append(dn)
+                known_guids.append(guid)
             else:
                 # If we've already set the GET_ANC flag then it should mean
                 # we receive the parents before the child
@@ -251,14 +275,57 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
                 print("Unknown parent for %s - try GET_ANC" % dn)
 
                 # try the same thing again with the GET_ANC flag set this time
-                return self.repl_get_next(get_anc=True)
+                return self.repl_get_next(get_anc=True, get_tgt=get_tgt,
+                                          assert_links=assert_links)
+
+        # check we know about references to any objects in the linked attritbutes
+        received_links = self._get_ctr6_links(ctr6)
+
+        # This is so that older versions of Samba fail - we want the links to be
+        # sent roughly with the objects, rather than getting all links at the end
+        if assert_links:
+            self.assertTrue(len(received_links) > 0,
+                            "Links were expected in the GetNCChanges response")
+
+        for link in received_links:
+
+            # check the source object is known (Windows can actually send links
+            # where we don't know the source object yet). Samba shouldn't ever
+            # hit this case because it gets the links based on the source
+            if link.identifier not in known_guids:
+
+                # If we've already set the GET_ANC flag then it should mean
+                # this case doesn't happen
+                self.assertFalse(get_anc, "Unknown source object for GUID %s"
+                                 % link.identifier)
+
+                print("Unknown source GUID %s - try GET_ANC" % link.identifier)
+
+                # try the same thing again with the GET_ANC flag set this time
+                return self.repl_get_next(get_anc=True, get_tgt=get_tgt,
+                                          assert_links=assert_links)
+
+            # check we know the target object
+            if link.targetGUID not in known_guids:
+
+                # If we've already set the GET_TGT flag then we should have
+                # already received any objects we need to know about
+                self.assertFalse(get_tgt, "Unknown linked target for object %s"
+                                 % link.targetDN)
+
+                print("Unknown target for %s - try GET_TGT" % link.targetDN)
+
+                # try the same thing again with the GET_TGT flag set this time
+                return self.repl_get_next(get_anc=get_anc, get_tgt=True,
+                                          assert_links=assert_links)
 
         # store the last successful result so we know what HWM to request next
         self.last_ctr = ctr6
 
-        # store the objects and links we received
+        # store the objects, GUIDs, and links we received
         self.rxd_dn_list += self._get_ctr6_dn_list(ctr6)
         self.rxd_links += self._get_ctr6_links(ctr6)
+        self.rxd_guids += self._get_ctr6_object_guids(ctr6)
 
         return ctr6
 
@@ -360,8 +427,8 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # Look up the link attribute in the DB
         # The extended_dn option will dump the GUID info for the link
         # attribute (as a hex blob)
-        res = self.ldb_dc1.search(ldb.Dn(self.ldb_dc1, dn), attrs=[link_attr],
-                                  controls=['extended_dn:1:0'], scope=ldb.SCOPE_BASE)
+        res = self.test_ldb_dc.search(ldb.Dn(self.test_ldb_dc, dn), attrs=[link_attr],
+                                      controls=['extended_dn:1:0'], scope=ldb.SCOPE_BASE)
 
         # We didn't find the expected link attribute in the DB for the object.
         # Something has gone wrong somewhere...
@@ -372,7 +439,7 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # source GUIDs match what's in the DB
         for val in res[0][link_attr]:
             # Work out the expected source and target GUIDs for the DB link
-            target_dn = ldb.Dn(self.ldb_dc1, val)
+            target_dn = ldb.Dn(self.test_ldb_dc, val)
             targetGUID_blob = target_dn.get_extended_component("GUID")
             sourceGUID_blob = res[0].dn.get_extended_component("GUID")
 
@@ -390,6 +457,106 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
             self.assertTrue(found, "Did not receive expected link for DN %s" % dn)
 
+    def test_repl_get_tgt(self):
+        """
+        Creates a scenario where we should receive the linked attribute before
+        we know about the target object, and therefore need to use GET_TGT.
+        Note: Samba currently avoids this problem by sending all its links last
+        """
+
+        # create the test objects
+        reportees = self.create_object_range(0, 100, prefix="reportee")
+        managers = self.create_object_range(0, 100, prefix="manager")
+        all_objects = managers + reportees
+        expected_links = reportees
+
+        # add a link attribute to each reportee object that points to the
+        # corresponding manager object as the target
+        for i in range(0, 100):
+            self.modify_object(reportees[i], "managedBy", managers[i])
+
+        # touch the managers (the link-target objects) again to make sure the
+        # reportees (link source objects) get returned first by the replication
+        for i in range(0, 100):
+            self.modify_object(managers[i], "displayName", "OU%d" % i)
+
+        links_expected = True
+
+        # Get all the replication data - this code should resend the requests
+        # with GET_TGT
+        while not self.replication_complete():
+
+            # get the next block of replication data (this sets GET_TGT if needed)
+            self.repl_get_next(assert_links=links_expected)
+            links_expected = len(self.rxd_links) < len(expected_links)
+
+        # The way the test objects have been created should force
+        # self.repl_get_next() to use the GET_TGT flag. If this doesn't
+        # actually happen, then the test isn't doing its job properly
+        self.assertTrue(self.used_get_tgt,
+                        "Test didn't use the GET_TGT flag as expected")
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(all_objects)
+
+        # Check we received links for all the reportees
+        self.assert_expected_links(expected_links)
+
+    def test_repl_get_tgt_chain(self):
+        """
+        Tests the behaviour of GET_TGT with a more complicated scenario.
+        Here we create a chain of objects linked together, so if we follow
+        the link target, then we'd traverse ~200 objects each time.
+        """
+
+        # create the test objects
+        objectsA = self.create_object_range(0, 100, prefix="AAA")
+        objectsB = self.create_object_range(0, 100, prefix="BBB")
+        objectsC = self.create_object_range(0, 100, prefix="CCC")
+
+        # create a complex set of object links:
+        #   A0-->B0-->C1-->B2-->C3-->B4-->and so on...
+        # Basically each object-A should link to a circular chain of 200 B/C
+        # objects. We create the links in separate chunks here, as it makes it
+        # clearer what happens with the USN (links on Windows have their own
+        # USN, so this approach means the A->B/B->C links aren't interleaved)
+        for i in range(0, 100):
+            self.modify_object(objectsA[i], "managedBy", objectsB[i])
+
+        for i in range(0, 100):
+            self.modify_object(objectsB[i], "managedBy", objectsC[(i + 1) % 100])
+
+        for i in range(0, 100):
+            self.modify_object(objectsC[i], "managedBy", objectsB[(i + 1) % 100])
+
+        all_objects = objectsA + objectsB + objectsC
+        expected_links = all_objects
+
+        # the default order the objects now get returned in should be:
+        # [A0-A99][B0-B99][C0-C99]
+
+        links_expected = True
+
+        # Get all the replication data - this code should resend the requests
+        # with GET_TGT
+        while not self.replication_complete():
+
+            # get the next block of replication data (this sets GET_TGT if needed)
+            self.repl_get_next(assert_links=links_expected)
+            links_expected = len(self.rxd_links) < len(expected_links)
+
+        # The way the test objects have been created should force
+        # self.repl_get_next() to use the GET_TGT flag. If this doesn't
+        # actually happen, then the test isn't doing its job properly
+        self.assertTrue(self.used_get_tgt,
+                        "Test didn't use the GET_TGT flag as expected")
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(all_objects)
+
+        # Check we received links for all the reportees
+        self.assert_expected_links(expected_links)
+
     def test_repl_get_anc_link_attr(self):
         """
         A basic GET_ANC test where the parents have linked attributes
-- 
2.7.4


From 93814ada585879a2f245e963a7b89d3ba3619365 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 13 Jun 2017 12:14:45 +1200
Subject: [PATCH 27/36] getncchanges.py: Add test for adding links during
 replication

We have identified a case where the Samba server can send linked
attributes but not the target object. In this case, the Samba DRS client
would hit the "Failed to re-resolve GUID" case in replmd and silently
discard the linked attribute.

However, Samba will resend the linked attribute in the next cycle
(because its USN is still higher than the committed HWM), so it should
recover OK. On older releases, this may have caused problems if the
first error resulting in a hanging link (which might mean the second
time it's processed it still fails to be added).

Note that this test currently passes on all Samba testenvs, because we
are manually setting the GET_TGT flag (even though the testenv might
still send all the linked attributes at the end of the cycle), so the
server is still fetching the target objects.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/getncchanges.py | 47 ++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index c87523d..65d3748 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -557,6 +557,53 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # Check we received links for all the reportees
         self.assert_expected_links(expected_links)
 
+    def test_repl_integrity_link_attr(self):
+        """
+        Tests adding links to new objects while a replication is in progress.
+        """
+
+        # create some source objects for the linked attributes, sandwiched
+        # between 2 blocks of filler objects
+        filler = self.create_object_range(0, 100, prefix="filler")
+        reportees = self.create_object_range(0, 100, prefix="reportee")
+        filler += self.create_object_range(100, 200, prefix="filler")
+
+        # Start the replication and get the first block of filler objects
+        # (We're being mean here and setting the GET_TGT flag right from the
+        # start. On earlier Samba versions, if the client encountered an
+        # unknown target object and retried with GET_TGT, it would restart the
+        # replication cycle from scratch, which avoids the problem).
+        self.repl_get_next(get_tgt=True)
+
+        # create the target objects and add the links. These objects should be
+        # outside the scope of the Samba replication cycle, but the links should
+        # still get sent with the source object
+        managers = self.create_object_range(0, 100, prefix="manager")
+
+        for i in range(0, 100):
+            self.modify_object(reportees[i], "managedBy", managers[i])
+
+        expected_objects = managers + reportees + filler
+        expected_links = reportees
+
+        # complete the replication
+        while not self.replication_complete():
+            self.repl_get_next(get_tgt=True)
+
+        # If we didn't receive the most recently created objects in the last
+        # replication cycle, then kick off another replication to get them
+        if len(self.rxd_dn_list) < len(expected_objects):
+            self.repl_get_next()
+
+            while not self.replication_complete():
+                self.repl_get_next()
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(expected_objects)
+
+        # Check we received links for all the parents
+        self.assert_expected_links(expected_links)
+
     def test_repl_get_anc_link_attr(self):
         """
         A basic GET_ANC test where the parents have linked attributes
-- 
2.7.4


From a2a4b5fed5497cb545712159be0edd1b4c83f71b Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 13 Jul 2017 11:47:16 +1200
Subject: [PATCH 28/36] getncchanges.py: Add test for GET_ANC and GET_TGT
 combined

The code has to handle needing GET_ANC and GET_TGT in combination, i.e.
where we fetch the target object for the linked attribute and the target
object's parent is unknown as well. This patch adds a test case to
exercise this code path.

The second part of this test exercises GET_ANC/GET_TGT for an
incremental replication, where the objects are getting filtered by an
uptodateness-vector/HWM.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 selftest/knownfail.d/getncchanges          |   1 +
 source4/torture/drs/python/drs_base.py     |   4 +-
 source4/torture/drs/python/getncchanges.py | 138 ++++++++++++++++++++++++++++-
 3 files changed, 139 insertions(+), 4 deletions(-)

diff --git a/selftest/knownfail.d/getncchanges b/selftest/knownfail.d/getncchanges
index e7d5497..64e2c42 100644
--- a/selftest/knownfail.d/getncchanges
+++ b/selftest/knownfail.d/getncchanges
@@ -2,3 +2,4 @@
 # same time as the source objects. Currently this is only the vampire_dc
 samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt\(promoted_dc\)
 samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_chain\(promoted_dc\)
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_and_anc\(promoted_dc\)
diff --git a/source4/torture/drs/python/drs_base.py b/source4/torture/drs/python/drs_base.py
index 1eb4d57..ace8bdd 100644
--- a/source4/torture/drs/python/drs_base.py
+++ b/source4/torture/drs/python/drs_base.py
@@ -246,7 +246,7 @@ class DrsBaseTestCase(SambaToolCmdTest):
 
             next_object = ctr6.first_object
             for i in range(0, ctr6.object_count):
-                print("Obj %d: %s %s" %(i, next_object.object.identifier.dn[:22],
+                print("Obj %d: %s %s" %(i, next_object.object.identifier.dn[:25],
                                         next_object.object.identifier.guid))
                 next_object = next_object.next_object
 
@@ -254,7 +254,7 @@ class DrsBaseTestCase(SambaToolCmdTest):
             ctr6_links = self._get_ctr6_links(ctr6)
             for link in ctr6_links:
                 print("Link Tgt %s... <-- Src %s"
-                      %(link.targetDN[:22], link.identifier))
+                      %(link.targetDN[:25], link.identifier))
 
             print("HWM:     %d" %(ctr6.new_highwatermark.highest_usn))
             print("Tmp HWM: %d" %(ctr6.new_highwatermark.tmp_highest_usn))
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 65d3748..9aadfb9 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -90,6 +90,25 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         m[attr] = ldb.MessageElement(value, ldb.FLAG_MOD_ADD, attr)
         self.test_ldb_dc.modify(m)
 
+    def delete_attribute(self, dn, attr, value):
+        """Deletes an attribute from an object"""
+        m = ldb.Message()
+        m.dn = ldb.Dn(self.ldb_dc2, dn)
+        m[attr] = ldb.MessageElement(value, ldb.FLAG_MOD_DELETE, attr)
+        self.ldb_dc2.modify(m)
+
+    def start_new_repl_cycle(self):
+        """Resets enough state info to start a new replication cycle"""
+        # reset rxd_links, but leave rxd_guids and rxd_dn_list alone so we know
+        # whether a parent/target is unknown and needs GET_ANC/GET_TGT to resolve
+        self.rxd_links = []
+
+        self.used_get_tgt = False
+        self.used_get_anc = False
+        # mostly preserve self.last_ctr, so that we use the last HWM
+        if self.last_ctr is not None:
+            self.last_ctr.more_data = True
+
     def create_object_range(self, start, end, prefix="",
                             children=None, parent_list=None):
         """
@@ -402,14 +421,16 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # Check we get all the objects we're expecting
         self.assert_expected_data(expected_dn_list)
 
-    def assert_expected_links(self, objects_with_links, link_attr="managedBy"):
+    def assert_expected_links(self, objects_with_links, link_attr="managedBy",
+                              num_expected=None):
         """
         Asserts that a GetNCChanges response contains any expected links
         for the objects it contains.
         """
         received_links = self.rxd_links
 
-        num_expected = len(objects_with_links)
+        if num_expected is None:
+            num_expected = len(objects_with_links)
 
         self.assertTrue(len(received_links) == num_expected,
                         "Received %d links but expected %d"
@@ -640,3 +661,116 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # Check we received links for all the parents
         self.assert_expected_links(parent_dn_list)
 
+    def test_repl_get_tgt_and_anc(self):
+        """
+        Check we can resolve an unknown ancestor when fetching the link target,
+        i.e. tests using GET_TGT and GET_ANC in combination
+        """
+
+        # Create some parent/child objects (the child will be the link target)
+        parents = []
+        all_objects = self.create_object_range(0, 100, prefix="parent",
+                                               children=["la_tgt"],
+                                               parent_list=parents)
+
+        children = [item for item in all_objects if item not in parents]
+
+        # create the link source objects and link them to the child/target
+        la_sources = self.create_object_range(0, 100, prefix="la_src")
+        all_objects += la_sources
+
+        for i in range(0, 100):
+            self.modify_object(la_sources[i], "managedBy", children[i])
+
+        expected_links = la_sources
+
+        # modify the children/targets so they come after the link source
+        for x in range(0, 100):
+            self.modify_object(children[x], "displayName", "OU%d" % x)
+
+        # modify the parents, so they now come last in the replication
+        for x in range(0, 100):
+            self.modify_object(parents[x], "displayName", "OU%d" % x)
+
+        # We've now got objects in the following order:
+        # [100 la_source][100 la_target][100 parents (of la_target)]
+
+        links_expected = True
+
+        # Get all the replication data - this code should resend the requests
+        # with GET_TGT and GET_ANC
+        while not self.replication_complete():
+
+            # get the next block of replication data (this sets GET_TGT/GET_ANC)
+            self.repl_get_next(assert_links=links_expected)
+            links_expected = len(self.rxd_links) < len(expected_links)
+
+        # The way the test objects have been created should force
+        # self.repl_get_next() to use the GET_TGT/GET_ANC flags. If this
+        # doesn't actually happen, then the test isn't doing its job properly
+        self.assertTrue(self.used_get_tgt,
+                        "Test didn't use the GET_TGT flag as expected")
+        self.assertTrue(self.used_get_anc,
+                        "Test didn't use the GET_ANC flag as expected")
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(all_objects)
+
+        # Check we received links for all the link sources
+        self.assert_expected_links(expected_links)
+
+        # Second part of test. Add some extra objects and kick off another
+        # replication. The test code will use the HWM from the last replication
+        # so we'll only receive the objects we modify below
+        self.start_new_repl_cycle()
+
+        # add an extra level of grandchildren that hang off a child
+        # that got created last time
+        new_parent = "OU=test_new_parent,%s" % children[0]
+        self.add_object(new_parent)
+        new_children = []
+
+        for x in range(0, 50):
+            dn = "OU=test_new_la_tgt%d,%s" % (x, new_parent)
+            self.add_object(dn)
+            new_children.append(dn)
+
+        # replace half of the links to point to the new children
+        for x in range(0, 50):
+            self.delete_attribute(la_sources[x], "managedBy", children[x])
+            self.modify_object(la_sources[x], "managedBy", new_children[x])
+
+        # add some filler objects to fill up the 1st chunk
+        filler = self.create_object_range(0, 100, prefix="filler")
+
+        # modify the new children/targets so they come after the link source
+        for x in range(0, 50):
+            self.modify_object(new_children[x], "displayName", "OU-%d" % x)
+
+        # modify the parent, so it now comes last in the replication
+        self.modify_object(new_parent, "displayName", "OU%d" % x)
+
+        # We should now get the modified objects in the following order:
+        # [50 links (x 2)][100 filler][50 new children][new parent]
+        # Note that the link sources aren't actually sent (their new linked
+        # attributes are sent, but apart from that, nothing has changed)
+        all_objects = filler + new_children + [new_parent]
+        expected_links = la_sources[:50]
+
+        links_expected = True
+
+        while not self.replication_complete():
+            self.repl_get_next(assert_links=links_expected)
+            links_expected = len(self.rxd_links) < len(expected_links)
+
+        self.assertTrue(self.used_get_tgt,
+                        "Test didn't use the GET_TGT flag as expected")
+        self.assertTrue(self.used_get_anc,
+                        "Test didn't use the GET_ANC flag as expected")
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(all_objects)
+
+        # Check we received links (50 deleted links and 50 new)
+        self.assert_expected_links(expected_links, num_expected=100)
+
-- 
2.7.4


From 83931b438de8ca3b9acc6c7b1ba0cc0290828119 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Mon, 17 Jul 2017 14:04:38 +1200
Subject: [PATCH 29/36] getnc_exop.py: Extend EXOP_REPL_OBJ test case to use
 GET_TGT

We already check that when we use GET_ANC that we still only receive a
single object when EXOP_REPL_OBJ is used. This extends the test to also
check that only a single object is returned when GET_TGT is used.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/getnc_exop.py | 67 ++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 24 deletions(-)

diff --git a/source4/torture/drs/python/getnc_exop.py b/source4/torture/drs/python/getnc_exop.py
index 2caa50c..56353ad 100644
--- a/source4/torture/drs/python/getnc_exop.py
+++ b/source4/torture/drs/python/getnc_exop.py
@@ -137,8 +137,8 @@ class DrsReplicaSyncTestCase(drs_base.DrsBaseTestCase):
 
     def test_do_single_repl(self):
         """
-        Make sure that DRSU_EXOP_REPL_OBJ never replicates more than
-        one object, even when we use DRS_GET_ANC.
+	Make sure that DRSUAPI_EXOP_REPL_OBJ never replicates more than
+        one object, even when we use DRS_GET_ANC/GET_TGT.
         """
 
         ou1 = "OU=get_anc1,%s" % self.ou
@@ -161,36 +161,55 @@ class DrsReplicaSyncTestCase(drs_base.DrsBaseTestCase):
             })
         dc3_id = self._get_identifier(self.ldb_dc1, dc3)
 
-        req8 = self._exop_req8(dest_dsa=None,
-                               invocation_id=self.ldb_dc1.get_invocation_id(),
-                               nc_dn_str=ou1,
-                               exop=drsuapi.DRSUAPI_EXOP_REPL_OBJ,
-                               replica_flags=drsuapi.DRSUAPI_DRS_WRIT_REP)
-        (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, 8, req8)
-        self._check_ctr6(ctr, [ou1])
+        # Add some linked attributes (for checking GET_TGT behaviour)
+        m = ldb.Message()
+        m.dn = ldb.Dn(self.ldb_dc2, ou1)
+        m["managedBy"] = ldb.MessageElement(ou2, ldb.FLAG_MOD_ADD, "managedBy")
+        self.ldb_dc1.modify(m)
+        ou1_link = AbstractLink(drsuapi.DRSUAPI_ATTID_managedBy,
+                                drsuapi.DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE,
+                                ou1_id.guid, ou2_id.guid)
+
+        m.dn = ldb.Dn(self.ldb_dc2, dc3)
+        m["managedBy"] = ldb.MessageElement(ou2, ldb.FLAG_MOD_ADD, "managedBy")
+        self.ldb_dc1.modify(m)
+        dc3_link = AbstractLink(drsuapi.DRSUAPI_ATTID_managedBy,
+                                drsuapi.DRSUAPI_DS_LINKED_ATTRIBUTE_FLAG_ACTIVE,
+                                dc3_id.guid, ou2_id.guid)
+
+        req = self._getnc_req10(dest_dsa=None,
+                                invocation_id=self.ldb_dc1.get_invocation_id(),
+                                nc_dn_str=ou1,
+                                exop=drsuapi.DRSUAPI_EXOP_REPL_OBJ,
+                                replica_flags=drsuapi.DRSUAPI_DRS_WRIT_REP,
+                                more_flags=drsuapi.DRSUAPI_DRS_GET_TGT)
+        (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, 10, req)
+        self._check_ctr6(ctr, [ou1], expected_links=[ou1_link])
 
         # DRSUAPI_DRS_WRIT_REP means that we should only replicate the dn we give (dc3).
         # DRSUAPI_DRS_GET_ANC means that we should also replicate its ancestors, but
         # Windows doesn't do this if we use both.
-        req8 = self._exop_req8(dest_dsa=None,
-                               invocation_id=self.ldb_dc1.get_invocation_id(),
-                               nc_dn_str=dc3,
-                               exop=drsuapi.DRSUAPI_EXOP_REPL_OBJ,
-                               replica_flags=drsuapi.DRSUAPI_DRS_WRIT_REP |
-                                             drsuapi.DRSUAPI_DRS_GET_ANC)
-        (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, 8, req8)
-        self._check_ctr6(ctr, [dc3])
+        req = self._getnc_req10(dest_dsa=None,
+                                invocation_id=self.ldb_dc1.get_invocation_id(),
+                                nc_dn_str=dc3,
+                                exop=drsuapi.DRSUAPI_EXOP_REPL_OBJ,
+                                replica_flags=drsuapi.DRSUAPI_DRS_WRIT_REP |
+                                              drsuapi.DRSUAPI_DRS_GET_ANC,
+                                more_flags=drsuapi.DRSUAPI_DRS_GET_TGT)
+        (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, 10, req)
+        self._check_ctr6(ctr, [dc3], expected_links=[dc3_link])
 
         # Even though the ancestor of ou2 (ou1) has changed since last hwm, and we're
         # sending DRSUAPI_DRS_GET_ANC, the expected response is that it will only try
         # and replicate the single object still.
-        req8 = self._exop_req8(dest_dsa=None,
-                               invocation_id=self.ldb_dc1.get_invocation_id(),
-                               nc_dn_str=ou2,
-                               exop=drsuapi.DRSUAPI_EXOP_REPL_OBJ,
-                               replica_flags=drsuapi.DRSUAPI_DRS_CRITICAL_ONLY |
-                                             drsuapi.DRSUAPI_DRS_GET_ANC)
-        (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, 8, req8)
+        req = self._getnc_req10(dest_dsa=None,
+                                invocation_id=self.ldb_dc1.get_invocation_id(),
+                                nc_dn_str=ou2,
+                                exop=drsuapi.DRSUAPI_EXOP_REPL_OBJ,
+                                replica_flags=drsuapi.DRSUAPI_DRS_CRITICAL_ONLY |
+                                              drsuapi.DRSUAPI_DRS_GET_ANC,
+                                more_flags=drsuapi.DRSUAPI_DRS_GET_TGT)
+        (level, ctr) = self.drs.DsGetNCChanges(self.drs_handle, 10, req)
         self._check_ctr6(ctr, [ou2])
 
     def test_link_utdv_hwm(self):
-- 
2.7.4


From 0741639aab4a8b9046cac93ab57b997c580c7276 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Wed, 19 Jul 2017 11:38:55 +1200
Subject: [PATCH 30/36] getncchanges.py: Add tests for object deletion during
 replication

Add tests that delete the source and target objects for linked
attributes in the middle of a replication cycle.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/getncchanges.py | 60 ++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 9aadfb9..442868b 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -774,3 +774,63 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # Check we received links (50 deleted links and 50 new)
         self.assert_expected_links(expected_links, num_expected=100)
 
+    def _repl_integrity_obj_deletion(self, delete_link_source=True):
+        """
+        Tests deleting link objects while a replication is in progress.
+        """
+
+        # create some objects and link them together, with some filler
+        # object in between the link sources
+        la_sources = self.create_object_range(0, 100, prefix="la_source")
+        la_targets = self.create_object_range(0, 100, prefix="la_targets")
+
+        for i in range(0, 50):
+            self.modify_object(la_sources[i], "managedBy", la_targets[i])
+
+        filler = self.create_object_range(0, 100, prefix="filler")
+
+        for i in range(50, 100):
+            self.modify_object(la_sources[i], "managedBy", la_targets[i])
+
+        # touch the targets so that the sources get replicated first
+        for i in range(0, 100):
+            self.modify_object(la_targets[i], "displayName", "OU%d" % i)
+
+        # objects should now be in the following USN order:
+        # [50 la_source][100 filler][50 la_source][100 la_target]
+
+        # Get the first block containing 50 link sources
+        self.repl_get_next()
+
+        # delete either the link targets or link source objects
+        if delete_link_source:
+            objects_to_delete = la_sources
+            # in GET_TGT testenvs we only receive the first 50 source objects
+            expected_objects = la_sources[:50] + la_targets + filler
+        else:
+            objects_to_delete = la_targets
+            expected_objects = la_sources + filler
+
+        for obj in objects_to_delete:
+            self.ldb_dc2.delete(obj)
+
+        # complete the replication
+        while not self.replication_complete():
+            self.repl_get_next()
+
+        # Check we get all the objects we're expecting
+        self.assert_expected_data(expected_objects)
+
+        # we can't use assert_expected_links() here because it tries to check
+        # against the deleted objects on the DC. (Although we receive some
+        # links from the first block processed, the Samba client should end up
+        # deleting these, as the source/target object involved is deleted)
+        self.assertTrue(len(self.rxd_links) == 50,
+                        "Expected 50 links, not %d" % len(self.rxd_links))
+
+    def test_repl_integrity_src_obj_deletion(self):
+        self._repl_integrity_obj_deletion(delete_link_source=True)
+
+    def test_repl_integrity_tgt_obj_deletion(self):
+        self._repl_integrity_obj_deletion(delete_link_source=False)
+
-- 
2.7.4


From 17274235bc0e233c098b38ee869933ab14499f73 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Thu, 20 Jul 2017 17:06:14 +1200
Subject: [PATCH 31/36] getncchanges.py: Add test for replicating reanimated
 objects

Reading between the lines, this scenario seems to be the main reason
that Microsoft added the GET_TGT flag. MS AD can handle getting links
for unknown targets OK, but if it receives links for a deleted/recycled
target then it would tend to drop the received links. Samba client also
used to drop the links if talking to a Microsoft DC (or a Samba server
with GET_TGT support).

The specific scenario is the client side already knows about a deleted
object. That object is then re-animated and used as the target for a
linked attribute. *Then* the target object gets updated again so it gets
sent in a later replication chunk to the linked attribute, i.e. the
client receives the link before it learns that the target object has
been re-animated.

In this test we're interested in particular at how the client behaves
when it receives a linked attribute for a deleted object. (It *should*
retry with GET_TGT to make sure the target is up-to-date. However, it
was just dropping the linked attribute).

To exercise the client-side, we disable replication, setup the
links/objects on one DC the way we want them, then force a replication
to the second DC. We then check that when we query each DC, they both
tell us about the links/objects we're expecting (i.e. no links got
lost).

Note that this wasn't a problem with older versions of Samba-to-Samba
because sending the links last guaranteed that the target objects were
always up-to-date.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/getncchanges.py | 124 ++++++++++++++++++++++++++---
 1 file changed, 114 insertions(+), 10 deletions(-)

diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 442868b..9cfe0fe 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -55,28 +55,31 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
             "objectclass": "organizationalUnit"})
         (self.default_hwm, self.default_utdv) = self._get_highest_hwm_utdv(self.test_ldb_dc)
 
+        self.init_test_state()
+
+    def tearDown(self):
+        super(DrsReplicaSyncIntegrityTestCase, self).tearDown()
+        # tidyup groups and users
+        try:
+            self.ldb_dc2.delete(self.ou, ["tree_delete:1"])
+        except ldb.LdbError as (enum, string):
+            if enum == ldb.ERR_NO_SUCH_OBJECT:
+                pass
+
+    def init_test_state(self):
         self.rxd_dn_list = []
         self.rxd_links = []
         self.rxd_guids = []
+        self.last_ctr = None
 
         # 100 is the minimum max_objects that Microsoft seems to honour
         # (the max honoured is 400ish), so we use that in these tests
         self.max_objects = 100
-        self.last_ctr = None
 
         # store whether we used GET_TGT/GET_ANC flags in the requests
         self.used_get_tgt = False
         self.used_get_anc = False
 
-    def tearDown(self):
-        super(DrsReplicaSyncIntegrityTestCase, self).tearDown()
-        # tidyup groups and users
-        try:
-            self.ldb_dc2.delete(self.ou, ["tree_delete:1"])
-        except ldb.LdbError as (enum, string):
-            if enum == ldb.ERR_NO_SUCH_OBJECT:
-                pass
-
     def add_object(self, dn):
         """Adds an OU object"""
         self.test_ldb_dc.add({"dn": dn, "objectclass": "organizationalunit"})
@@ -834,3 +837,104 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
     def test_repl_integrity_tgt_obj_deletion(self):
         self._repl_integrity_obj_deletion(delete_link_source=False)
 
+    def restore_deleted_object(self, guid, new_dn):
+        """Re-animates a deleted object"""
+
+        res = self.test_ldb_dc.search(base="<GUID=%s>" % self._GUID_string(guid), attrs=["isDeleted"],
+                                  controls=['show_deleted:1'], scope=ldb.SCOPE_BASE)
+        if len(res) != 1:
+            return
+
+        msg = ldb.Message()
+        msg.dn = res[0].dn
+        msg["isDeleted"] = ldb.MessageElement([], ldb.FLAG_MOD_DELETE, "isDeleted")
+        msg["distinguishedName"] = ldb.MessageElement([new_dn], ldb.FLAG_MOD_REPLACE, "distinguishedName")
+        self.test_ldb_dc.modify(msg, ["show_deleted:1"])
+
+    def sync_DCs(self):
+        # make sure DC1 has all the changes we've made to DC2
+        self._net_drs_replicate(DC=self.dnsname_dc1, fromDC=self.dnsname_dc2)
+
+    def get_object_guid(self, dn):
+        res = self.test_ldb_dc.search(base=dn, attrs=["objectGUID"], scope=ldb.SCOPE_BASE)
+        return res[0]['objectGUID'][0]
+
+    def test_repl_integrity_obj_reanimation(self):
+        """
+        Checks receiving links for a re-animated object doesn't lose links.
+        We test this against the peer DC to make sure it doesn't drop links.
+        """
+
+        # This test is a little different in that we're particularly interested
+        # in exercising the replmd client code on the second DC.
+        # First, make sure the peer DC has the base OU, then store its HWM
+        self.sync_DCs()
+        (peer_drs, peer_drs_handle) = self._ds_bind(self.dnsname_dc1)
+        (peer_default_hwm, peer_default_utdv) = self._get_highest_hwm_utdv(self.ldb_dc1)
+
+        # create the link source/target objects
+        la_sources = self.create_object_range(0, 100, prefix="la_src")
+        la_targets = self.create_object_range(0, 100, prefix="la_tgt")
+
+        # store the target object's GUIDs (we need to know these to reanimate them)
+        target_guids = []
+
+        for dn in la_targets:
+            target_guids.append(self.get_object_guid(dn))
+
+        # delete the link target
+        for x in range(0, 100):
+            self.ldb_dc2.delete(la_targets[x])
+
+        # sync the DCs, then disable replication. We want the peer DC to get
+        # all the following changes in a single replication cycle
+        self.sync_DCs()
+        self._disable_all_repl(self.dnsname_dc2)
+
+        # restore the target objects for the linked attributes again
+        for x in range(0, 100):
+            self.restore_deleted_object(target_guids[x], la_targets[x])
+
+        # add the links
+        for x in range(0, 100):
+            self.modify_object(la_sources[x], "managedBy", la_targets[x])
+
+        # create some additional filler objects
+        filler = self.create_object_range(0, 100, prefix="filler")
+
+        # modify the targets so they now come last
+        for x in range(0, 100):
+            self.modify_object(la_targets[x], "displayName", "OU-%d" % x)
+
+        # the objects should now be sent in the following order:
+        # [la sources + links][filler][la targets]
+        all_objects = la_sources + la_targets + filler
+        expected_links = la_sources
+
+        # Enable replication again and make sure the 2 DCs are back in sync
+        self._enable_all_repl(self.dnsname_dc2)
+        self.sync_DCs()
+
+        # get the replication data from the test DC - we should get 100 links
+        while not self.replication_complete():
+            self.repl_get_next()
+
+        # Check we get all the objects and links we're expecting
+        self.assert_expected_data(all_objects)
+        self.assert_expected_links(expected_links)
+
+        # switch over the DC state info so we now talk to the peer DC
+        self.init_test_state()
+        self.default_hwm = peer_default_hwm
+        self.default_utdv = peer_default_utdv
+        self.drs = peer_drs
+        self.drs_handle = peer_drs_handle
+        self.set_test_ldb_dc(self.ldb_dc1)
+
+        # check that we get the same information from the 2nd DC
+        while not self.replication_complete():
+            self.repl_get_next()
+
+        self.assert_expected_data(all_objects)
+        self.assert_expected_links(expected_links)
+
-- 
2.7.4


From e89c21465060dfdf65e93e8ed03f05e75c644f73 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Mon, 24 Jul 2017 14:43:54 +1200
Subject: [PATCH 32/36] getncchanges.py: Add a test for dropped cross-partition
 links

Samba would drop linked attributes that span partitions if it didn't
know about the target object. This patch adds a test that exposes the
problem.

I've re-used the code from the previous re-animation test to do this.
I've also added a very basic DcConnection helper class that basically
stores the connection state information the drs_base.py uses for
replication. This allows us to switch the DC we want to replicate from
easily. This approach could potentially be retro-fitted to some of the
existing test cases, as it allows us to test both the DRS client code
and server code at the same time.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/torture/drs/python/getncchanges.py | 159 +++++++++++++++++++++++------
 1 file changed, 129 insertions(+), 30 deletions(-)

diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 9cfe0fe..4a5cc4a 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -41,8 +41,8 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         # Note that DC2 is the DC with the testenv-specific quirks (e.g. it's
         # the vampire_dc), so we point this test directly at that DC
-        self.set_test_ldb_dc(self.ldb_dc2)
-        (self.drs, self.drs_handle) = self._ds_bind(self.dnsname_dc2)
+        self.default_conn = DcConnection(self, self.ldb_dc2, self.dnsname_dc2)
+        self.set_dc_connection(self.default_conn)
 
         # add some randomness to the test OU. (Deletion of the last test's
         # objects can be slow to replicate out. So the OU created by a previous
@@ -80,9 +80,9 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         self.used_get_tgt = False
         self.used_get_anc = False
 
-    def add_object(self, dn):
+    def add_object(self, dn, objectclass="organizationalunit"):
         """Adds an OU object"""
-        self.test_ldb_dc.add({"dn": dn, "objectclass": "organizationalunit"})
+        self.test_ldb_dc.add({"dn": dn, "objectclass": objectclass})
         res = self.test_ldb_dc.search(base=dn, scope=SCOPE_BASE)
         self.assertEquals(len(res), 1)
 
@@ -96,9 +96,9 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
     def delete_attribute(self, dn, attr, value):
         """Deletes an attribute from an object"""
         m = ldb.Message()
-        m.dn = ldb.Dn(self.ldb_dc2, dn)
+        m.dn = ldb.Dn(self.test_ldb_dc, dn)
         m[attr] = ldb.MessageElement(value, ldb.FLAG_MOD_DELETE, attr)
-        self.ldb_dc2.modify(m)
+        self.test_ldb_dc.modify(m)
 
     def start_new_repl_cycle(self):
         """Resets enough state info to start a new replication cycle"""
@@ -311,6 +311,10 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         for link in received_links:
 
+            # skip any links that aren't part of the test
+            if self.ou not in link.targetDN:
+                continue
+
             # check the source object is known (Windows can actually send links
             # where we don't know the source object yet). Samba shouldn't ever
             # hit this case because it gets the links based on the source
@@ -851,14 +855,58 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         msg["distinguishedName"] = ldb.MessageElement([new_dn], ldb.FLAG_MOD_REPLACE, "distinguishedName")
         self.test_ldb_dc.modify(msg, ["show_deleted:1"])
 
-    def sync_DCs(self):
+    def sync_DCs(self, nc_dn=None):
         # make sure DC1 has all the changes we've made to DC2
-        self._net_drs_replicate(DC=self.dnsname_dc1, fromDC=self.dnsname_dc2)
+        self._net_drs_replicate(DC=self.dnsname_dc1, fromDC=self.dnsname_dc2, nc_dn=nc_dn)
 
     def get_object_guid(self, dn):
         res = self.test_ldb_dc.search(base=dn, attrs=["objectGUID"], scope=ldb.SCOPE_BASE)
         return res[0]['objectGUID'][0]
 
+
+    def set_dc_connection(self, conn):
+        """
+        Switches over the connection state info that the underlying drs_base
+        class uses so that we replicate with a different DC.
+        """
+        self.default_hwm = conn.default_hwm
+        self.default_utdv = conn.default_utdv
+        self.drs = conn.drs
+        self.drs_handle = conn.drs_handle
+        self.set_test_ldb_dc(conn.ldb_dc)
+
+    def assert_DCs_replication_is_consistent(self, peer_conn, all_objects,
+                                             expected_links):
+        """
+        Replicates against both the primary and secondary DCs in the testenv
+        and checks that both return the expected results.
+        """
+        print("Checking replication against primary test DC...")
+
+        # get the replication data from the test DC first
+        while not self.replication_complete():
+            self.repl_get_next()
+
+        # Check we get all the objects and links we're expecting
+        self.assert_expected_data(all_objects)
+        self.assert_expected_links(expected_links)
+
+        # switch over the DC state info so we now talk to the peer DC
+        self.set_dc_connection(peer_conn)
+        self.init_test_state()
+
+        print("Checking replication against secondary test DC...")
+
+        # check that we get the same information from the 2nd DC
+        while not self.replication_complete():
+            self.repl_get_next()
+
+        self.assert_expected_data(all_objects)
+        self.assert_expected_links(expected_links)
+
+        # switch back to using the default connection
+        self.set_dc_connection(self.default_conn)
+
     def test_repl_integrity_obj_reanimation(self):
         """
         Checks receiving links for a re-animated object doesn't lose links.
@@ -867,10 +915,10 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
 
         # This test is a little different in that we're particularly interested
         # in exercising the replmd client code on the second DC.
-        # First, make sure the peer DC has the base OU, then store its HWM
+        # First, make sure the peer DC has the base OU, then connect to it (so
+        # we store its inital HWM)
         self.sync_DCs()
-        (peer_drs, peer_drs_handle) = self._ds_bind(self.dnsname_dc1)
-        (peer_default_hwm, peer_default_utdv) = self._get_highest_hwm_utdv(self.ldb_dc1)
+        peer_conn = DcConnection(self, self.ldb_dc1, self.dnsname_dc1)
 
         # create the link source/target objects
         la_sources = self.create_object_range(0, 100, prefix="la_src")
@@ -911,30 +959,81 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         all_objects = la_sources + la_targets + filler
         expected_links = la_sources
 
-        # Enable replication again and make sure the 2 DCs are back in sync
+        # Enable replication again make sure the 2 DCs are back in sync
         self._enable_all_repl(self.dnsname_dc2)
         self.sync_DCs()
 
-        # get the replication data from the test DC - we should get 100 links
-        while not self.replication_complete():
-            self.repl_get_next()
+        # Get the replication data from each DC in turn.
+        # Check that both give us all the objects and links we're expecting,
+        # i.e. no links were lost
+        self.assert_DCs_replication_is_consistent(peer_conn, all_objects,
+                                                  expected_links)
 
-        # Check we get all the objects and links we're expecting
-        self.assert_expected_data(all_objects)
-        self.assert_expected_links(expected_links)
+    def test_repl_integrity_cross_partition_links(self):
+        """
+        Checks that a cross-partition link to an unknown target object does
+        not result in missing links.
+        """
 
-        # switch over the DC state info so we now talk to the peer DC
-        self.init_test_state()
-        self.default_hwm = peer_default_hwm
-        self.default_utdv = peer_default_utdv
-        self.drs = peer_drs
-        self.drs_handle = peer_drs_handle
-        self.set_test_ldb_dc(self.ldb_dc1)
+        # check the peer DC is up-to-date, then connect (storing its HWM)
+        self.sync_DCs()
+        peer_conn = DcConnection(self, self.ldb_dc1, self.dnsname_dc1)
 
-        # check that we get the same information from the 2nd DC
-        while not self.replication_complete():
-            self.repl_get_next()
+        # stop replication so the peer gets the following objects in one go
+        self._disable_all_repl(self.dnsname_dc2)
+
+        # create a link source object in the main NC
+        la_source = "OU=cross_nc_src,%s" % self.ou
+        self.add_object(la_source)
+
+        # create the link target (a server object) in the config NC
+        rand = random.randint(1, 10000000)
+        la_target = "CN=getncchanges-%d,CN=Servers,CN=Default-First-Site-Name," \
+                    "CN=Sites,%s" %(rand, self.config_dn)
+        self.add_object(la_target, objectclass="server")
+
+        # add a cross-partition link between the two
+        self.modify_object(la_source, "managedBy", la_target)
+
+        # sync the NC containing the link source first, then the target/Config NC
+        self.sync_DCs()
+        self.sync_DCs(nc_dn=self.config_dn)
+        self._enable_all_repl(self.dnsname_dc2)
+
+        # Get the replication data from each DC in turn.
+        # Check that both return the cross-partition link (note we're not
+        # checking the config domain NC here for simplicity)
+        self.assert_DCs_replication_is_consistent(peer_conn,
+                                                  all_objects=[la_source],
+                                                  expected_links=[la_source])
+
+        # the cross-partition linked attribute has a missing backlink. Check
+        # that we can still delete it successfully
+        self.delete_attribute(la_source, "managedBy", la_target)
+        self.sync_DCs()
+
+        res = self.test_ldb_dc.search(ldb.Dn(self.ldb_dc1, la_source),
+                                      attrs=["managedBy"],
+                                      controls=['extended_dn:1:0'],
+                                      scope=ldb.SCOPE_BASE)
+        self.assertFalse("managedBy" in res[0], "%s in DB still has managedBy attribute"
+                         % la_source)
+        res = self.test_ldb_dc.search(ldb.Dn(self.ldb_dc2, la_source),
+                                      attrs=["managedBy"],
+                                      controls=['extended_dn:1:0'],
+                                      scope=ldb.SCOPE_BASE)
+        self.assertFalse("managedBy" in res[0], "%s in DB still has managedBy attribute"
+                         % la_source)
+
+        # cleanup the server object we created in the Configuration partition
+        self.test_ldb_dc.delete(la_source)
+
+class DcConnection:
+    """Helper class to track a connection to another DC"""
+
+    def __init__(self, drs_base, ldb_dc, dnsname_dc):
+        self.ldb_dc = ldb_dc
+        (self.drs, self.drs_handle) = drs_base._ds_bind(dnsname_dc)
+        (self.default_hwm, self.default_utdv) = drs_base._get_highest_hwm_utdv(ldb_dc)
 
-        self.assert_expected_data(all_objects)
-        self.assert_expected_links(expected_links)
 
-- 
2.7.4


From b876c76a5f6b1c289ff560032c963b25682f1305 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 15 Aug 2017 12:18:02 +1200
Subject: [PATCH 33/36] getncchanges.py: Add a multi-valued linked attribute
 test

Add a test where a source object links to multiple different targets.
First we do the replication without GET_TGT and check that the server
can handle sending a chunk containing only links (in the middle of the
replication). Then we repeat the replication forcing GET_TGT to be used.

To avoid having to create 1500 objects/links, I've lowered the 'max
link sync' setting on the vampire_dc testenv to 250.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 selftest/knownfail.d/getncchanges          |  1 +
 selftest/target/Samba4.pm                  |  3 +-
 source4/torture/drs/python/getncchanges.py | 53 ++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/selftest/knownfail.d/getncchanges b/selftest/knownfail.d/getncchanges
index 64e2c42..5ef1bc9 100644
--- a/selftest/knownfail.d/getncchanges
+++ b/selftest/knownfail.d/getncchanges
@@ -3,3 +3,4 @@
 samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt\(promoted_dc\)
 samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_chain\(promoted_dc\)
 samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_and_anc\(promoted_dc\)
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_multivalued_links\(promoted_dc\)
diff --git a/selftest/target/Samba4.pm b/selftest/target/Samba4.pm
index 778b6fa..ef0ea07 100755
--- a/selftest/target/Samba4.pm
+++ b/selftest/target/Samba4.pm
@@ -1289,7 +1289,8 @@ sub provision_vampire_dc($$$)
 	if ($fl == "2000") {
 		$name = "vampire2000dc";
 	} else {
-		$extra_conf = "drs: immediate link sync = yes";
+		$extra_conf = "drs: immediate link sync = yes
+                       drs: max link sync = 250";
 	}
 
 	# We do this so that we don't run the provision.  That's the job of 'net vampire'.
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index 4a5cc4a..e17d2bd 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -1028,6 +1028,59 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         # cleanup the server object we created in the Configuration partition
         self.test_ldb_dc.delete(la_source)
 
+    def test_repl_get_tgt_multivalued_links(self):
+        """Tests replication with multi-valued link attributes."""
+
+        # create the target/source objects and link them together
+        la_targets = self.create_object_range(0, 500, prefix="la_tgt")
+        la_source = "CN=la_src,%s" % self.ou
+        self.add_object(la_source, objectclass="msExchConfigurationContainer")
+
+        for tgt in la_targets:
+            self.modify_object(la_source, "addressBookRoots2", tgt)
+
+        filler = self.create_object_range(0, 100, prefix="filler")
+
+        # We should receive the objects/links in the following order:
+        # [500 targets + 1 source][500 links][100 filler]
+        expected_objects = la_targets + [la_source] + filler
+        link_only_chunk = False
+
+        # First do the replication without needing GET_TGT
+        while not self.replication_complete():
+            ctr6 = self.repl_get_next()
+
+            if ctr6.object_count == 0 and ctr6.linked_attributes_count != 0:
+                link_only_chunk = True
+
+        # we should receive one chunk that contains only links
+        self.assertTrue(link_only_chunk,
+                        "Expected to receive a chunk containing only links")
+
+        # check we received all the expected objects/links
+        self.assert_expected_data(expected_objects)
+        self.assert_expected_links([la_source], link_attr="addressBookRoots2", num_expected=500)
+
+        # Do the replication again, forcing the use of GET_TGT this time
+        self.init_test_state()
+
+        for x in range(0, 500):
+            self.modify_object(la_targets[x], "displayName", "OU-%d" % x)
+
+        # The objects/links should get sent in the following order:
+        # [1 source][500 targets][500 links][100 filler]
+
+        while not self.replication_complete():
+            ctr6 = self.repl_get_next()
+
+        self.assertTrue(self.used_get_tgt,
+                        "Test didn't use the GET_TGT flag as expected")
+
+        # check we received all the expected objects/links
+        self.assert_expected_data(expected_objects)
+        self.assert_expected_links([la_source], link_attr="addressBookRoots2", num_expected=500)
+
+
 class DcConnection:
     """Helper class to track a connection to another DC"""
 
-- 
2.7.4


From 65e0f5bfc19f9a6babdc808bb8d1fc596cd07df7 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 11 Aug 2017 13:53:31 +1200
Subject: [PATCH 34/36] replmd: Allow missing targets if GET_TGT has already
 been set

While running the selftests, I noticed a case where DC replication
unexpectedly sends a linked attribute for a deleted object (created in
the drs.ridalloc_exop tests). The problem is due to the
msDS-NC-Replica-Locations attribute, which is a (known) one-way link.
Because it is a one-way link, when the test demotes the DC and deletes
the link target, there is no backlink to delete the link from the source
object.

After much debate and head-scratching, we decided that there wasn't an
ideal way to resolve this problem. Any automated intervention could
potentially do the wrong thing, especially if the link spans partitions.
Running dbcheck will find this problem and is able to fix it (providing
the deleted object is still a tombstone). So the recommendation is to
run dbcheck on your DCs every 6 months (or more frequently if using a
lower tombstone lifetime setting).

However, it does highlight a problem with the current GET_TGT
implementation. If the tombstone object had been expunged and you
upgraded to 4.8, then you would be stuck - replication would fail
because the target object can't be resolved, even with GET_TGT, and
dbcheck would not be able to fix the hanging link. The solution is to
not fail the replication for an unknown target if GET_TGT has already
been set (i.e. the dsdb_repl_flags contains
DSDB_REPL_FLAG_TARGETS_UPTODATE).

It's debatable whether we should add a hanging link in this case or
ignore/drop the link. Some cases to consider:
- If you're talking to a DC that still sends all the links last, you
  could still get object deletion between processing the source object's
  links and sending the target (GET_TGT just restarts the replication
  cycle from scratch). Adding a hanging link in this case would be
  incorrect and would add spurious information to the DB.
- Suppose there's a bug in Samba that incorrectly results in an object
  disappearing. If other DCs then remove any links that pointed to that
  object, it makes recovering from the problem harder. However, simply
  ignoring the link shouldn't result in data loss, i.e. replication won't
  remove the existing link information from other DCs. Data loss in this
  case would only occur if a new DC were brought online, or if it were a
  new link that was affected.
Based on this, I think ignoring the link does the least harm.

I also noticed a logic flaw in the deleted object case, where we would
unnecessarily retry with GET_TGT if the deleted target object was in
another partition. This is pointless work, because GET_TGT won't resolve
the target.

We can fix both cases by using the same logic in both the unknown target
and the deleted target cases. I've combined the logic and moved it into
a common replmd_allow_missing_target() function.

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 151 +++++++++++++++---------
 source4/torture/drs/python/getncchanges.py      |  24 +++-
 2 files changed, 116 insertions(+), 59 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 1396f84..9c90901 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6665,6 +6665,84 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 }
 
 /**
+ * Checks how to handle an missing target - either we need to fail the
+ * replication and retry with GET_TGT, ignore the link and continue, or try to
+ * add a partial link to an unknown target.
+ */
+static int replmd_allow_missing_target(struct ldb_module *module,
+				       TALLOC_CTX *mem_ctx,
+				       struct ldb_dn *target_dn,
+				       struct ldb_dn *source_dn,
+				       struct GUID *guid,
+				       uint32_t dsdb_repl_flags,
+				       bool *ignore_link,
+				       const char * missing_str)
+{
+	struct ldb_context *ldb = ldb_module_get_ctx(module);
+	int ret = LDB_SUCCESS;
+
+	/*
+	 * we may not be able to resolve link targets properly when
+	 * dealing with subsets of objects, e.g. the source is a
+	 * critical object and the target isn't
+	 *
+	 * TODO:
+	 * When we implement Trusted Domains we need to consider
+	 * whether they get treated as an incomplete replica here or not
+	 */
+	if (dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) {
+
+		/*
+		 * Ignore the link. We don't increase the highwater-mark in
+		 * the object subset cases, so subsequent replications should
+		 * resolve any missing links
+		 */
+		DEBUG(2, ("%s target %s linked from %s\n", missing_str,
+			  ldb_dn_get_linearized(target_dn),
+			  ldb_dn_get_linearized(source_dn)));
+		*ignore_link = true;
+
+	} else if (dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) {
+
+		/*
+		 * target should already be up-to-date so there's no point in
+		 * retrying. This could be due to bad timing, or if a target
+		 * on a one-way link was deleted. We ignore the link rather
+		 * than failing the replication cycle completely
+		 */
+		*ignore_link = true;
+		DBG_WARNING("%s is %s but up to date. Ignoring link from %s\n",
+			    ldb_dn_get_linearized(target_dn), missing_str,
+			    ldb_dn_get_linearized(source_dn));
+
+	} else if (dsdb_objects_have_same_nc(ldb, mem_ctx, source_dn,
+					     target_dn)) {
+
+		/* fail the replication and retry with GET_TGT */
+		ldb_asprintf_errstring(ldb, "%s target %s GUID %s linked from %s\n",
+				       missing_str,
+				       ldb_dn_get_linearized(target_dn),
+				       GUID_string(mem_ctx, guid),
+				       ldb_dn_get_linearized(source_dn));
+		ret = LDB_ERR_NO_SUCH_OBJECT;
+	} else {
+
+		/*
+		 * The target of the cross-partition link is missing. Continue
+		 * and try to at least add the forward-link. This isn't great,
+		 * but a partial link can be fixed by dbcheck, so it's better
+		 * than dropping the link completely.
+		 */
+		DBG_WARNING("%s cross-partition target %s linked from %s\n",
+			    missing_str, ldb_dn_get_linearized(target_dn),
+			    ldb_dn_get_linearized(source_dn));
+		*ignore_link = false;
+	}
+
+	return ret;
+}
+
+/**
  * Checks that the target object for a linked attribute exists.
  * @param guid returns the target object's GUID (is returned)if it exists)
  * @param ignore_link set to true if the linked attribute should be ignored
@@ -6746,46 +6824,14 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	if (target_res->count == 0) {
 
 		/*
-		 * we may not be able to resolve link targets properly when
-		 * dealing with subsets of objects, e.g. the source is a
-		 * critical object and the target isn't
-		 *
-		 * TODO:
-		 * When we implement Trusted Domains we need to consider
-		 * whether they get treated as an incomplete replica here or not
+		 * target object is unknown. Check whether to ignore the link,
+		 * fail the replication, or add a partial link
 		 */
-		if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) {
-
-			/*
-			 * We don't increase the highwater-mark in the object
-			 * subset cases, so subsequent replications should
-			 * resolve any missing links
-			 */
-			DEBUG(2,(__location__
-				 ": Failed to find target %s linked from %s\n",
-				 ldb_dn_get_linearized(dsdb_dn->dn),
-				 ldb_dn_get_linearized(source_dn)));
-			*ignore_link = true;
-
-		} else if (dsdb_objects_have_same_nc(ldb, tmp_ctx, source_dn,
-						     dsdb_dn->dn)) {
-			ldb_asprintf_errstring(ldb, "Unknown target %s GUID %s linked from %s\n",
-					       ldb_dn_get_linearized(dsdb_dn->dn),
-					       GUID_string(tmp_ctx, guid),
-					       ldb_dn_get_linearized(source_dn));
-			ret = LDB_ERR_NO_SUCH_OBJECT;
-		} else {
+		ret = replmd_allow_missing_target(module, tmp_ctx, dsdb_dn->dn,
+						  source_dn, guid,
+						  la_entry->dsdb_repl_flags,
+						  ignore_link, "Unknown");
 
-			/*
-			 * The target of the cross-partition link is missing.
-			 * Continue and try to at least add the forward-link.
-			 * This isn't great, but if we can add a partial link
-			 * then it's better than nothing.
-			 */
-			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
-				 ldb_dn_get_linearized(source_dn),
-				 ldb_dn_get_linearized(dsdb_dn->dn)));
-		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
 				       GUID_string(tmp_ctx, guid));
@@ -6808,26 +6854,15 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		 */
 		if (target_deletion_state >= OBJECT_RECYCLED) {
 
-			if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) {
-
-				/*
-				 * target should already be uptodate so there's no
-				 * point retrying - it's probably just bad timing
-				 */
-				*ignore_link = true;
-				DEBUG(0, ("%s is deleted but up to date. "
-					  "Ignoring link from %s\n",
-					  ldb_dn_get_linearized(dsdb_dn->dn),
-					  ldb_dn_get_linearized(source_dn)));
-
-			} else {
-				ldb_asprintf_errstring(ldb,
-						       "Deleted target %s GUID %s linked from %s",
-						       ldb_dn_get_linearized(dsdb_dn->dn),
-						       GUID_string(tmp_ctx, guid),
-						       ldb_dn_get_linearized(source_dn));
-				ret = LDB_ERR_NO_SUCH_OBJECT;
-			}
+			/*
+			 * target object is deleted. Check whether to ignore the
+			 * link, fail the replication, or add a partial link
+			 */
+			ret = replmd_allow_missing_target(module, tmp_ctx,
+							  dsdb_dn->dn,
+							  source_dn, guid,
+							  la_entry->dsdb_repl_flags,
+							  ignore_link, "Deleted");
 		}
 	}
 
diff --git a/source4/torture/drs/python/getncchanges.py b/source4/torture/drs/python/getncchanges.py
index e17d2bd..1177789 100644
--- a/source4/torture/drs/python/getncchanges.py
+++ b/source4/torture/drs/python/getncchanges.py
@@ -1025,8 +1025,30 @@ class DrsReplicaSyncIntegrityTestCase(drs_base.DrsBaseTestCase):
         self.assertFalse("managedBy" in res[0], "%s in DB still has managedBy attribute"
                          % la_source)
 
+        # Check receiving a cross-partition link to a deleted target.
+        # Delete the target and make sure the deletion is sync'd between DCs
+        target_guid = self.get_object_guid(la_target)
+        self.test_ldb_dc.delete(la_target)
+        self.sync_DCs(nc_dn=self.config_dn)        
+        self._disable_all_repl(self.dnsname_dc2)
+
+        # re-animate the target
+        self.restore_deleted_object(target_guid, la_target)
+        self.modify_object(la_source, "managedBy", la_target)
+
+        # now sync the link - because the target is in another partition, the
+        # peer DC receives a link for a deleted target, which it should accept
+        self.sync_DCs()
+        res = self.test_ldb_dc.search(ldb.Dn(self.ldb_dc1, la_source),
+                                      attrs=["managedBy"],
+                                      controls=['extended_dn:1:0'],
+                                      scope=ldb.SCOPE_BASE)
+        self.assertTrue("managedBy" in res[0], "%s in DB missing managedBy attribute"
+                        % la_source)
+
         # cleanup the server object we created in the Configuration partition
-        self.test_ldb_dc.delete(la_source)
+        self.test_ldb_dc.delete(la_target)
+        self._enable_all_repl(self.dnsname_dc2)
 
     def test_repl_get_tgt_multivalued_links(self):
         """Tests replication with multi-valued link attributes."""
-- 
2.7.4


From 7e35a2d8a4e9fb3d6f7ba6b6fd3eaad197eb58b9 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Fri, 11 Aug 2017 15:39:35 +1200
Subject: [PATCH 35/36] replmd: Avoid duplicated debug/warnings

We display warnings if a target object is missing but it's still OK to
continue the replication. Currently we need to check the target twice -
once to verify it when we first receive it, and once when we actually
commit it (we can't skip the 2nd check altogether because in the join
case, they could occur quite far apart).

One annoying side-effect is we get the same warning message coming out
twice in these special cases.

In the cases where we're checking the dsdb_repl_flags, we can actually
just bypass the verification checks for the target object (if it doesn't
exist we still continue anyway). This may save us a tiny bit of
unnecessary work.

For cross-partition links, we can limit logging these warnings to when
the objects are actually being committed. This avoids spurious warnings
in the join case (i.e. we receive the link before we receive the target
object's partition, but we have received all partitions by the time we
actually commit the objects).

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 33 ++++++++++++++++++-------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index 9c90901..ca4acb1 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6673,6 +6673,7 @@ static int replmd_allow_missing_target(struct ldb_module *module,
 				       TALLOC_CTX *mem_ctx,
 				       struct ldb_dn *target_dn,
 				       struct ldb_dn *source_dn,
+				       bool is_obj_commit,
 				       struct GUID *guid,
 				       uint32_t dsdb_repl_flags,
 				       bool *ignore_link,
@@ -6681,6 +6682,8 @@ static int replmd_allow_missing_target(struct ldb_module *module,
 	struct ldb_context *ldb = ldb_module_get_ctx(module);
 	int ret = LDB_SUCCESS;
 
+	*ignore_link = false;
+
 	/*
 	 * we may not be able to resolve link targets properly when
 	 * dealing with subsets of objects, e.g. the source is a
@@ -6725,18 +6728,19 @@ static int replmd_allow_missing_target(struct ldb_module *module,
 				       GUID_string(mem_ctx, guid),
 				       ldb_dn_get_linearized(source_dn));
 		ret = LDB_ERR_NO_SUCH_OBJECT;
-	} else {
+
+	} else if (is_obj_commit) {
 
 		/*
 		 * The target of the cross-partition link is missing. Continue
 		 * and try to at least add the forward-link. This isn't great,
 		 * but a partial link can be fixed by dbcheck, so it's better
-		 * than dropping the link completely.
+		 * than dropping the link completely. (Only log this when we're
+		 * actually committing the objects, to avoid spurious logs)
 		 */
 		DBG_WARNING("%s cross-partition target %s linked from %s\n",
 			    missing_str, ldb_dn_get_linearized(target_dn),
 			    ldb_dn_get_linearized(source_dn));
-		*ignore_link = false;
 	}
 
 	return ret;
@@ -6752,6 +6756,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 				      struct dsdb_dn *dsdb_dn,
 				      struct la_entry *la_entry,
 				      struct ldb_dn *source_dn,
+				      bool is_obj_commit,
 				      struct GUID *guid,
 				      bool *ignore_link)
 {
@@ -6828,7 +6833,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		 * fail the replication, or add a partial link
 		 */
 		ret = replmd_allow_missing_target(module, tmp_ctx, dsdb_dn->dn,
-						  source_dn, guid,
+						  source_dn, is_obj_commit, guid,
 						  la_entry->dsdb_repl_flags,
 						  ignore_link, "Unknown");
 
@@ -6859,8 +6864,8 @@ static int replmd_check_target_exists(struct ldb_module *module,
 			 * link, fail the replication, or add a partial link
 			 */
 			ret = replmd_allow_missing_target(module, tmp_ctx,
-							  dsdb_dn->dn,
-							  source_dn, guid,
+							  dsdb_dn->dn, source_dn,
+							  is_obj_commit, guid,
 							  la_entry->dsdb_repl_flags,
 							  ignore_link, "Deleted");
 		}
@@ -7018,8 +7023,18 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
 		return ret;
 	}
 
-	ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
-					 src_msg->dn, &guid, &dummy);
+	/*
+	 * We can skip the target object checks if we're only syncing critical
+	 * objects, or we know the target is up-to-date. If either case, we
+	 * still continue even if the target doesn't exist
+	 */
+	if ((la->dsdb_repl_flags & (DSDB_REPL_FLAG_OBJECT_SUBSET |
+				    DSDB_REPL_FLAG_TARGETS_UPTODATE)) == 0) {
+
+		ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
+						 src_msg->dn, false, &guid,
+						 &dummy);
+	}
 
 	/*
 	 * When we fail to find the target object, the error code we pass
@@ -7107,7 +7122,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
-					 &guid, &ignore_link);
+					 true, &guid, &ignore_link);
 
 	if (ret != LDB_SUCCESS) {
 		talloc_free(tmp_ctx);
-- 
2.7.4


From 4a0c5d2daa8e9faf0f153617aa701590bfdcc1c7 Mon Sep 17 00:00:00 2001
From: Tim Beale <timbeale at catalyst.net.nz>
Date: Tue, 15 Aug 2017 16:15:14 +1200
Subject: [PATCH 36/36] getncchanges.c: max_links calculation didn't work well
 in some cases

The max_links calculation didn't work particularly well if max_links was
set to a value lower than max_objects.

As soon as repl_chunk->object_count exceeded repl_chunk->max_links, the
chunk would be deemed full, even if there was only one link to send (or
even worse, no links to send). For example, if max_objects=100 and
max_links=10, then it would send back chunks of 10 objects (or 9 objects
and 1 link).

I believe the historic reason this logic exists is to avoid overfilling
the response message. It's hard to tell what the appropriate limit would
be because the total message size would depend on how many attributes
each object has.

I couldn't think of logic that would be suitable for all cases. I toyed
with the idea of working out a percentage of how full the message is.
However, adjusting the max_links doesn't really make sense when the
settings are small enough, e.g. max_objects=100 and max_links=100 is
never going to overfill the message, so there's no reason to alter the
values.

In the end I went with:
- If the user is using non-default values, just use those.
- In the default value case, just use the historic calculation

Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
---
 source4/rpc_server/drsuapi/getncchanges.c | 42 ++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index 4e20032..750b18f 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -43,6 +43,8 @@
 #include "librpc/gen_ndr/ndr_misc.h"
 
 #define DRS_GUID_SIZE       16
+#define DEFAULT_MAX_OBJECTS 1000
+#define DEFAULT_MAX_LINKS   1500
 
 /*
  * state of a partially-completed replication cycle. This state persists
@@ -2381,21 +2383,27 @@ static uint32_t getncchanges_chunk_links_pending(struct getncchanges_repl_chunk
  */
 static uint32_t getncchanges_chunk_max_links(struct getncchanges_repl_chunk *repl_chunk)
 {
-	uint32_t max_links;
+	uint32_t max_links = 0;
 
-	/*
-	 * This is just an approximate guess to avoid overfilling the replication
-	 * chunk. E.g. if we've already sent 1000 objects, then send 1000 fewer
-	 * links. For comparison, the max that Windows seems to send is ~2700
-	 * links and ~250 objects (although this may vary based on timeouts)
-	 */
-	if (repl_chunk->object_count >= repl_chunk->max_links) {
+	if (repl_chunk->max_links != DEFAULT_MAX_LINKS ||
+	    repl_chunk->max_objects != DEFAULT_MAX_OBJECTS) {
 
-		/* request is already full of objects - don't send any links */
-		max_links = 0;
-	} else {
+		/*
+		 * We're using non-default settings, so don't try to adjust
+		 * them, just trust the user has configured decent values
+		 */
+		max_links = repl_chunk->max_links;
+
+	} else if (repl_chunk->max_links > repl_chunk->object_count) {
 
-		/* send fewer links if we're already sending a lot of objects */
+		/*
+		 * This is just an approximate guess to avoid overfilling the
+		 * replication chunk. It's the logic we've used historically.
+		 * E.g. if we've already sent 1000 objects, then send 1000 fewer
+		 * links. For comparison, the max that Windows seems to send is
+		 * ~2700 links and ~250 objects (although this may vary based
+		 * on timeouts)
+		 */
 		max_links = repl_chunk->max_links - repl_chunk->object_count;
 	}
 
@@ -2447,7 +2455,9 @@ static bool getncchanges_chunk_is_full(struct getncchanges_repl_chunk *repl_chun
 		 * The chunk is full if we've got more links to send than will
 		 * fit in one chunk
 		 */
-		chunk_full = (chunk_limit <= links_to_send);
+		if (links_to_send > 0 && chunk_limit <= links_to_send) {
+			chunk_full = true;
+		}
 	}
 
 	return chunk_full;
@@ -2625,7 +2635,8 @@ static struct getncchanges_repl_chunk * getncchanges_chunk_new(TALLOC_CTX *mem_c
 	repl_chunk->start = time(NULL);
 
 	repl_chunk->max_objects = lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL,
-						 "drs", "max object sync", 1000);
+						 "drs", "max object sync",
+						 DEFAULT_MAX_OBJECTS);
 
 	/*
 	 * The client control here only applies in normal replication, not extended
@@ -2645,7 +2656,8 @@ static struct getncchanges_repl_chunk * getncchanges_chunk_new(TALLOC_CTX *mem_c
 
 	repl_chunk->max_links =
 			lpcfg_parm_int(dce_call->conn->dce_ctx->lp_ctx, NULL,
-				       "drs", "max link sync", 1500);
+				       "drs", "max link sync",
+					DEFAULT_MAX_LINKS);
 
 	repl_chunk->immediate_link_sync =
 			lpcfg_parm_bool(dce_call->conn->dce_ctx->lp_ctx, NULL,
-- 
2.7.4



More information about the samba-technical mailing list