[SCM] Samba Shared Repository - branch master updated

Mon Sep 18 07:57:04 UTC 2017

The branch, master has been updated
       via  1541c50 selftest: Add some tests for linked attribute conflicts
       via  46c1f7b getncchanges.c: max_links calculation didn't work well in some cases
       via  3c0d80d replmd: Avoid duplicated debug/warnings
       via  44ca841 replmd: Allow missing targets if GET_TGT has already been set
       via  278039f getncchanges.c: Support GET_TGT better with large numbers of links
       via  7ba1084 getncchanges.c: Refactor to track more state using repl_chunk
       via  10df9f6 getncchanges.py: Add a multi-valued linked attribute test
       via  693e3ad getncchanges.py: Add a test for dropped cross-partition links
       via  9697190 getncchanges.py: Add test for replicating reanimated objects
       via  ed2fc52 drs: Add basic GET_TGT support
       via  821094d getncchanges.py: Add tests for object deletion during replication
       via  469aed0 getnc_exop.py: Extend EXOP_REPL_OBJ test case to use GET_TGT
       via  00b20c8 getncchanges.py: Add test for GET_ANC and GET_TGT combined
       via  6ec9ef2 getncchanges.py: Add test for adding links during replication
       via  af82bde getncchanges.py: Add some GET_TGT test cases
       via  172eedc getnc_exop.py: Fix GET_TGT behaviour in DRS tests
      from  af38d73 s4/smbd: set the process group.

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 1541c50b370e8695bf5c958cc41073d1552a3c52
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Wed Aug 23 12:45:09 2017 +1200

    selftest: Add some tests for linked attribute conflicts
    
    Currently we have tests that check we can resolve object conflicts, but
    these don't test anything related to conflicting linked attributes.
    This patch adds some basic tests that checks that Samba can resolve
    conflicting linked attributes.
    
    This highlights some problems with Samba, as the following tests
    currently fail:
    - test_conflict_single_valued_link: Samba currently can't resolve a
      conflicting targets for a single-valued linked attribute - the
      replication exits with an error.
    - test_link_deletion_conflict: If 2 DCs add the same linked attribute,
      currently when they resolve this conflict the RMD_VERSION for the
      linked attribute incorrectly gets incremented. This means the version
      numbers get out of step and subsequent changes to the linked attribute
      can be dropped/ignored.
    - test_full_sync_link_conflict: fails for the same reason as above.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>
    
    Autobuild-User(master): Garming Sam <garming at samba.org>
    Autobuild-Date(master): Mon Sep 18 09:56:41 CEST 2017 on sn-devel-144

commit 46c1f7bdeee7330ffdf43edada032205d594567f
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Tue Aug 15 16:15:14 2017 +1200

    getncchanges.c: max_links calculation didn't work well in some cases
    
    The max_links calculation didn't work particularly well if max_links was
    set to a value lower than max_objects.
    
    As soon as repl_chunk->object_count exceeded repl_chunk->max_links, the
    chunk would be deemed full, even if there was only one link to send (or
    even worse, no links to send). For example, if max_objects=100 and
    max_links=10, then it would send back chunks of 10 objects (or 9 objects
    and 1 link).
    
    I believe the historic reason this logic exists is to avoid overfilling
    the response message. It's hard to tell what the appropriate limit would
    be because the total message size would depend on how many attributes
    each object has.
    
    I couldn't think of logic that would be suitable for all cases. I toyed
    with the idea of working out a percentage of how full the message is.
    However, adjusting the max_links doesn't really make sense when the
    settings are small enough, e.g. max_objects=100 and max_links=100 is
    never going to overfill the message, so there's no reason to alter the
    values.
    
    In the end I went with:
    - If the user is using non-default values, just use those.
    - In the default value case, just use the historic calculation
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 3c0d80d0c36f5c65efa3f8e81c880b6d8c26d30f
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Fri Aug 11 15:39:35 2017 +1200

    replmd: Avoid duplicated debug/warnings
    
    We display warnings if a target object is missing but it's still OK to
    continue the replication. Currently we need to check the target twice -
    once to verify it when we first receive it, and once when we actually
    commit it (we can't skip the 2nd check altogether because in the join
    case, they could occur quite far apart).
    
    One annoying side-effect is we get the same warning message coming out
    twice in these special cases.
    
    In the cases where we're checking the dsdb_repl_flags, we can actually
    just bypass the verification checks for the target object (if it doesn't
    exist we still continue anyway). This may save us a tiny bit of
    unnecessary work.
    
    For cross-partition links, we can limit logging these warnings to when
    the objects are actually being committed. This avoids spurious warnings
    in the join case (i.e. we receive the link before we receive the target
    object's partition, but we have received all partitions by the time we
    actually commit the objects).
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 44ca84166e5a909f1acd77323a444ff2e14a8db8
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Fri Aug 11 13:53:31 2017 +1200

    replmd: Allow missing targets if GET_TGT has already been set
    
    While running the selftests, I noticed a case where DC replication
    unexpectedly sends a linked attribute for a deleted object (created in
    the drs.ridalloc_exop tests). The problem is due to the
    msDS-NC-Replica-Locations attribute, which is a (known) one-way link.
    Because it is a one-way link, when the test demotes the DC and deletes
    the link target, there is no backlink to delete the link from the source
    object.
    
    After much debate and head-scratching, we decided that there wasn't an
    ideal way to resolve this problem. Any automated intervention could
    potentially do the wrong thing, especially if the link spans partitions.
    Running dbcheck will find this problem and is able to fix it (providing
    the deleted object is still a tombstone). So the recommendation is to
    run dbcheck on your DCs every 6 months (or more frequently if using a
    lower tombstone lifetime setting).
    
    However, it does highlight a problem with the current GET_TGT
    implementation. If the tombstone object had been expunged and you
    upgraded to 4.8, then you would be stuck - replication would fail
    because the target object can't be resolved, even with GET_TGT, and
    dbcheck would not be able to fix the hanging link. The solution is to
    not fail the replication for an unknown target if GET_TGT has already
    been set (i.e. the dsdb_repl_flags contains
    DSDB_REPL_FLAG_TARGETS_UPTODATE).
    
    It's debatable whether we should add a hanging link in this case or
    ignore/drop the link. Some cases to consider:
    - If you're talking to a DC that still sends all the links last, you
      could still get object deletion between processing the source object's
      links and sending the target (GET_TGT just restarts the replication
      cycle from scratch). Adding a hanging link in this case would be
      incorrect and would add spurious information to the DB.
    - Suppose there's a bug in Samba that incorrectly results in an object
      disappearing. If other DCs then remove any links that pointed to that
      object, it makes recovering from the problem harder. However, simply
      ignoring the link shouldn't result in data loss, i.e. replication won't
      remove the existing link information from other DCs. Data loss in this
      case would only occur if a new DC were brought online, or if it were a
      new link that was affected.
    Based on this, I think ignoring the link does the least harm.
    
    This problem also highlights that we should really be using the same
    logic in both the unknown target and the deleted target cases.
    Combining the logic and moving it into a common
    replmd_allow_missing_target() function fixes the problem. (This also has
    the side-effect of fixing another logic flaw - in the deleted object
    case we would unnecessarily retry with GET_TGT if the target object was
    in another partition. This is pointless work, because GET_TGT won't
    resolve the target).
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 278039ff78c97bda143b21318a59edbd18fa3825
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Fri Aug 11 16:08:15 2017 +1200

    getncchanges.c: Support GET_TGT better with large numbers of links
    
    A source object can potentially link to thousands of target objects.
    We have to be careful not to overfill the GetNCChanges response message
    with more data than it's possible to send. We also don't want the client
    to timeout while we're busy checking the linked attributes. The GET_TGT
    support added so far is fairly dumb - this patch extends it to better
    handle larger numbers of links.
    
    To do so, this extends the repl_chunk usage so that it also works out if
    the current chunk is full of links. Now as soon as the chunk is full of
    either links or objects, we stop and send it back.
    
    These changes now mean that we need to also check:
    - that all the links for the last source object in the previous chunk
      have been sent, before we move on and send the next object. This only
      takes effect when immediate_link_sync is configured. It also means
      that a chunk in the middle of the replication cycle can now contain
      only links, and no objects.
    - when GET_TGT is used, we only send back the links that we've verified
      the target object for. i.e. if we stop checking links because we timed
      out, we only send back the links whose targets were checked.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 7ba10844d1b4c8c1d08c62353150dc03ee166150
Author: Garming Sam <garming at catalyst.net.nz>
Date:   Tue Aug 8 16:27:18 2017 +1200

    getncchanges.c: Refactor to track more state using repl_chunk
    
    To prepare GET_TGT to deal with a large number of links better, there
    is now a 'repl_chunk' struct to help keep track of all the factors
    relating to the current chunk of replication data (i.e. how many
    objects/links we can send and how many we've already processed). This
    means we can have a consistent way of working out whether the current
    chunk is full (whether that be due to objects, links, or just too much
    time taken).
    
    This patch should not alter functionality. This is just a refactor to
    add the basic framework, which will be used in the next patch.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Signed-off-by: Garming Sam <garming at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>

commit 10df9f6bfd04af08d095b6796f902888f1172b9f
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Tue Aug 15 12:18:02 2017 +1200

    getncchanges.py: Add a multi-valued linked attribute test
    
    Add a test where a source object links to multiple different targets.
    First we do the replication without GET_TGT and check that the server
    can handle sending a chunk containing only links (in the middle of the
    replication). Then we repeat the replication forcing GET_TGT to be used.
    
    To avoid having to create 1500 objects/links, I've lowered the 'max
    link sync' setting on the vampire_dc testenv to 250.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 693e3adc1aebdf75d6d9f9866ec8f39ea9d5557b
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Mon Jul 24 14:43:54 2017 +1200

    getncchanges.py: Add a test for dropped cross-partition links
    
    Samba would drop linked attributes that span partitions if it didn't
    know about the target object. This patch adds a test that exposes the
    problem.
    
    I've re-used the code from the previous re-animation test to do this.
    I've also added a very basic DcConnection helper class that basically
    stores the connection state information the drs_base.py uses for
    replication. This allows us to switch the DC we want to replicate from
    easily. This approach could potentially be retro-fitted to some of the
    existing test cases, as it allows us to test both the DRS client code
    and server code at the same time.
    
    Note this test case relates to the code change for commit
    fae5df891c11f642cb.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    BUG: https://bugzilla.samba.org/show_bug.cgi?id=12972
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 96971906009cf425b2757725eb627769d1917509
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Thu Jul 20 17:06:14 2017 +1200

    getncchanges.py: Add test for replicating reanimated objects
    
    Reading between the lines, this scenario seems to be the main reason
    that Microsoft added the GET_TGT flag. MS AD can handle getting links
    for unknown targets OK, but if it receives links for a deleted/recycled
    target then it would tend to drop the received links. Samba client also
    used to drop the links if talking to a Microsoft DC (or a Samba server
    with GET_TGT support).
    
    The specific scenario is the client side already knows about a deleted
    object. That object is then re-animated and used as the target for a
    linked attribute. *Then* the target object gets updated again so it gets
    sent in a later replication chunk to the linked attribute, i.e. the
    client receives the link before it learns that the target object has
    been re-animated.
    
    In this test we're interested in particular at how the client behaves
    when it receives a linked attribute for a deleted object. (It *should*
    retry with GET_TGT to make sure the target is up-to-date. However, it
    was just dropping the linked attribute).
    
    To exercise the client-side, we disable replication, setup the
    links/objects on one DC the way we want them, then force a replication
    to the second DC. We then check that when we query each DC, they both
    tell us about the links/objects we're expecting (i.e. no links got
    lost).
    
    Note that this wasn't a problem with older versions of Samba-to-Samba
    because sending the links last guaranteed that the target objects were
    always up-to-date.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit ed2fc522439349832ad2a8a56abbf63e56011fb9
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Wed Aug 23 10:23:10 2017 +1200

    drs: Add basic GET_TGT support
    
    This adds basic DRS_GET_TGT support. If the GET_TGT flag is specified
    then the server will use the object cache to store the objects it sends
    back. If the target object for a linked attribute is not in the cache
    (i.e. it has not been sent already), then it is added to the response
    message.
    
    Note that large numbers of linked attributes will not be handled well
    yet - the server could potentially try to send more than will fit in a
    single repsonse message.
    
    Also note that the client can sometimes set the GET_TGT flag even if the
    server is still sending the links last. In this case, we know the client
    supports GET_TGT so it's safe to send the links interleaved with the
    source objects (the alternative of fetching the target objects but not
    sending the links until last doesn't really make any sense).
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 821094d50b0f54700c0ce619e381bbbdb9e21cd0
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Wed Jul 19 11:38:55 2017 +1200

    getncchanges.py: Add tests for object deletion during replication
    
    Add tests that delete the source and target objects for linked
    attributes in the middle of a replication cycle.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 469aed088f460cbc1e697b5cd83b6f9739ab7532
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Mon Jul 17 14:04:38 2017 +1200

    getnc_exop.py: Extend EXOP_REPL_OBJ test case to use GET_TGT
    
    We already check that when we use GET_ANC that we still only receive a
    single object when EXOP_REPL_OBJ is used. This extends the test to also
    check that only a single object is returned when GET_TGT is used.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 00b20c825c3961966ba9d77a28dfbc9166790977
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Thu Jul 13 11:47:16 2017 +1200

    getncchanges.py: Add test for GET_ANC and GET_TGT combined
    
    The code has to handle needing GET_ANC and GET_TGT in combination, i.e.
    where we fetch the target object for the linked attribute and the target
    object's parent is unknown as well. This patch adds a test case to
    exercise this code path.
    
    The second part of this test exercises GET_ANC/GET_TGT for an
    incremental replication, where the objects are getting filtered by an
    uptodateness-vector/HWM.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 6ec9ef2bebe5f9d587963f29c279b938f098893e
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Tue Jun 13 12:14:45 2017 +1200

    getncchanges.py: Add test for adding links during replication
    
    We have identified a case where the Samba server can send linked
    attributes but not the target object. In this case, the Samba DRS client
    would hit the "Failed to re-resolve GUID" case in replmd and silently
    discard the linked attribute.
    
    However, Samba will resend the linked attribute in the next cycle
    (because its USN is still higher than the committed HWM), so it should
    recover OK. On older releases, this may have caused problems if the
    first error resulting in a hanging link (which might mean the second
    time it's processed it still fails to be added).
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit af82bdefcc3b7413c9eed81b4aa5937817b2a9ff
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Wed Jul 12 14:23:35 2017 +1200

    getncchanges.py: Add some GET_TGT test cases
    
    test_repl_get_tgt:
    - Adds 2 sets of objects
    - Links one set to the other
    - Changes the order so the target object comes last in the
      replication (which means the client has to use GET_TGT)
    - Checks that when GET_TGT is used that we have received all target
      objects we need to resolve the linked attibutes
    - Checks that we expect to receive the linked attributes *before*
      the last chunk is sent (by default, Samba sends all the links at
      the end, so this fails)
    - Checks that we eventually receive all expected objects, and all
      links we receive match what is expected
    
    test_repl_get_tgt_chain:
      This adds the linked attributes in a more complicated chain. We add
      300 objects, but the links for 100 objects will point to a linked
      chain of 200 objects.
      This was mainly to determine whether or not Windows follows the
      target object (i.e. whether it sends all the links for the target
      object as well). It turns out Windows maintains its own linked
      attribute DB, so it sends the links based on USN.
    
    Note that the 2 testenvs fail for different reasons. promoted_dc fails
    because it is sending all the linked attributes last. vampire_dc fails
    because it doesn't support GET_TGT yet, so it sends the link before the
    peer knows about the target object.
    
    Note that to test against vampire_dc (rather than the ad_dc_ntvfs DC),
    we need to send the GetNCChanges requests to DC2 instead of DC1.
    I've left the DC numbering scheme as is, but I've addeed a test_ldb_dc
    handle to drs_base.py - it defaults to DC1, but tests can override it
    easily and still have everything work.
    
    While running the new tests through autobuild, I noticed an intermittent
    LDAP_ENTRY_ALREADY_EXISTS failure in the test setup(). This appears to
    be due to a timing issue in the background replication between the
    multiple testenvs. Adding some randomness so that the test base OU is
    unique seems to avoid the problem.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

commit 172eedc0760b5d471af091c2f08ea70f17b36293
Author: Tim Beale <timbeale at catalyst.net.nz>
Date:   Tue May 23 14:37:56 2017 +1200

    getnc_exop.py: Fix GET_TGT behaviour in DRS tests
    
    The existing code never passed the more_flags parameter into the
    actual getNCChanges request, i.e. _getnc_req10(). This meant the
    existing GET_TGT tests effectively did nothing.
    
    Passing the flag through properly means we have to now change the tests
    as the DNs returned by Windows now include any target objects in the
    linked attributes. These tests now fail against Samba (because it
    doesn't support GET_TGT yet).
    
    Also added comments to the tests to help explain what they are actually
    doing.
    
    Note that Samba and Windows can return the objects in different orders,
    due to significant differences in their underlying DB implementations
    (Windows stores links in a separate DB, so sends links ordered strictly
    by USN, whereas Samba sends links based on the USN of the source
    object). To make the test a fair comparison between Windows and Samba,
    we need to use dn_ordered=False.
    
    Signed-off-by: Tim Beale <timbeale at catalyst.net.nz>
    Reviewed-by: Andrew Bartlett <abartlet at samba.org>
    Reviewed-by: Garming Sam <garming at catalyst.net.nz>

-----------------------------------------------------------------------

Summary of changes:
 selftest/knownfail.d/getncchanges               |   6 +
 selftest/knownfail.d/link_conflicts             |   9 +
 selftest/target/Samba4.pm                       |   3 +-
 source4/dsdb/common/util.c                      |  40 ++
 source4/dsdb/samdb/ldb_modules/repl_meta_data.c | 197 +++---
 source4/rpc_server/drsuapi/getncchanges.c       | 513 +++++++++++++---
 source4/selftest/tests.py                       |   5 +
 source4/torture/drs/python/drs_base.py          |  33 +-
 source4/torture/drs/python/getnc_exop.py        |  87 ++-
 source4/torture/drs/python/getncchanges.py      | 775 ++++++++++++++++++++++--
 source4/torture/drs/python/link_conflicts.py    | 498 +++++++++++++++
 11 files changed, 1931 insertions(+), 235 deletions(-)
 create mode 100644 selftest/knownfail.d/getncchanges
 create mode 100644 selftest/knownfail.d/link_conflicts
 create mode 100644 source4/torture/drs/python/link_conflicts.py


Changeset truncated at 500 lines:

diff --git a/selftest/knownfail.d/getncchanges b/selftest/knownfail.d/getncchanges
new file mode 100644
index 0000000..5ef1bc9
--- /dev/null
+++ b/selftest/knownfail.d/getncchanges
@@ -0,0 +1,6 @@
+# GET_TGT tests currently only work for testenvs that send the links at the
+# same time as the source objects. Currently this is only the vampire_dc
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt\(promoted_dc\)
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_chain\(promoted_dc\)
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_and_anc\(promoted_dc\)
+samba4.drs.getncchanges.python\(promoted_dc\).getncchanges.DrsReplicaSyncIntegrityTestCase.test_repl_get_tgt_multivalued_links\(promoted_dc\)
diff --git a/selftest/knownfail.d/link_conflicts b/selftest/knownfail.d/link_conflicts
new file mode 100644
index 0000000..1c41335
--- /dev/null
+++ b/selftest/knownfail.d/link_conflicts
@@ -0,0 +1,9 @@
+# Currently Samba can't resolve a conflict for a single-valued link attribute
+samba4.drs.link_conflicts.python\(vampire_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_conflict_single_valued_link\(vampire_dc\)
+samba4.drs.link_conflicts.python\(promoted_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_conflict_single_valued_link\(promoted_dc\)
+# There's a bug where Samba can incorrectly increment the attribute's version number
+samba4.drs.link_conflicts.python\(vampire_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_link_deletion_conflict\(vampire_dc\)
+samba4.drs.link_conflicts.python\(promoted_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_link_deletion_conflict\(promoted_dc\)
+samba4.drs.link_conflicts.python\(vampire_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_full_sync_link_conflict\(vampire_dc\)
+samba4.drs.link_conflicts.python\(promoted_dc\).link_conflicts.DrsReplicaLinkConflictTestCase.test_full_sync_link_conflict\(promoted_dc\)
+
diff --git a/selftest/target/Samba4.pm b/selftest/target/Samba4.pm
index 3d14885..7930a4e 100755
--- a/selftest/target/Samba4.pm
+++ b/selftest/target/Samba4.pm
@@ -1293,7 +1293,8 @@ sub provision_vampire_dc($$$)
 	if ($fl == "2000") {
 		$name = "vampire2000dc";
 	} else {
-		$extra_conf = "drs: immediate link sync = yes";
+		$extra_conf = "drs: immediate link sync = yes
+                       drs: max link sync = 250";
 	}
 
 	# We do this so that we don't run the provision.  That's the job of 'net vampire'.
diff --git a/source4/dsdb/common/util.c b/source4/dsdb/common/util.c
index 430620e..49345e5 100644
--- a/source4/dsdb/common/util.c
+++ b/source4/dsdb/common/util.c
@@ -5555,3 +5555,43 @@ int dsdb_user_obj_set_primary_group_id(struct ldb_context *ldb, struct ldb_messa
 
 	return LDB_SUCCESS;
 }
+
+/**
+ * Returns True if the source and target DNs both have the same naming context,
+ * i.e. they're both in the same partition.
+ */
+bool dsdb_objects_have_same_nc(struct ldb_context *ldb,
+			       TALLOC_CTX *mem_ctx,
+			       struct ldb_dn *source_dn,
+			       struct ldb_dn *target_dn)
+{
+	TALLOC_CTX *tmp_ctx;
+	struct ldb_dn *source_nc;
+	struct ldb_dn *target_nc;
+	int ret;
+	bool same_nc = true;
+
+	tmp_ctx = talloc_new(mem_ctx);
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for source %s\n",
+			ldb_dn_get_linearized(source_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc);
+	if (ret != LDB_SUCCESS) {
+		DBG_ERR("Failed to find base DN for target %s\n",
+			ldb_dn_get_linearized(target_dn));
+		talloc_free(tmp_ctx);
+		return true;
+	}
+
+	same_nc = (ldb_dn_compare(source_nc, target_nc) == 0);
+
+	talloc_free(tmp_ctx);
+
+	return same_nc;
+}
diff --git a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
index d2c2084..909a1be 100644
--- a/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
+++ b/source4/dsdb/samdb/ldb_modules/repl_meta_data.c
@@ -6726,43 +6726,96 @@ static int replmd_extended_replicated_objects(struct ldb_module *module, struct
 }
 
 /**
- * Returns True if the source and target DNs both have the same naming context,
- * i.e. they're both in the same partition.
+ * Checks how to handle an missing target - either we need to fail the
+ * replication and retry with GET_TGT, ignore the link and continue, or try to
+ * add a partial link to an unknown target.
  */
-static bool replmd_objects_have_same_nc(struct ldb_context *ldb,
-					TALLOC_CTX *mem_ctx,
-					struct ldb_dn *source_dn,
-					struct ldb_dn *target_dn)
+static int replmd_allow_missing_target(struct ldb_module *module,
+				       TALLOC_CTX *mem_ctx,
+				       struct ldb_dn *target_dn,
+				       struct ldb_dn *source_dn,
+				       bool is_obj_commit,
+				       struct GUID *guid,
+				       uint32_t dsdb_repl_flags,
+				       bool *ignore_link,
+				       const char * missing_str)
 {
-	TALLOC_CTX *tmp_ctx;
-	struct ldb_dn *source_nc;
-	struct ldb_dn *target_nc;
-	int ret;
-	bool same_nc = true;
+	struct ldb_context *ldb = ldb_module_get_ctx(module);
+	bool is_in_same_nc;
 
-	tmp_ctx = talloc_new(mem_ctx);
+	/*
+	 * we may not be able to resolve link targets properly when
+	 * dealing with subsets of objects, e.g. the source is a
+	 * critical object and the target isn't
+	 *
+	 * TODO:
+	 * When we implement Trusted Domains we need to consider
+	 * whether they get treated as an incomplete replica here or not
+	 */
+	if (dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) {
 
-	ret = dsdb_find_nc_root(ldb, tmp_ctx, source_dn, &source_nc);
-	if (ret != LDB_SUCCESS) {
-		DBG_ERR("Failed to find base DN for source %s\n",
-			ldb_dn_get_linearized(source_dn));
-		talloc_free(tmp_ctx);
-		return true;
+		/*
+		 * Ignore the link. We don't increase the highwater-mark in
+		 * the object subset cases, so subsequent replications should
+		 * resolve any missing links
+		 */
+		DEBUG(2, ("%s target %s linked from %s\n", missing_str,
+			  ldb_dn_get_linearized(target_dn),
+			  ldb_dn_get_linearized(source_dn)));
+		*ignore_link = true;
+		return LDB_SUCCESS;
 	}
 
-	ret = dsdb_find_nc_root(ldb, tmp_ctx, target_dn, &target_nc);
-	if (ret != LDB_SUCCESS) {
-		DBG_ERR("Failed to find base DN for target %s\n",
-			ldb_dn_get_linearized(target_dn));
-		talloc_free(tmp_ctx);
-		return true;
+	if (dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) {
+
+		/*
+		 * target should already be up-to-date so there's no point in
+		 * retrying. This could be due to bad timing, or if a target
+		 * on a one-way link was deleted. We ignore the link rather
+		 * than failing the replication cycle completely
+		 */
+		*ignore_link = true;
+		DBG_WARNING("%s is %s but up to date. Ignoring link from %s\n",
+			    ldb_dn_get_linearized(target_dn), missing_str,
+			    ldb_dn_get_linearized(source_dn));
+		return LDB_SUCCESS;
+	}
+	
+	is_in_same_nc = dsdb_objects_have_same_nc(ldb,
+						  mem_ctx,
+						  source_dn,
+						  target_dn);
+	if (is_in_same_nc) {
+		/* fail the replication and retry with GET_TGT */
+		ldb_asprintf_errstring(ldb, "%s target %s GUID %s linked from %s\n",
+				       missing_str,
+				       ldb_dn_get_linearized(target_dn),
+				       GUID_string(mem_ctx, guid),
+				       ldb_dn_get_linearized(source_dn));
+		return LDB_ERR_NO_SUCH_OBJECT;
 	}
 
-	same_nc = (ldb_dn_compare(source_nc, target_nc) == 0);
+	/*
+	 * The target of the cross-partition link is missing. Continue
+	 * and try to at least add the forward-link. This isn't great,
+	 * but a partial link can be fixed by dbcheck, so it's better
+	 * than dropping the link completely.
+	 */
+	*ignore_link = false;
 
-	talloc_free(tmp_ctx);
+	if (is_obj_commit) {
 
-	return same_nc;
+		/*
+		 * Only log this when we're actually committing the objects.
+		 * This avoids spurious logs, i.e. if we're just verifying the
+		 * received link during a join.
+		 */
+		DBG_WARNING("%s cross-partition target %s linked from %s\n",
+			    missing_str, ldb_dn_get_linearized(target_dn),
+			    ldb_dn_get_linearized(source_dn));
+	}
+	
+	return LDB_SUCCESS;
 }
 
 /**
@@ -6775,6 +6828,7 @@ static int replmd_check_target_exists(struct ldb_module *module,
 				      struct dsdb_dn *dsdb_dn,
 				      struct la_entry *la_entry,
 				      struct ldb_dn *source_dn,
+				      bool is_obj_commit,
 				      struct GUID *guid,
 				      bool *ignore_link)
 {
@@ -6847,46 +6901,14 @@ static int replmd_check_target_exists(struct ldb_module *module,
 	if (target_res->count == 0) {
 
 		/*
-		 * we may not be able to resolve link targets properly when
-		 * dealing with subsets of objects, e.g. the source is a
-		 * critical object and the target isn't
-		 *
-		 * TODO:
-		 * When we implement Trusted Domains we need to consider
-		 * whether they get treated as an incomplete replica here or not
+		 * target object is unknown. Check whether to ignore the link,
+		 * fail the replication, or add a partial link
 		 */
-		if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_OBJECT_SUBSET) {
+		ret = replmd_allow_missing_target(module, tmp_ctx, dsdb_dn->dn,
+						  source_dn, is_obj_commit, guid,
+						  la_entry->dsdb_repl_flags,
+						  ignore_link, "Unknown");
 
-			/*
-			 * We don't increase the highwater-mark in the object
-			 * subset cases, so subsequent replications should
-			 * resolve any missing links
-			 */
-			DEBUG(2,(__location__
-				 ": Failed to find target %s linked from %s\n",
-				 ldb_dn_get_linearized(dsdb_dn->dn),
-				 ldb_dn_get_linearized(source_dn)));
-			*ignore_link = true;
-
-		} else if (replmd_objects_have_same_nc(ldb, tmp_ctx, source_dn,
-						       dsdb_dn->dn)) {
-			ldb_asprintf_errstring(ldb, "Unknown target %s GUID %s linked from %s\n",
-					       ldb_dn_get_linearized(dsdb_dn->dn),
-					       GUID_string(tmp_ctx, guid),
-					       ldb_dn_get_linearized(source_dn));
-			ret = LDB_ERR_NO_SUCH_OBJECT;
-		} else {
-
-			/*
-			 * The target of the cross-partition link is missing.
-			 * Continue and try to at least add the forward-link.
-			 * This isn't great, but if we can add a partial link
-			 * then it's better than nothing.
-			 */
-			DEBUG(2,("Failed to resolve cross-partition link between %s and %s\n",
-				 ldb_dn_get_linearized(source_dn),
-				 ldb_dn_get_linearized(dsdb_dn->dn)));
-		}
 	} else if (target_res->count != 1) {
 		ldb_asprintf_errstring(ldb, "More than one object found matching objectGUID %s\n",
 				       GUID_string(tmp_ctx, guid));
@@ -6909,26 +6931,15 @@ static int replmd_check_target_exists(struct ldb_module *module,
 		 */
 		if (target_deletion_state >= OBJECT_RECYCLED) {
 
-			if (la_entry->dsdb_repl_flags & DSDB_REPL_FLAG_TARGETS_UPTODATE) {
-
-				/*
-				 * target should already be uptodate so there's no
-				 * point retrying - it's probably just bad timing
-				 */
-				*ignore_link = true;
-				DEBUG(0, ("%s is deleted but up to date. "
-					  "Ignoring link from %s\n",
-					  ldb_dn_get_linearized(dsdb_dn->dn),
-					  ldb_dn_get_linearized(source_dn)));
-
-			} else {
-				ldb_asprintf_errstring(ldb,
-						       "Deleted target %s GUID %s linked from %s",
-						       ldb_dn_get_linearized(dsdb_dn->dn),
-						       GUID_string(tmp_ctx, guid),
-						       ldb_dn_get_linearized(source_dn));
-				ret = LDB_ERR_NO_SUCH_OBJECT;
-			}
+			/*
+			 * target object is deleted. Check whether to ignore the
+			 * link, fail the replication, or add a partial link
+			 */
+			ret = replmd_allow_missing_target(module, tmp_ctx,
+							  dsdb_dn->dn, source_dn,
+							  is_obj_commit, guid,
+							  la_entry->dsdb_repl_flags,
+							  ignore_link, "Deleted");
 		}
 	}
 
@@ -7084,8 +7095,18 @@ static int replmd_verify_linked_attribute(struct replmd_replicated_request *ar,
 		return ret;
 	}
 
-	ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
-					 src_msg->dn, &guid, &dummy);
+	/*
+	 * We can skip the target object checks if we're only syncing critical
+	 * objects, or we know the target is up-to-date. If either case, we
+	 * still continue even if the target doesn't exist
+	 */
+	if ((la->dsdb_repl_flags & (DSDB_REPL_FLAG_OBJECT_SUBSET |
+				    DSDB_REPL_FLAG_TARGETS_UPTODATE)) == 0) {
+
+		ret = replmd_check_target_exists(module, tgt_dsdb_dn, la,
+						 src_msg->dn, false, &guid,
+						 &dummy);
+	}
 
 	/*
 	 * When we fail to find the target object, the error code we pass
@@ -7173,7 +7194,7 @@ static int replmd_process_linked_attribute(struct ldb_module *module,
 	}
 
 	ret = replmd_check_target_exists(module, dsdb_dn, la_entry, msg->dn,
-					 &guid, &ignore_link);
+					 true, &guid, &ignore_link);
 
 	if (ret != LDB_SUCCESS) {
 		talloc_free(tmp_ctx);
diff --git a/source4/rpc_server/drsuapi/getncchanges.c b/source4/rpc_server/drsuapi/getncchanges.c
index cd3f51f..b48e9c7 100644
--- a/source4/rpc_server/drsuapi/getncchanges.c
+++ b/source4/rpc_server/drsuapi/getncchanges.c
@@ -46,6 +46,8 @@
 #define DBGC_CLASS            DBGC_DRS_REPL
 
 #define DRS_GUID_SIZE       16
+#define DEFAULT_MAX_OBJECTS 1000
+#define DEFAULT_MAX_LINKS   1500
 
 /*
  * state of a partially-completed replication cycle. This state persists
@@ -60,6 +62,7 @@ struct drsuapi_getncchanges_state {
 	struct GUID ncRoot_guid;
 	bool is_schema_nc;
 	bool is_get_anc;
+	bool is_get_tgt;
 	uint64_t min_usn;
 	uint64_t max_usn;
 	struct drsuapi_DsReplicaHighWaterMark last_hwm;
@@ -87,9 +90,18 @@ struct la_for_sorting {
  * only exists for a single call to dcesrv_drsuapi_DsGetNCChanges()
  */
 struct getncchanges_repl_chunk {
-	struct drsuapi_DsGetNCChangesCtr6 *ctr6;
+	uint32_t max_objects;
+	uint32_t max_links;
+	uint32_t tgt_la_count;
+	bool immediate_link_sync;
+	time_t max_wait;
+	time_t start;
 
-	/* the last object written to the response */
+	/* stores the objects to be sent in this chunk */
+	uint32_t object_count;
+	struct drsuapi_DsReplicaObjectListItemEx *object_list;
+
+	/* the last object added to this replication chunk */
 	struct drsuapi_DsReplicaObjectListItemEx *last_object;
 };
 
@@ -2245,25 +2257,25 @@ static WERROR getncchanges_add_ancestors(struct drsuapi_DsReplicaObjectListItemE
 }
 
 /**
- * Adds a list of new objects into the getNCChanges response message
+ * Adds a list of new objects into the current chunk of replication data to send
  */
-static void getncchanges_add_objs_to_resp(struct getncchanges_repl_chunk *repl_chunk,
-					  struct drsuapi_DsReplicaObjectListItemEx *obj_list)
+static void getncchanges_chunk_add_objects(struct getncchanges_repl_chunk *repl_chunk,
+					   struct drsuapi_DsReplicaObjectListItemEx *obj_list)
 {
 	struct drsuapi_DsReplicaObjectListItemEx *obj;
 
 	/*
-	 * We track the last object added to the response message, so just add
+	 * We track the last object added to the replication chunk, so just add
 	 * the new object-list onto the end
 	 */
-	if (repl_chunk->last_object == NULL) {
-		repl_chunk->ctr6->first_object = obj_list;
+	if (repl_chunk->object_list == NULL) {
+		repl_chunk->object_list = obj_list;
 	} else {
 		repl_chunk->last_object->next_object = obj_list;
 	}
 
 	for (obj = obj_list; obj != NULL; obj = obj->next_object) {
-		repl_chunk->ctr6->object_count += 1;
+		repl_chunk->object_count += 1;
 
 		/*
 		 * Remember the last object in the response - we'll use this to
@@ -2347,6 +2359,337 @@ static WERROR getncchanges_get_obj_to_send(const struct ldb_message *msg,
 	return werr;
 }
 
+/**
+ * Returns the number of links that are waiting to be sent
+ */
+static uint32_t getncchanges_chunk_links_pending(struct getncchanges_repl_chunk *repl_chunk,
+						 struct drsuapi_getncchanges_state *getnc_state)
+{
+	uint32_t links_to_send = 0;
+
+	if (getnc_state->is_get_tgt) {
+
+		/*
+		 * when the GET_TGT flag is set, only include the linked
+		 * attributes whose target object has already been checked
+		 * (i.e. they're ready to send).
+		 */
+		if (repl_chunk->tgt_la_count > getnc_state->la_idx) {
+			links_to_send = (repl_chunk->tgt_la_count -
+					 getnc_state->la_idx);
+		}
+	} else {
+		links_to_send = getnc_state->la_count - getnc_state->la_idx;
+	}
+
+	return links_to_send;
+}
+
+/**
+ * Returns the max number of links that will fit in the current replication chunk
+ */
+static uint32_t getncchanges_chunk_max_links(struct getncchanges_repl_chunk *repl_chunk)
+{
+	uint32_t max_links = 0;
+
+	if (repl_chunk->max_links != DEFAULT_MAX_LINKS ||
+	    repl_chunk->max_objects != DEFAULT_MAX_OBJECTS) {
+
+		/*
+		 * We're using non-default settings, so don't try to adjust
+		 * them, just trust the user has configured decent values
+		 */
+		max_links = repl_chunk->max_links;
+
+	} else if (repl_chunk->max_links > repl_chunk->object_count) {
+
+		/*
+		 * This is just an approximate guess to avoid overfilling the
+		 * replication chunk. It's the logic we've used historically.
+		 * E.g. if we've already sent 1000 objects, then send 1000 fewer
+		 * links. For comparison, the max that Windows seems to send is
+		 * ~2700 links and ~250 objects (although this may vary based
+		 * on timeouts)
+		 */
+		max_links = repl_chunk->max_links - repl_chunk->object_count;
+	}
+
+	return max_links;
+}
+
+/**
+ * Returns true if the current GetNCChanges() call has taken longer than its
+ * allotted time. This prevents the client from timing out.
+ */
+static bool getncchanges_chunk_timed_out(struct getncchanges_repl_chunk *repl_chunk)
+{
+	return (time(NULL) - repl_chunk->start > repl_chunk->max_wait);
+}
+
+/**
+ * Returns true if the current chunk of replication data has reached the
+ * max_objects and/or max_links thresholds.
+ */
+static bool getncchanges_chunk_is_full(struct getncchanges_repl_chunk *repl_chunk,
+				       struct drsuapi_getncchanges_state *getnc_state)
+{
+	bool chunk_full = false;
+	uint32_t links_to_send;
+	uint32_t chunk_limit;


-- 
Samba Shared Repository