[SCM] CTDB repository - branch 1.0.112 updated - ctdb-1.0.111-101-g4b5bce6

Ronnie Sahlberg sahlberg at samba.org
Wed Jun 9 00:19:44 MDT 2010


The branch, 1.0.112 has been updated
       via  4b5bce6bcebb5cdb6048283181591562badfc2d9 (commit)
       via  3cd9d214e8a2e915fbd0dc321cc12b5d80130fd2 (commit)
      from  615801f246ed6c9e6cf402b8647ac65b667ba802 (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.0.112


- Log -----------------------------------------------------------------
commit 4b5bce6bcebb5cdb6048283181591562badfc2d9
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Wed Jun 9 16:12:36 2010 +1000

    idr entries can time out and wrap/be reused quite quickly.
    
    If a remote node hangs for an extended period, we might have a
    DMASTER request for record A in flight to that node.
    Eventually we will reuse the idr, possibly for a DMASTER request
    to a different node for a different record B.
    
    If the first node un-hangs and responds while the request for B is
    in flight, we would receive a dmaster reply for the wrong record.
    
    This would cause a record to become perpetually locked: inside the
    daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the
    migration completed we would chainunlock idr->state->call->key.
    
    Add code to verify that, when we receive a dmaster reply packet, its
    key exactly matches the key held in the state variable for the idr
    in flight.
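
    The key check described above amounts to a length comparison followed
    by a byte comparison. A minimal sketch in C, assuming a hypothetical
    struct key_blob that stands in for TDB_DATA; the names key_blob and
    key_matches are illustrative only and are not part of CTDB:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for TDB_DATA: a byte pointer plus a length. */
struct key_blob {
	const unsigned char *dptr;
	size_t dsize;
};

/*
 * Return 1 if both keys have the same length and the same bytes,
 * 0 otherwise. The length is compared first so that memcmp() never
 * reads past the end of the shorter key.
 */
static int key_matches(struct key_blob a, struct key_blob b)
{
	if (a.dsize != b.dsize) {
		return 0;
	}
	if (a.dsize == 0) {
		/* Two empty keys are equal; avoid passing NULL to memcmp(). */
		return 1;
	}
	return memcmp(a.dptr, b.dptr, a.dsize) == 0;
}
```

    A caller would compare the key carried in the reply with the key
    stored for the in-flight request and drop the packet on a mismatch,
    releasing only the lock it actually took.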

commit 3cd9d214e8a2e915fbd0dc321cc12b5d80130fd2
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Wed Jun 9 15:12:26 2010 +1000

    We cannot be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus
    
    (a child process might be holding the lock, but not the main daemon)

-----------------------------------------------------------------------

Summary of changes:
 server/ctdb_call.c        |   12 +++++++++++-
 server/ctdb_ltdb_server.c |    1 -
 2 files changed, 11 insertions(+), 2 deletions(-)


Changeset truncated at 500 lines:

diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index dc6dc95..fd95b61 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -276,6 +276,16 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
 		return;
 	}
 
+	if (key.dsize != state->call->key.dsize || memcmp(key.dptr, state->call->key.dptr, key.dsize)) {
+		DEBUG(DEBUG_ERR, ("Got bogus DMASTER packet reqid:%u\n from node %u. Key does not match key held in matching idr.", hdr->reqid, hdr->srcnode));
+
+		ret = ctdb_ltdb_unlock(ctdb_db, key);
+		if (ret != 0) {
+			DEBUG(DEBUG_ERR,(__location__ " ctdb_ltdb_unlock() failed with error %d\n", ret));
+		}
+		return;
+	}
+
 	if (hdr->reqid != state->reqid) {
 		/* we found a record  but it was the wrong one */
 		DEBUG(DEBUG_ERR, ("Dropped orphan in ctdb_become_dmaster with reqid:%u\n from node %u", hdr->reqid, hdr->srcnode));
@@ -289,7 +299,7 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
 
 	ctdb_call_local(ctdb_db, state->call, &header, state, &data, ctdb->pnn);
 
-	ret = ctdb_ltdb_unlock(ctdb_db, key);
+	ret = ctdb_ltdb_unlock(ctdb_db, state->call->key);
 	if (ret != 0) {
 		DEBUG(DEBUG_ERR,(__location__ " ctdb_ltdb_unlock() failed with error %d\n", ret));
 	}
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index 1ce7283..03c62ac 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -141,7 +141,6 @@ int ctdb_ltdb_lock_requeue(struct ctdb_db_context *ctdb_db,
 	/* now the contended path */
 	h = ctdb_lockwait(ctdb_db, key, lock_fetch_callback, state);
 	if (h == NULL) {
-		tdb_chainunlock(tdb, key);
 		return -1;
 	}
 


-- 
CTDB repository

