[SCM] CTDB repository - branch master updated - ctdb-1.0.114-154-g2f6a870
Ronnie Sahlberg
sahlberg at samba.org
Wed Jun 9 00:19:45 MDT 2010
The branch, master has been updated
via 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37 (commit)
from 9b4a83e49c5df80df8498b7384c5f53f390c1d9d (commit)
http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit 2f6a870d7ff02ceb61fde242f752dccbfcb4cb37
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 9 16:12:36 2010 +1000
An idr can time out and wrap, so the same id may be reused quite quickly.
If a remote node hangs for an extended period, we might have a DMASTER request
in flight to that node for record A.
Eventually the idr wraps, and the same id may be reused for a DMASTER request to a different node for a different record B.
If the first node un-hangs and responds while the request for B is still in flight,
we receive a DMASTER reply for the wrong record.
This would leave a record perpetually locked: inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the migration completed we would chainunlock idr->state->call->key.
Add code to verify that, when we receive a DMASTER reply packet, its key exactly matches the key held in the state for the in-flight idr.
-----------------------------------------------------------------------
Summary of changes:
server/ctdb_call.c | 12 +++++++++++-
1 files changed, 11 insertions(+), 1 deletions(-)
Changeset truncated at 500 lines:
diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index 6ef73fe..b6af807 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -276,6 +276,16 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
return;
}
+ if (key.dsize != state->call->key.dsize || memcmp(key.dptr, state->call->key.dptr, key.dsize)) {
+ DEBUG(DEBUG_ERR, ("Got bogus DMASTER packet reqid:%u\n from node %u. Key does not match key held in matching idr.", hdr->reqid, hdr->srcnode));
+
+ ret = ctdb_ltdb_unlock(ctdb_db, key);
+ if (ret != 0) {
+ DEBUG(DEBUG_ERR,(__location__ " ctdb_ltdb_unlock() failed with error %d\n", ret));
+ }
+ return;
+ }
+
if (hdr->reqid != state->reqid) {
/* we found a record but it was the wrong one */
DEBUG(DEBUG_ERR, ("Dropped orphan in ctdb_become_dmaster with reqid:%u\n from node %u", hdr->reqid, hdr->srcnode));
@@ -289,7 +299,7 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
ctdb_call_local(ctdb_db, state->call, &header, state, &data, ctdb->pnn);
- ret = ctdb_ltdb_unlock(ctdb_db, key);
+ ret = ctdb_ltdb_unlock(ctdb_db, state->call->key);
if (ret != 0) {
DEBUG(DEBUG_ERR,(__location__ " ctdb_ltdb_unlock() failed with error %d\n", ret));
}
--
CTDB repository
More information about the samba-cvs mailing list