[SCM] CTDB repository - branch 1.0.112 updated - ctdb-1.0.111-101-g4b5bce6
Ronnie Sahlberg
sahlberg at samba.org
Wed Jun 9 00:19:44 MDT 2010
The branch, 1.0.112 has been updated
via 4b5bce6bcebb5cdb6048283181591562badfc2d9 (commit)
via 3cd9d214e8a2e915fbd0dc321cc12b5d80130fd2 (commit)
from 615801f246ed6c9e6cf402b8647ac65b667ba802 (commit)
http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=1.0.112
- Log -----------------------------------------------------------------
commit 4b5bce6bcebb5cdb6048283181591562badfc2d9
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 9 16:12:36 2010 +1000
An idr can time out and wrap/be reused quite quickly.
If a remote node hangs for an extended period, we might have a DMASTER
request for record A in flight to that node.
Eventually we will reuse the idr, possibly for a DMASTER request to a different node for a different record B.
If the first node un-hangs and responds while the request for B is in flight,
we would receive a dmaster reply for the wrong record.
This would cause a record to become perpetually locked: inside the daemon we would tdb_chainlock(dmaster_reply->pdu->key), but once the migration completed we would chainunlock idr->state->call->key.
Add code to verify that a received dmaster reply packet matches the exact key held in the state variable for the idr in flight.
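The check described above can be sketched in isolation as follows. This is
only a minimal illustration under stated assumptions, not the CTDB code:
struct key, struct pending_call, key_equal() and reply_matches() are
hypothetical names invented for this example. The idea is that a matching
request id alone is not enough once ids can be reused, so the key carried in
the reply is also compared with the key stored for the request in flight.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; these are not the CTDB types. */
struct key {
	const unsigned char *dptr;
	size_t dsize;
};

struct pending_call {
	uint32_t reqid;  /* request id; may be reused after a timeout */
	struct key key;  /* key this request was issued for */
};

/* Return 1 when both keys have the same length and the same bytes. */
static int key_equal(struct key a, struct key b)
{
	if (a.dsize != b.dsize) {
		return 0;
	}
	return a.dsize == 0 || memcmp(a.dptr, b.dptr, a.dsize) == 0;
}

/*
 * Return 1 when a reply may be accepted for the pending call.
 * A reply whose id matches but whose key differs is treated as stale:
 * it most likely belongs to an older request that used the same id.
 */
static int reply_matches(const struct pending_call *state,
			 uint32_t reply_reqid, struct key reply_key)
{
	if (state == NULL || state->reqid != reply_reqid) {
		return 0;
	}
	return key_equal(state->key, reply_key);
}
```

With a pending request for record B under id 7, a late reply for record A
that also carries id 7 is rejected by reply_matches(), while a reply for B is
accepted. The caller must then release whatever lock it took on the reply's
key, which is what the added error path in the patch does.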
commit 3cd9d214e8a2e915fbd0dc321cc12b5d80130fd2
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 9 15:12:26 2010 +1000
We cannot be holding a chainlock at this stage, so the tdb_chainunlock() call is bogus
(a child process might be holding the lock, but not the main daemon).
-----------------------------------------------------------------------
Summary of changes:
server/ctdb_call.c | 12 +++++++++++-
server/ctdb_ltdb_server.c | 1 -
2 files changed, 11 insertions(+), 2 deletions(-)
Changeset truncated at 500 lines:
diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index dc6dc95..fd95b61 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -276,6 +276,16 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
return;
}
+ if (key.dsize != state->call->key.dsize || memcmp(key.dptr, state->call->key.dptr, key.dsize)) {
+ DEBUG(DEBUG_ERR, ("Got bogus DMASTER packet reqid:%u\n from node %u. Key does not match key held in matching idr.", hdr->reqid, hdr->srcnode));
+
+ ret = ctdb_ltdb_unlock(ctdb_db, key);
+ if (ret != 0) {
+ DEBUG(DEBUG_ERR,(__location__ " ctdb_ltdb_unlock() failed with error %d\n", ret));
+ }
+ return;
+ }
+
if (hdr->reqid != state->reqid) {
/* we found a record but it was the wrong one */
DEBUG(DEBUG_ERR, ("Dropped orphan in ctdb_become_dmaster with reqid:%u\n from node %u", hdr->reqid, hdr->srcnode));
@@ -289,7 +299,7 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
ctdb_call_local(ctdb_db, state->call, &header, state, &data, ctdb->pnn);
- ret = ctdb_ltdb_unlock(ctdb_db, key);
+ ret = ctdb_ltdb_unlock(ctdb_db, state->call->key);
if (ret != 0) {
DEBUG(DEBUG_ERR,(__location__ " ctdb_ltdb_unlock() failed with error %d\n", ret));
}
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index 1ce7283..03c62ac 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -141,7 +141,6 @@ int ctdb_ltdb_lock_requeue(struct ctdb_db_context *ctdb_db,
/* now the contended path */
h = ctdb_lockwait(ctdb_db, key, lock_fetch_callback, state);
if (h == NULL) {
- tdb_chainunlock(tdb, key);
return -1;
}
--
CTDB repository