[SCM] CTDB repository - branch master updated - ctdb-2.1-95-gb5a8791

Amitay Isaacs amitay at samba.org
Wed Apr 24 04:41:45 MDT 2013


The branch, master has been updated
       via  b5a8791268e938d7e017056e0e2bd2cbec1fa690 (commit)
       via  c7eab97c7a939710b73aae2d75b404b235a998f5 (commit)
       via  f99eb2f56d8ca27110a45ae0e1c4bff40ac7a60e (commit)
       via  a62775334aa20d1d850d2df705eb70303b04ac5c (commit)
       via  61f17e53576197def46bc61fdf0cdb5282333a3e (commit)
       via  c7924ce6404bb18641b00d5fbd2fe9da9aaf7959 (commit)
       via  61264debba58355b9716ac1637fdedef5ed249c8 (commit)
       via  06de786c786f1cab4c6721adf47c2cb1e8a72adb (commit)
       via  eee23d44b6427be8ab49bbfcee3abb62f37dfcc7 (commit)
       via  e397702e271af38204fd99733bbeba7c1db3a999 (commit)
       via  e3740899c1af6962f93c85ad7d1cb71bddce45c6 (commit)
       via  b7c3b8cdf92c597e621e3dae28b110d321de5ea8 (commit)
       via  59a887e12469266e514ad7d4e34810e7ea888ba3 (commit)
       via  11d728465a9c635e1829abaae17e2f7720433b69 (commit)
       via  3710dd0f313f551f1b302b4961e0203243e3d661 (commit)
       via  4640979b526b6dac69a6a0555bfce75fe0206dac (commit)
       via  f3e6e7f8ef22bd70dd2f101d818e2e5ab5ed3cd8 (commit)
       via  817c77a3d0a3546bf46389cec5f6b54778dd1693 (commit)
       via  3f7e35ff0db740cdcb6d27c43a59bb6ca6066efb (commit)
       via  e72a5e11845fe445baaee4730bb0bea8588ee9e3 (commit)
      from  dc4ca816630ed44b419108da53421331243fb8c7 (commit)

http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit b5a8791268e938d7e017056e0e2bd2cbec1fa690
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 19 16:24:32 2013 +0200

    recover: use CTDB_REC_RO_FLAGS where appropriate
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit c7eab97c7a939710b73aae2d75b404b235a998f5
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 19 16:23:16 2013 +0200

    ctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit f99eb2f56d8ca27110a45ae0e1c4bff40ac7a60e
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 19 16:22:49 2013 +0200

    ctdb_call: use CTDB_REC_RO_FLAGS where appropriate
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit a62775334aa20d1d850d2df705eb70303b04ac5c
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 19 16:09:34 2013 +0200

    vacuum: use CTDB_REC_RO_FLAGS in the vacuuming code
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 61f17e53576197def46bc61fdf0cdb5282333a3e
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 19 15:55:38 2013 +0200

    ltdb_server: use CTDB_REC_RO_FLAGS where appropriate
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit c7924ce6404bb18641b00d5fbd2fe9da9aaf7959
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 19 16:01:45 2013 +0200

    include: define CTDB_REC_RO_FLAGS - all read-only related record flags
    
    It is used to test for, or clear, all read-only record flags at once.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>
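
A minimal sketch of what the new mask buys the call sites changed below: one test and one clear replace the four-flag expressions spelled out at each site. It assumes the flag values shown in the ctdb_protocol.h hunk. The value of CTDB_REC_RO_HAVE_DELEGATIONS does not appear in this excerpt and is assumed here. The helpers has_ro_flags() and strip_ro_flags() are hypothetical and not part of CTDB.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTDB_REC_RO_HAVE_DELEGATIONS  0x01000000 /* assumed value */
#define CTDB_REC_RO_HAVE_READONLY     0x02000000
#define CTDB_REC_RO_REVOKING_READONLY 0x04000000
#define CTDB_REC_RO_REVOKE_COMPLETE   0x08000000

/* All read-only related record flags, as defined by the patch. */
#define CTDB_REC_RO_FLAGS (CTDB_REC_RO_HAVE_DELEGATIONS | \
                           CTDB_REC_RO_HAVE_READONLY | \
                           CTDB_REC_RO_REVOKING_READONLY | \
                           CTDB_REC_RO_REVOKE_COMPLETE)

/* Hypothetical helper: does the record carry any read-only state? */
static bool has_ro_flags(uint32_t flags)
{
	return (flags & CTDB_REC_RO_FLAGS) != 0;
}

/* Hypothetical helper: drop all read-only state, as the diff does after
 * a completed revoke and when pushing records during recovery. */
static uint32_t strip_ro_flags(uint32_t flags)
{
	return flags & ~(uint32_t)CTDB_REC_RO_FLAGS;
}
```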

commit 61264debba58355b9716ac1637fdedef5ed249c8
Author: Michael Adam <obnox at samba.org>
Date:   Fri Feb 22 16:12:17 2013 +0100

    vacuum: Update (C)
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 06de786c786f1cab4c6721adf47c2cb1e8a72adb
Author: Michael Adam <obnox at samba.org>
Date:   Sat Dec 29 17:23:27 2012 +0100

    vacuum: extend the header comment for ctdb_process_delete_list()
    
    Describe the (new) process more precisely, and mention that
    it is the last step of the vacuuming process that is performed
    on the lmaster.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit eee23d44b6427be8ab49bbfcee3abb62f37dfcc7
Author: Michael Adam <obnox at samba.org>
Date:   Sat Jan 5 01:20:18 2013 +0100

    vacuum: turn the vacuuming on lmaster into a three-phase process.
    
    More precisely, before locally deleting an empty record that has been
    migrated with data and for which we are both dmaster and lmaster, we now
    perform the deletion on the other nodes in two steps instead of a single
    step:
    
    - First send out the list of records to be deleted to all
      other nodes with the new RECEIVE_RECORDS control to store
      the lmaster's current empty copy.
    - Then send those records that could be stored on all nodes
      to all nodes again with the TRY_DELETE_RECORDS control
      for deletion, as before.
    - Finally delete those records locally that were successfully
      deleted remotely in the previous step.
    
    This fixes an old race where a recovery that hits the vacuuming process
    at exactly the wrong moment can create gaps in the record's history and
    hence let deleted records resurrect. In the case of locking.tdb, this
    could mean that a file that had already been closed was recorded as
    being open and locked again, so Samba clients were locked out of that
    file until Samba was restarted.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>
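
The three phases can be modelled in a few lines of C. This is a simplified sketch, not the code in ctdb_vacuum.c. It tracks a single record across a hypothetical three-node cluster in which node 0 is both lmaster and dmaster. It ignores the dmaster and read-only checks, the non-blocking locks, and the RSN bump that the real code performs. The names vacuum_record(), receive_record() and try_delete_record() are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_NODES 3 /* hypothetical cluster size */
#define LMASTER   0 /* hypothetical: node 0 is lmaster and dmaster */

/* One node's local copy of the record (header only: it is empty). */
struct copy {
	bool     present;
	uint64_t rsn;
};

/* Phase 1 (RECEIVE_RECORDS): store the lmaster's empty copy unless
 * this node already holds a newer one. */
static bool receive_record(struct copy *c, uint64_t rsn)
{
	if (c->present && c->rsn > rsn) {
		return false;
	}
	c->present = true;
	c->rsn = rsn;
	return true;
}

/* Phase 2 (TRY_DELETE_RECORDS): delete the copy only if it is still
 * the one stored in phase 1. */
static bool try_delete_record(struct copy *c, uint64_t rsn)
{
	if (!c->present || c->rsn != rsn) {
		return false;
	}
	c->present = false;
	return true;
}

/* Returns true if the record was deleted on every node. */
static bool vacuum_record(struct copy nodes[NUM_NODES])
{
	uint64_t rsn = nodes[LMASTER].rsn;
	int i;

	for (i = 0; i < NUM_NODES; i++) {
		if (i != LMASTER && !receive_record(&nodes[i], rsn)) {
			return false; /* lmaster keeps its copy */
		}
	}
	for (i = 0; i < NUM_NODES; i++) {
		if (i != LMASTER && !try_delete_record(&nodes[i], rsn)) {
			return false; /* lmaster keeps its copy */
		}
	}
	/* Phase 3: purge the local copy on the lmaster. */
	nodes[LMASTER].present = false;
	return true;
}
```

In this model, the lmaster purges its own copy only after every other node has first accepted the empty copy and then deleted it. If any step fails, the lmaster keeps its copy.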

commit e397702e271af38204fd99733bbeba7c1db3a999
Author: Michael Adam <obnox at samba.org>
Date:   Fri Dec 21 00:24:47 2012 +0100

    vacuum: introduce the RECEIVE_RECORDS control
    
    This is in preparation for turning the vacuuming on the lmaster
    into a two-phase process:
    
    - First the node sends the list of records to be vacuumed
      to all other nodes with this new RECEIVE_RECORDS control.
      The remote nodes should store the lmaster's empty current copy.
    - Only those records that could be stored on all other nodes
      are processed further. They are sent to all other nodes with
      the TRY_DELETE_RECORDS control for deletion, as before.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>
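
The per-record decision made by a node that receives the RECEIVE_RECORDS control can be sketched as a pure function. It follows the rules documented for store_tdb_record() in the ctdb_recover.c hunk below: store only if we are neither lmaster nor dmaster and our RSN is <= the provided one, and skip records with read-only flags. The struct hdr type and should_store() are hypothetical simplifications of struct ctdb_ltdb_header and of the real function. Here `local` stands for the copy fetched from the local tdb (`data2` in store_tdb_record()), and RO_FLAGS_MASK stands in for CTDB_REC_RO_FLAGS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RO_FLAGS_MASK 0x0F000000u /* stands in for CTDB_REC_RO_FLAGS */

/* Hypothetical, simplified view of struct ctdb_ltdb_header. */
struct hdr {
	uint64_t rsn;
	uint32_t dmaster;
	uint32_t flags;
};

/*
 * Should this node overwrite its local copy (NULL if absent) with the
 * lmaster's empty copy (incoming)?
 */
static bool should_store(uint32_t my_pnn, uint32_t lmaster,
			 const struct hdr *incoming, const struct hdr *local)
{
	if (lmaster == my_pnn) {
		return false; /* the control is only meant for non-lmasters */
	}
	if (local == NULL) {
		return true;  /* no local copy yet: just store */
	}
	if (local->rsn > incoming->rsn) {
		return false; /* we already hold a newer copy */
	}
	if ((incoming->flags | local->flags) & RO_FLAGS_MASK) {
		return false; /* read-only delegations are in play */
	}
	if (local->dmaster == my_pnn) {
		return false; /* we are dmaster: do not overwrite */
	}
	return true;
}
```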

commit e3740899c1af6962f93c85ad7d1cb71bddce45c6
Author: Michael Adam <obnox at samba.org>
Date:   Sat Dec 29 18:32:39 2012 +0100

    vacuum: reorder some of ctdb_process_delete_list() more intuitively
    
    Now that the nodemap and its talloc children don't hang off the
    delete_records_list talloc context, we can build the nodemap
    earlier and move the construction of the delete_records_list
    to where it is more obvious what it is used for.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit b7c3b8cdf92c597e621e3dae28b110d321de5ea8
Author: Michael Adam <obnox at samba.org>
Date:   Sat Dec 29 17:16:33 2012 +0100

    vacuum: add explicit temporary memory context to ctdb_process_delete_list()
    
    This removes the implicit artificial talloc hierarchy and makes the
    code easier to understand.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 59a887e12469266e514ad7d4e34810e7ea888ba3
Author: Michael Adam <obnox at samba.org>
Date:   Sat Jan 5 01:19:06 2013 +0100

    vacuum: fix indentation in ctdb_process_delete_list()
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 11d728465a9c635e1829abaae17e2f7720433b69
Author: Michael Adam <obnox at samba.org>
Date:   Mon Dec 17 17:31:55 2012 +0100

    vacuum: free temporarily allocated memory correctly in ctdb_process_delete_list().
    
    Add a common exit point for cleanup.
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>
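
The common-exit-point idiom looks roughly like the sketch below. CTDB code manages memory with talloc. In this self-contained sketch, plain malloc()/free() stand in for talloc, and build_two_buffers() is a hypothetical function. Every error path jumps to one label that releases all temporaries, so no path can leak.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: on success, *out receives an n-byte buffer
 * filled with 'a'; all temporaries are released at done: on every path. */
static int build_two_buffers(size_t n, char **out)
{
	char *tmp1 = NULL;
	char *tmp2 = NULL;
	int ret = -1;

	tmp1 = malloc(n);
	if (tmp1 == NULL) {
		goto done;
	}
	tmp2 = malloc(n);
	if (tmp2 == NULL) {
		goto done;
	}
	memset(tmp1, 'a', n);
	memcpy(tmp2, tmp1, n);

	*out = tmp2;  /* hand ownership of tmp2 to the caller */
	tmp2 = NULL;  /* so that done: does not free it */
	ret = 0;

done:
	free(tmp1);   /* free(NULL) is a no-op */
	free(tmp2);
	return ret;
}
```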

commit 3710dd0f313f551f1b302b4961e0203243e3d661
Author: Michael Adam <obnox at samba.org>
Date:   Mon Dec 17 17:26:22 2012 +0100

    vacuum: move variable into scope of use in ctdb_process_delete_list()
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 4640979b526b6dac69a6a0555bfce75fe0206dac
Author: Michael Adam <obnox at samba.org>
Date:   Mon Dec 17 13:07:21 2012 +0100

    vacuum: move variable into scope of use in ctdb_process_delete_list()
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit f3e6e7f8ef22bd70dd2f101d818e2e5ab5ed3cd8
Author: Michael Adam <obnox at samba.org>
Date:   Mon Dec 17 13:03:42 2012 +0100

    vacuum: simplify ctdb_process_delete_list(): reduce indentation
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 817c77a3d0a3546bf46389cec5f6b54778dd1693
Author: Michael Adam <obnox at samba.org>
Date:   Wed Apr 3 14:12:27 2013 +0200

    vacuum: add DEBUG to skip conditions in delete_record_traverse()
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit 3f7e35ff0db740cdcb6d27c43a59bb6ca6066efb
Author: Michael Adam <obnox at samba.org>
Date:   Fri Apr 5 17:14:43 2013 +0200

    vacuum: break line for RO-flags check in delete_record_traverse() for readability
    
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>

commit e72a5e11845fe445baaee4730bb0bea8588ee9e3
Author: Michael Adam <obnox at samba.org>
Date:   Mon Apr 22 10:21:02 2013 -0400

    client: fix ctdb_control() to be able to cope with CTDB_CTRL_FLAG_NOREPLY
    
    This flag was apparently not used in this context before, so the bug
    went undetected. It becomes necessary when
    ctdb_local_schedule_for_deletion() is called from a client ctdbd (the
    vacuuming child), which then needs to send the SCHEDULE_FOR_DELETION
    control to its parent.
    
    Pair-Programmed-With: Stefan Metzmacher <metze at samba.org>
    
    Signed-off-by: Stefan Metzmacher <metze at samba.org>
    Signed-off-by: Michael Adam <obnox at samba.org>
    Reviewed-By: Amitay Isaacs <amitay at gmail.com>
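
A minimal model of the fix. It assumes, as the FIXME comment in the ctdb_client.c hunk suggests, that ctdb_control_send() returns NULL for a NOREPLY control even on success. Under that assumption, the old unconditional call to ctdb_control_recv() reported a failure. The names control_send(), control_recv(), control() and FLAG_NOREPLY are hypothetical stand-ins, not CTDB APIs.

```c
#include <stddef.h>
#include <stdint.h>

#define FLAG_NOREPLY 0x1u /* stands in for CTDB_CTRL_FLAG_NOREPLY */

struct req {
	int32_t reply;
};

static struct req the_req; /* single in-flight request, for simplicity */
static int recv_calls;     /* counts how often control_recv() ran */

/* Stub: a NOREPLY control keeps no state, so NULL is returned. */
static struct req *control_send(uint32_t flags)
{
	if (flags & FLAG_NOREPLY) {
		return NULL;
	}
	the_req.reply = 42;
	return &the_req;
}

/* Stub: collects the reply; a NULL state is treated as an error. */
static int control_recv(struct req *state, int32_t *status)
{
	recv_calls++;
	if (state == NULL) {
		return -1;
	}
	if (status != NULL) {
		*status = state->reply;
	}
	return 0;
}

static int control(uint32_t flags, int32_t *status)
{
	struct req *state = control_send(flags);

	/* No reply will ever arrive: report success without waiting. */
	if (flags & FLAG_NOREPLY) {
		if (status != NULL) {
			*status = 0;
		}
		return 0;
	}
	return control_recv(state, status);
}
```

The FIXME records a trade-off. Because a NULL return also signals errors, this path cannot report a send failure for NOREPLY controls.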

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c      |   11 +
 include/ctdb_private.h    |    1 +
 include/ctdb_protocol.h   |    5 +
 server/ctdb_call.c        |    2 +-
 server/ctdb_control.c     |    3 +
 server/ctdb_daemon.c      |    2 +-
 server/ctdb_ltdb_server.c |    4 +-
 server/ctdb_recover.c     |  202 ++++++++++++++++-
 server/ctdb_vacuum.c      |  567 ++++++++++++++++++++++++++++++++++-----------
 9 files changed, 658 insertions(+), 139 deletions(-)


Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index 76780b0..2ae8958 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -1140,6 +1140,17 @@ int ctdb_control(struct ctdb_context *ctdb, uint32_t destnode, uint64_t srvid,
 	state = ctdb_control_send(ctdb, destnode, srvid, opcode, 
 			flags, data, mem_ctx,
 			timeout, errormsg);
+
+	/* FIXME: Error conditions in ctdb_control_send return NULL without
+	 * setting errormsg.  So, there is no way to distinguish between success
+	 * and failure when CTDB_CTRL_FLAG_NOREPLY is set */
+	if (flags & CTDB_CTRL_FLAG_NOREPLY) {
+		if (status != NULL) {
+			*status = 0;
+		}
+		return 0;
+	}
+
 	return ctdb_control_recv(ctdb, state, mem_ctx, outdata, status, 
 			errormsg);
 }
diff --git a/include/ctdb_private.h b/include/ctdb_private.h
index 09f7dd9..03e996b 100644
--- a/include/ctdb_private.h
+++ b/include/ctdb_private.h
@@ -1253,6 +1253,7 @@ int32_t ctdb_control_get_tunable(struct ctdb_context *ctdb, TDB_DATA indata,
 int32_t ctdb_control_set_tunable(struct ctdb_context *ctdb, TDB_DATA indata);
 int32_t ctdb_control_list_tunables(struct ctdb_context *ctdb, TDB_DATA *outdata);
 int32_t ctdb_control_try_delete_records(struct ctdb_context *ctdb, TDB_DATA indata, TDB_DATA *outdata);
+int32_t ctdb_control_receive_records(struct ctdb_context *ctdb, TDB_DATA indata, TDB_DATA *outdata);
 int32_t ctdb_control_add_public_address(struct ctdb_context *ctdb, TDB_DATA indata);
 int32_t ctdb_control_del_public_address(struct ctdb_context *ctdb, TDB_DATA indata);
 
diff --git a/include/ctdb_protocol.h b/include/ctdb_protocol.h
index 751fe32..4755b4c 100644
--- a/include/ctdb_protocol.h
+++ b/include/ctdb_protocol.h
@@ -403,6 +403,7 @@ enum ctdb_controls {CTDB_CONTROL_PROCESS_EXISTS          = 0,
 		    CTDB_CONTROL_SET_DB_STICKY		 = 133,
 		    CTDB_CONTROL_RELOAD_PUBLIC_IPS	 = 134,
 		    CTDB_CONTROL_TRAVERSE_ALL_EXT	 = 135,
+		    CTDB_CONTROL_RECEIVE_RECORDS	 = 136,
 };
 
 /*
@@ -531,6 +532,10 @@ struct ctdb_ltdb_header {
 #define CTDB_REC_RO_HAVE_READONLY		0x02000000
 #define CTDB_REC_RO_REVOKING_READONLY		0x04000000
 #define CTDB_REC_RO_REVOKE_COMPLETE		0x08000000
+#define CTDB_REC_RO_FLAGS			(CTDB_REC_RO_HAVE_DELEGATIONS|\
+						 CTDB_REC_RO_HAVE_READONLY|\
+						 CTDB_REC_RO_REVOKING_READONLY|\
+						 CTDB_REC_RO_REVOKE_COMPLETE)
 	uint32_t flags;
 };
 
diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index a6c6389..dbbebec 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -787,7 +787,7 @@ void ctdb_request_call(struct ctdb_context *ctdb, struct ctdb_req_header *hdr)
 	}
 
 	if (header.flags & CTDB_REC_RO_REVOKE_COMPLETE) {
-		header.flags &= ~(CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE);
+		header.flags &= ~CTDB_REC_RO_FLAGS;
 		CTDB_INCREMENT_STAT(ctdb, total_ro_revokes);
 		CTDB_INCREMENT_DB_STAT(ctdb_db, db_ro_revokes);
 		if (ctdb_ltdb_store(ctdb_db, call->key, &header, data) != 0) {
diff --git a/server/ctdb_control.c b/server/ctdb_control.c
index affb9dd..0d0f61c 100644
--- a/server/ctdb_control.c
+++ b/server/ctdb_control.c
@@ -654,6 +654,9 @@ static int32_t ctdb_control_dispatch(struct ctdb_context *ctdb,
 		CHECK_CONTROL_DATA_SIZE(0);
 		return ctdb_control_reload_public_ips(ctdb, c, async_reply);
 
+	case CTDB_CONTROL_RECEIVE_RECORDS:
+		return ctdb_control_receive_records(ctdb, indata, outdata);
+
 	default:
 		DEBUG(DEBUG_CRIT,(__location__ " Unknown CTDB control opcode %u\n", opcode));
 		return -1;
diff --git a/server/ctdb_daemon.c b/server/ctdb_daemon.c
index 2bd9346..c9799c3 100644
--- a/server/ctdb_daemon.c
+++ b/server/ctdb_daemon.c
@@ -688,7 +688,7 @@ static void daemon_request_call_from_client(struct ctdb_client *client,
 	}
 
 	if (header.flags & CTDB_REC_RO_REVOKE_COMPLETE) {
-		header.flags &= ~(CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE);
+		header.flags &= ~CTDB_REC_RO_FLAGS;
 		CTDB_INCREMENT_STAT(ctdb, total_ro_revokes);
 		CTDB_INCREMENT_DB_STAT(ctdb_db, db_ro_revokes);
 		if (ctdb_ltdb_store(ctdb_db, key, &header, data) != 0) {
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index c5d9be1..4f77934 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -82,7 +82,7 @@ static int ctdb_ltdb_store_server(struct ctdb_db_context *ctdb_db,
 	 */
 	if (data.dsize != 0) {
 		keep = true;
-	} else if (header->flags & (CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE)) {
+	} else if (header->flags & CTDB_REC_RO_FLAGS) {
 		keep = true;
 	} else if (ctdb_db->persistent) {
 		keep = true;
@@ -127,7 +127,7 @@ static int ctdb_ltdb_store_server(struct ctdb_db_context *ctdb_db,
 	if (keep) {
 		if (!ctdb_db->persistent &&
 		    (ctdb_db->ctdb->pnn == header->dmaster) &&
-		    !(header->flags & (CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE)))
+		    !(header->flags & CTDB_REC_RO_FLAGS))
 		{
 			header->rsn++;
 
diff --git a/server/ctdb_recover.c b/server/ctdb_recover.c
index 433a665..1e5170f 100644
--- a/server/ctdb_recover.c
+++ b/server/ctdb_recover.c
@@ -418,7 +418,7 @@ int32_t ctdb_control_push_db(struct ctdb_context *ctdb, TDB_DATA indata)
 		/* strip off any read only record flags. All readonly records
 		   are revoked implicitely by a recovery
 		*/
-		hdr->flags &= ~(CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE);
+		hdr->flags &= ~CTDB_REC_RO_FLAGS;
 
 		data.dptr += sizeof(*hdr);
 		data.dsize -= sizeof(*hdr);
@@ -843,13 +843,13 @@ static int delete_tdb_record(struct ctdb_context *ctdb, struct ctdb_db_context *
 	}
 
 	/* do not allow deleting record that have readonly flags set. */
-	if (hdr->flags & (CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE)) {
+	if (hdr->flags & CTDB_REC_RO_FLAGS) {
 		tdb_chainunlock(ctdb_db->ltdb->tdb, key);
 		DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly flags set\n"));
 		free(data.dptr);
 		return -1;		
 	}
-	if (hdr2->flags & (CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE)) {
+	if (hdr2->flags & CTDB_REC_RO_FLAGS) {
 		tdb_chainunlock(ctdb_db->ltdb->tdb, key);
 		DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly flags set\n"));
 		free(data.dptr);
@@ -1092,6 +1092,202 @@ int32_t ctdb_control_try_delete_records(struct ctdb_context *ctdb, TDB_DATA inda
 	return 0;
 }
 
+/**
+ * Store a record as part of the vacuum process:
+ * This is called from the RECEIVE_RECORDS control which
+ * the lmaster uses to send the current empty copy
+ * to all nodes for storing, before it lets the other
+ * nodes delete the records in the second phase with
+ * the TRY_DELETE_RECORDS control.
+ *
+ * Only store if we are not lmaster or dmaster, and our
+ * rsn is <= the provided rsn. Use non-blocking locks.
+ *
+ * return 0 if the record was successfully stored.
+ * return !0 if the record was not stored.
+ */
+static int store_tdb_record(struct ctdb_context *ctdb,
+			    struct ctdb_db_context *ctdb_db,
+			    struct ctdb_rec_data *rec)
+{
+	TDB_DATA key, data, data2;
+	struct ctdb_ltdb_header *hdr, *hdr2;
+	int ret;
+
+	key.dsize = rec->keylen;
+	key.dptr = &rec->data[0];
+	data.dsize = rec->datalen;
+	data.dptr = &rec->data[rec->keylen];
+
+	if (ctdb_lmaster(ctdb, &key) == ctdb->pnn) {
+		DEBUG(DEBUG_INFO, (__location__ " Called store_tdb_record "
+				   "where we are lmaster\n"));
+		return -1;
+	}
+
+	if (data.dsize != sizeof(struct ctdb_ltdb_header)) {
+		DEBUG(DEBUG_ERR, (__location__ " Bad record size\n"));
+		return -1;
+	}
+
+	hdr = (struct ctdb_ltdb_header *)data.dptr;
+
+	/* use a non-blocking lock */
+	if (tdb_chainlock_nonblock(ctdb_db->ltdb->tdb, key) != 0) {
+		DEBUG(DEBUG_ERR, (__location__ " Failed to lock chain\n"));
+		return -1;
+	}
+
+	data2 = tdb_fetch(ctdb_db->ltdb->tdb, key);
+	if (data2.dptr == NULL || data2.dsize < sizeof(struct ctdb_ltdb_header)) {
+		tdb_store(ctdb_db->ltdb->tdb, key, data, 0);
+		DEBUG(DEBUG_INFO, (__location__ " Stored record\n"));
+		ret = 0;
+		goto done;
+	}
+
+	hdr2 = (struct ctdb_ltdb_header *)data2.dptr;
+
+	if (hdr2->rsn > hdr->rsn) {
+		DEBUG(DEBUG_INFO, (__location__ " Skipping record with "
+				   "rsn=%llu - called with rsn=%llu\n",
+				   (unsigned long long)hdr2->rsn,
+				   (unsigned long long)hdr->rsn));
+		ret = -1;
+		goto done;
+	}
+
+	/* do not allow vacuuming of records that have readonly flags set. */
+	if (hdr->flags & CTDB_REC_RO_FLAGS) {
+		DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly "
+				  "flags set\n"));
+		ret = -1;
+		goto done;
+	}
+	if (hdr2->flags & CTDB_REC_RO_FLAGS) {
+		DEBUG(DEBUG_INFO,(__location__ " Skipping record with readonly "
+				  "flags set\n"));
+		ret = -1;
+		goto done;
+	}
+
+	if (hdr2->dmaster == ctdb->pnn) {
+		DEBUG(DEBUG_INFO, (__location__ " Attempted to store record "
+				   "where we are the dmaster\n"));
+		ret = -1;
+		goto done;
+	}
+
+	if (tdb_store(ctdb_db->ltdb->tdb, key, data, 0) != 0) {
+		DEBUG(DEBUG_INFO,(__location__ " Failed to store record\n"));
+		ret = -1;
+		goto done;
+	}
+
+	ret = 0;
+
+done:
+	tdb_chainunlock(ctdb_db->ltdb->tdb, key);
+	free(data2.dptr);
+	return  ret;
+}
+
+
+
+/**
+ * Try to store all these records as part of the vacuuming process
+ * and return the records we failed to store.
+ */
+int32_t ctdb_control_receive_records(struct ctdb_context *ctdb,
+				     TDB_DATA indata, TDB_DATA *outdata)
+{
+	struct ctdb_marshall_buffer *reply = (struct ctdb_marshall_buffer *)indata.dptr;
+	struct ctdb_db_context *ctdb_db;
+	int i;
+	struct ctdb_rec_data *rec;
+	struct ctdb_marshall_buffer *records;
+
+	if (indata.dsize < offsetof(struct ctdb_marshall_buffer, data)) {
+		DEBUG(DEBUG_ERR,
+		      (__location__ " invalid data in receive_records\n"));
+		return -1;
+	}
+
+	ctdb_db = find_ctdb_db(ctdb, reply->db_id);
+	if (!ctdb_db) {
+		DEBUG(DEBUG_ERR, (__location__ " Unknown db 0x%08x\n",
+				  reply->db_id));
+		return -1;
+	}
+
+	DEBUG(DEBUG_DEBUG, ("starting receive_records of %u records for "
+			    "dbid 0x%x\n", reply->count, reply->db_id));
+
+	/* create a blob to send back the records we could not store */
+	records = (struct ctdb_marshall_buffer *)
+			talloc_zero_size(outdata,
+				offsetof(struct ctdb_marshall_buffer, data));
+	if (records == NULL) {
+		DEBUG(DEBUG_ERR, (__location__ " Out of memory\n"));
+		return -1;
+	}
+	records->db_id = ctdb_db->db_id;
+
+	rec = (struct ctdb_rec_data *)&reply->data[0];
+	for (i=0; i<reply->count; i++) {
+		TDB_DATA key, data;
+
+		key.dptr = &rec->data[0];
+		key.dsize = rec->keylen;
+		data.dptr = &rec->data[key.dsize];
+		data.dsize = rec->datalen;
+
+		if (data.dsize < sizeof(struct ctdb_ltdb_header)) {
+			DEBUG(DEBUG_CRIT, (__location__ " bad ltdb record "
+					   "in indata\n"));
+			return -1;
+		}
+
+		/*
+		 * If we can not store the record we must add it to the reply
+		 * so the lmaster knows it may not purge this record.
+		 */
+		if (store_tdb_record(ctdb, ctdb_db, rec) != 0) {
+			size_t old_size;
+			struct ctdb_ltdb_header *hdr;
+
+			hdr = (struct ctdb_ltdb_header *)data.dptr;
+			data.dptr += sizeof(*hdr);
+			data.dsize -= sizeof(*hdr);
+
+			DEBUG(DEBUG_INFO, (__location__ " Failed to store "
+					   "record with hash 0x%08x in vacuum "
+					   "via RECEIVE_RECORDS\n",
+					   ctdb_hash(&key)));
+
+			old_size = talloc_get_size(records);
+			records = talloc_realloc_size(outdata, records,
+						      old_size + rec->length);
+			if (records == NULL) {
+				DEBUG(DEBUG_ERR, (__location__ " Failed to "
+						  "expand\n"));
+				return -1;
+			}
+			records->count++;
+			memcpy(old_size+(uint8_t *)records, rec, rec->length);
+		}
+
+		rec = (struct ctdb_rec_data *)(rec->length + (uint8_t *)rec);
+	}
+
+
+	outdata->dptr = (uint8_t *)records;
+	outdata->dsize = talloc_get_size(records);
+
+	return 0;
+}
+
+
 /*
   report capabilities
  */
diff --git a/server/ctdb_vacuum.c b/server/ctdb_vacuum.c
index 4a000b0..d7527d4 100644
--- a/server/ctdb_vacuum.c
+++ b/server/ctdb_vacuum.c
@@ -2,7 +2,7 @@
    ctdb vacuuming events
 
    Copyright (C) Ronnie Sahlberg  2009
-   Copyright (C) Michael Adam 2010-2011
+   Copyright (C) Michael Adam 2010-2013
    Copyright (C) Stefan Metzmacher 2010-2011
 
    This program is free software; you can redistribute it and/or modify
@@ -96,6 +96,7 @@ struct delete_record_data {
 
 struct delete_records_list {
 	struct ctdb_marshall_buffer *records;
+	struct vacuum_data *vdata;
 };
 
 /**
@@ -303,6 +304,141 @@ static int delete_marshall_traverse(void *param, void *data)
 }
 
 /**
+ * Variant of delete_marshall_traverse() that bumps the
+ * RSN of each traversed record in the database.
+ *
+ * This is needed to ensure that when rolling out our
+ * empty record copy before remote deletion, we as the
+ * record's dmaster keep a higher RSN than the non-dmaster
+ * nodes. This is needed to prevent old copies from
+ * resurrecting in recoveries.
+ */
+static int delete_marshall_traverse_first(void *param, void *data)
+{
+	struct delete_record_data *dd = talloc_get_type(data, struct delete_record_data);
+	struct delete_records_list *recs = talloc_get_type(param, struct delete_records_list);
+	struct ctdb_db_context *ctdb_db = dd->ctdb_db;
+	struct ctdb_context *ctdb = ctdb_db->ctdb;
+	struct ctdb_ltdb_header *header;
+	TDB_DATA tdb_data, ctdb_data;
+	uint32_t lmaster;
+	uint32_t hash = ctdb_hash(&(dd->key));
+	int res;
+
+	res = tdb_chainlock(ctdb_db->ltdb->tdb, dd->key);
+	if (res != 0) {
+		DEBUG(DEBUG_ERR,
+		      (__location__ " Error getting chainlock on record with "
+		       "key hash [0x%08x] on database db[%s].\n",
+		       hash, ctdb_db->db_name));
+		recs->vdata->delete_skipped++;
+		talloc_free(dd);
+		return 0;
+	}
+
+	/*
+	 * Verify that the record is still empty, its RSN has not
+	 * changed and that we are still its lmaster and dmaster.
+	 */
+
+	tdb_data = tdb_fetch(ctdb_db->ltdb->tdb, dd->key);
+	if (tdb_data.dsize < sizeof(struct ctdb_ltdb_header)) {
+		DEBUG(DEBUG_INFO, (__location__ ": record with hash [0x%08x] "
+				   "on database db[%s] does not exist or is not"
+				   " a ctdb-record.  skipping.\n",
+				   hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	if (tdb_data.dsize > sizeof(struct ctdb_ltdb_header)) {
+		DEBUG(DEBUG_INFO, (__location__ ": record with hash [0x%08x] "
+				   "on database db[%s] has been recycled. "
+				   "skipping.\n",
+				   hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	header = (struct ctdb_ltdb_header *)tdb_data.dptr;
+
+	if (header->flags & CTDB_REC_RO_FLAGS) {
+		DEBUG(DEBUG_INFO, (__location__ ": record with hash [0x%08x] "
+				   "on database db[%s] has read-only flags. "
+				   "skipping.\n",
+				   hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	if (header->dmaster != ctdb->pnn) {
+		DEBUG(DEBUG_INFO, (__location__ ": record with hash [0x%08x] "
+				   "on database db[%s] has been migrated away. "
+				   "skipping.\n",
+				   hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	if (header->rsn != dd->hdr.rsn) {
+		DEBUG(DEBUG_INFO, (__location__ ": record with hash [0x%08x] "
+				   "on database db[%s] seems to have been "
+				   "migrated away and back again (with empty "
+				   "data). skipping.\n",
+				   hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	lmaster = ctdb_lmaster(ctdb_db->ctdb, &dd->key);
+
+	if (lmaster != ctdb->pnn) {
+		DEBUG(DEBUG_INFO, (__location__ ": not lmaster for record in "
+				   "delete list (key hash [0x%08x], db[%s]). "
+				   "Strange! skipping.\n",
+				   hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	/*
+	 * Increment the record's RSN to ensure the dmaster (i.e. the current
+	 * node) has the highest RSN of the record in the cluster.
+	 * This is to prevent old record copies from resurrecting in recoveries
+	 * if something should fail during the deletion process.
+	 * Note that ctdb_ltdb_store_server() increments the RSN if called
+	 * on the record's dmaster.
+	 */
+
+	ctdb_data.dptr = tdb_data.dptr + sizeof(struct ctdb_ltdb_header);
+	ctdb_data.dsize = tdb_data.dsize - sizeof(struct ctdb_ltdb_header);
+
+	res = ctdb_ltdb_store(ctdb_db, dd->key, header, ctdb_data);
+	if (res != 0) {
+		DEBUG(DEBUG_ERR, (__location__ ": Failed to store record with "
+				  "key hash [0x%08x] on database db[%s].\n",
+				  hash, ctdb_db->db_name));
+		goto skip;
+	}
+
+	tdb_chainunlock(ctdb_db->ltdb->tdb, dd->key);
+
+	goto done;
+
+skip:
+	tdb_chainunlock(ctdb_db->ltdb->tdb, dd->key);
+
+	recs->vdata->delete_skipped++;
+	talloc_free(dd);
+	dd = NULL;
+
+done:
+	if (tdb_data.dptr != NULL) {


-- 
CTDB repository


More information about the samba-cvs mailing list