[SCM] CTDB repository - branch master updated - ctdb-1.10-87-ge075670

Ronnie Sahlberg sahlberg at samba.org
Thu Feb 17 17:06:02 MST 2011


The branch, master has been updated
       via  e075670dee8e6ecaba54986f87a85be3d0528b6b (commit)
       via  7db5a4832a9555be53c301f198f72b9e075a8ae7 (commit)
       via  0c030c9384500f340d8382c20e1e91b11aa377e9 (commit)
       via  155dd1f4885fe142c6f8bd09430f65daf8a17e51 (commit)
       via  b86feb6fe463dfdb67b2798491df18a4c434a430 (commit)
      from  307e5e95548155a31682dfcb0956834d0c85838e (commit)

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit e075670dee8e6ecaba54986f87a85be3d0528b6b
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Mon Dec 6 13:08:53 2010 +1100

    Add two new flags for the ltdb header.
    One of them signals that the record has never been migrated to/from a node
    while containing data.
    This property "has never been migrated while non-zero" is important later
    to provide heuristics on which records we might be able to purge
    from the tdb files cheaply, i.e. without having to rely on the full-blown
    database vacuum.
    
    These records are believed to be very common and the pattern would look like
    this:
    1, no record exists at all.
    2, client opens a file
    3, samba requests the record for this file
    4, an empty record is created on the LMASTER
    5, the empty record is migrated to the DMASTER
    6, samba writes a <sharemode> to the record locally and the record grows
    7, client finishes working with the file and closes the file
    8, samba removes the sharemode and the record becomes empty again.
    9, much later : vacuuming will delete the record
    
    At stage 8, since the record has never been migrated onto a node while being
    non-zero, it would be safe, and much more efficient, to just delete the record
    completely from the database and hand it back to the LMASTER.
    
    The flags occupy the same uint32_t as was previously used for laccessor/lacount
    in the header. For now, make sure the flags only define/use the top 16 bits
    of this field so that we are sure we don't collide with bits set to one
    from previous generations of the ctdb cluster database prior to this
    change in semantics of this word.
    
    This is a rework of Michael's patch:
    commit 2af1a47cbe1a608496c8caf3eb0c990eb7259a0d
    Author: Michael Adam <obnox at samba.org>
    Date:   Tue Nov 30 17:00:54 2010 +0100
    
        add a DEFAULT record flag and a MIGRATED_WITH_DATA record flag.
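
The stage 8 check described in the commit message above can be sketched in C roughly as follows. The flag values and the header layout are taken from the include/ctdb_protocol.h hunk in this changeset. The helper sketch_record_is_cheaply_deletable(), the mask SKETCH_REC_FLAGS_MASK and the data_len parameter are hypothetical names used only for illustration; this changeset defines the flags but does not add such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined in include/ctdb_protocol.h by this changeset. */
#define CTDB_REC_FLAG_DEFAULT            0x00000000
#define CTDB_REC_FLAG_MIGRATED_WITH_DATA 0x00010000

/* Hypothetical mask: only the top 16 bits are treated as flags, so stale
 * lacount values left in the low bits by older ctdb versions are ignored. */
#define SKETCH_REC_FLAGS_MASK 0xFFFF0000u

/* Header layout after this changeset. */
struct ctdb_ltdb_header {
	uint64_t rsn;
	uint32_t dmaster;
	uint32_t reserved1;
	uint32_t flags;
};

/* Hypothetical helper: an empty record that has never been migrated while
 * holding data is a candidate for cheap deletion (stage 8 above). */
bool sketch_record_is_cheaply_deletable(const struct ctdb_ltdb_header *header,
					size_t data_len)
{
	uint32_t flags;

	assert(header != NULL);
	flags = header->flags & SKETCH_REC_FLAGS_MASK;
	return data_len == 0 &&
	       (flags & CTDB_REC_FLAG_MIGRATED_WITH_DATA) == 0;
}
```

With a zeroed header and data_len of 0 the helper returns true. It returns false once CTDB_REC_FLAG_MIGRATED_WITH_DATA is set or the record holds data. Low-order bits left over from an old lacount value do not affect the result.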

commit 7db5a4832a9555be53c301f198f72b9e075a8ae7
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Mon Oct 18 11:57:38 2010 +1100

    remove checking for filesystems and filesystem health from the cnfs script.
    remove the gpfsmount and gpfsumount entry points

commit 0c030c9384500f340d8382c20e1e91b11aa377e9
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Fri Jan 21 10:56:56 2011 +1100

    60.nfs
    Don't update the statd settings that often.
    When we have very many nodes and very many ips, this would generate
    a lot of unnecessary load on the system

commit 155dd1f4885fe142c6f8bd09430f65daf8a17e51
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Mon Nov 29 13:07:59 2010 +1100

    Remove LACOUNT and LACCESSOR and migrate the records immediately.
    
    This concept didn't work out and it is really just as expensive as a full migration
    anyway, without the benefit of caching the data for subsequent accesses.
    
    Now, migrate the records immediately on first access.
    This will be combined with a "cheap vacuum-lite" for special empty records to
    prevent growth of databases.
    
    Later extensions to mimic read-only behaviour of records will include proper shared read-only locking of database records, making the laccessor/lacount read-only access to the data obsolete anyway.
    
    Removing this special case and the handling of lacount/laccessor makes the codepath where shared read-only locking will be implemented simpler, and frees up space in the ctdb_ltdb header for use by vacuuming flags as well as read-only locking flags.
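
The change in migration policy can be sketched by restating the old and new conditions from the server/ctdb_call.c hunk as two standalone predicates. The function names and the flattened parameter lists are hypothetical. The real code reads these values from the request and the record header, and still refuses migration while a transaction is active, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old rule: migrate only after enough consecutive accesses by the same
 * remote node, or when the caller asked for an immediate migration. */
bool sketch_old_should_migrate(uint32_t srcnode, uint32_t pnn,
			       uint32_t laccessor, uint32_t lacount,
			       uint32_t max_lacount, bool immediate)
{
	return srcnode != pnn &&
	       ((laccessor == srcnode && lacount >= max_lacount) || immediate);
}

/* New rule: migrate on the first access from any remote node. */
bool sketch_new_should_migrate(uint32_t srcnode, uint32_t pnn)
{
	return srcnode != pnn;
}

/* Example: node 1 touches a record held by node 0 for the first time,
 * with the removed MaxLACount tunable at its old default of 7. */
void sketch_migration_demo(void)
{
	assert(!sketch_old_should_migrate(1, 0, 1, 1, 7, false));
	assert(sketch_new_should_migrate(1, 0));
}
```

Under the old rule the first remote access was served in place and only counted toward the threshold. Under the new rule the same access hands the record to the caller.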

commit b86feb6fe463dfdb67b2798491df18a4c434a430
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date:   Fri Oct 8 13:14:14 2010 +1100

    change the hash function to use the much better Jenkins hash
    from the tdb library
    
    cq S1020233
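
The commit message does not say what was wrong with the old hash. One property that can be read off the removed code in common/ctdb_util.c is that it only adds shifted bytes together, and the shift amount (i*5 % 24) repeats every 24 positions, so two keys that differ only by swapping the bytes at positions 0 and 24 hash to the same value. The sketch below restates the removed function over a plain buffer to show this. The function names are hypothetical, and whether this particular weakness motivated the change is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The hash removed by this changeset, restated over a plain buffer
 * instead of a TDB_DATA key. */
uint32_t sketch_old_ctdb_hash(const unsigned char *dptr, size_t dsize)
{
	uint32_t value = 0x238F13AFu * (uint32_t)dsize;
	uint32_t i;

	assert(dptr != NULL || dsize == 0);
	for (i = 0; i < dsize; i++) {
		value += (uint32_t)dptr[i] << (i * 5 % 24);
	}
	return 1103515243u * value + 12345u;
}

/* Returns 1 if two different 25-byte keys, equal except for the bytes at
 * positions 0 and 24 being swapped, produce the same old hash value. */
int sketch_swapped_keys_collide(void)
{
	unsigned char a[25], b[25];

	memset(a, 'k', sizeof(a));
	a[0] = 'x';
	a[24] = 'y';
	memcpy(b, a, sizeof(b));
	b[0] = 'y';
	b[24] = 'x';

	return memcmp(a, b, sizeof(a)) != 0 &&
	       sketch_old_ctdb_hash(a, sizeof(a)) ==
	       sketch_old_ctdb_hash(b, sizeof(b));
}
```

Positions 0 and 24 both get a shift of 0, so the two swapped bytes contribute the same sum in either order and the keys collide.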

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c    |   15 ++-------
 common/ctdb_ltdb.c      |    1 -
 common/ctdb_util.c      |    9 +-----
 config/events.d/60.nfs  |    4 +-
 config/events.d/62.cnfs |   74 -----------------------------------------------
 include/ctdb_client.h   |    5 ---
 include/ctdb_private.h  |    5 +--
 include/ctdb_protocol.h |    6 ++-
 server/ctdb_call.c      |   19 +++++-------
 server/ctdb_tunables.c  |    1 -
 10 files changed, 21 insertions(+), 118 deletions(-)


Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index 5a07a85..99ff72d 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -72,7 +72,7 @@ struct ctdb_req_header *_ctdbd_allocate_pkt(struct ctdb_context *ctdb,
 */
 int ctdb_call_local(struct ctdb_db_context *ctdb_db, struct ctdb_call *call,
 		    struct ctdb_ltdb_header *header, TALLOC_CTX *mem_ctx,
-		    TDB_DATA *data, uint32_t caller)
+		    TDB_DATA *data)
 {
 	struct ctdb_call_info *c;
 	struct ctdb_registered_call *fn;
@@ -105,15 +105,8 @@ int ctdb_call_local(struct ctdb_db_context *ctdb_db, struct ctdb_call *call,
 		return -1;
 	}
 
-	if (header->laccessor != caller) {
-		header->lacount = 0;
-	}
-	header->laccessor = caller;
-	header->lacount++;
-
-	/* we need to force the record to be written out if this was a remote access,
-	   so that the lacount is updated */
-	if (c->new_data == NULL && header->laccessor != ctdb->pnn) {
+	/* we need to force the record to be written out if this was a remote access */
+	if (c->new_data == NULL) {
 		c->new_data = &c->record_data;
 	}
 
@@ -368,7 +361,7 @@ static struct ctdb_client_call_state *ctdb_client_call_local_send(struct ctdb_db
 	*(state->call) = *call;
 	state->ctdb_db = ctdb_db;
 
-	ret = ctdb_call_local(ctdb_db, state->call, header, state, data, ctdb->pnn);
+	ret = ctdb_call_local(ctdb_db, state->call, header, state, data);
 
 	return state;
 }
diff --git a/common/ctdb_ltdb.c b/common/ctdb_ltdb.c
index 3572371..200cca4 100644
--- a/common/ctdb_ltdb.c
+++ b/common/ctdb_ltdb.c
@@ -65,7 +65,6 @@ static void ltdb_initial_header(struct ctdb_db_context *ctdb_db,
 	ZERO_STRUCTP(header);
 	/* initial dmaster is the lmaster */
 	header->dmaster = ctdb_lmaster(ctdb_db->ctdb, &key);
-	header->laccessor = header->dmaster;
 }
 
 
diff --git a/common/ctdb_util.c b/common/ctdb_util.c
index 88741e3..1ff4c1f 100644
--- a/common/ctdb_util.c
+++ b/common/ctdb_util.c
@@ -99,14 +99,7 @@ bool ctdb_same_address(struct ctdb_address *a1, struct ctdb_address *a2)
 */
 uint32_t ctdb_hash(const TDB_DATA *key)
 {
-	uint32_t value;	/* Used to compute the hash value.  */
-	uint32_t i;	/* Used to cycle through random values. */
-
-	/* Set the initial value from the key size. */
-	for (value = 0x238F13AF * key->dsize, i=0; i < key->dsize; i++)
-		value = (value + (key->dptr[i] << (i*5 % 24)));
-
-	return (1103515243 * value + 12345);  
+	return tdb_jenkins_hash(discard_const(key));
 }
 
 /*
diff --git a/config/events.d/60.nfs b/config/events.d/60.nfs
index 79a071b..0cea531 100755
--- a/config/events.d/60.nfs
+++ b/config/events.d/60.nfs
@@ -179,11 +179,11 @@ case "$1" in
 		$cmd &
 	}
 
-	# once every 60 seconds, update the statd state database for which
+	# once every 600 seconds, update the statd state database for which
 	# clients need notifications
 	LAST_UPDATE=`stat --printf="%Y" $CTDB_VARDIR/state/statd/update-trigger 2>/dev/null`
 	CURRENT_TIME=`date +"%s"`
-	[ $CURRENT_TIME -ge $(($LAST_UPDATE + 60)) ] && {
+	[ $CURRENT_TIME -ge $(($LAST_UPDATE + 600)) ] && {
 	    mkdir -p $CTDB_VARDIR/state/statd
 	    touch $CTDB_VARDIR/state/statd/update-trigger
 	    $CTDB_BASE/statd-callout updatelocal &
diff --git a/config/events.d/62.cnfs b/config/events.d/62.cnfs
index e0af722..af4ecc3 100755
--- a/config/events.d/62.cnfs
+++ b/config/events.d/62.cnfs
@@ -8,20 +8,8 @@ loadconfig
 STATEDIR=$CTDB_VARDIR/state/gpfs
 
 
-# filesystems needed by nfs
-NFS_FSS=`cat /etc/exports | egrep -v "^#" | sed -e "s/[ \t]*[^ \t]*$//" -e "s/\"//g"`
-
-
-
 check_if_healthy() {
         mkdir -p $STATEDIR/fs
-        FS=`(cd $STATEDIR/fs ; ls )`
-        [ -z "$FS" ] || {
-                MISSING=`echo $FS | sed -e "s/@/\//g"`
-                logger Filesystems required for NFS are missing. Node is UNHEALTHY. [$MISSING]
-                $CTDB_BASE/events.d/62.cnfs unhealthy "GPFS filesystems required for NFS are not mounted : [$MISSING]"
-                exit 0
-        }
 
         [ -f "$STATEDIR/gpfsnoquorum" ] && {
                 logger No GPFS quorum. Node is UNHEALTHY
@@ -40,64 +28,6 @@ case "$1" in
         ;;
 
 
-    # This event is called from the GPFS callbacks when a filesystem is
-    # unmounted
-    gpfsumount)
-        # is this a filesystem we need for nfs?
-        echo "$NFS_FSS" | egrep "^$2" >/dev/null || {
-                # no
-                exit 0
-        }
-
-        logger "GPFS unmounted filesystem $2 used by NFS. Mark node as UNHEALTHY"
-
-        MFS=`echo $2 | sed -e "s/\//@/g"`
-        mkdir -p $STATEDIR/fs
-        touch "$STATEDIR/fs/$MFS"
-        $CTDB_BASE/events.d/62.cnfs unhealthy "GPFS unmounted filesystem $2 used by NFS"
-        ;;
-
-    # This event is called from the GPFS callbacks when a filesystem is
-    # mounted
-    gpfsmount)
-        # is this a filesystem we need for nfs?
-        echo "$NFS_FSS" | egrep "^$2" >/dev/null || {
-                # no
-                exit 0
-        }
-
-        logger "GPFS mounted filesystem $2 used by NFS."
-
-        MFS=`echo $2 | sed -e "s/\//@/g"`
-        mkdir -p $STATEDIR/fs
-        rm -f "$STATEDIR/fs/$MFS"
-
-        check_if_healthy
-        ;;
-
-
-
-    # This event is called from the gpfs callback when GPFS is being shutdown.
-    gpfsshutdown)
-        logger "GPFS is shutting down. Marking node as UNHEALTHY and trigger a CTDB failover"
-        $CTDB_BASE/events.d/62.cnfs unhealthy "GPFS was shut down!"
-        ;;
-
-
-    # This event is called from the gpfs callback when GPFS has started.
-    # It checks that all required NFS filesystems are mounted 
-    # and flags the node healthy if so.
-    gpfsstartup)
-	# assume we always have quorum when starting
-	# we are only interested in the case when we explicitely
-	# lost quorum in an otherwise happy cluster
-        mkdir -p $STATEDIR
-        rm -f "$STATEDIR/gpfsnoquorum"
-        logger "GPFS is is started."
-        check_if_healthy
-        ;;
-
-
     gpfsquorumreached)
         mkdir -p $STATEDIR
         rm -f "$STATEDIR/gpfsnoquorum"
@@ -112,10 +42,6 @@ case "$1" in
         $CTDB_BASE/events.d/62.cnfs unhealthy "GPFS quorum was lost! Marking node as UNHEALTHY."
         ;;
 
-
-
-
-
     unhealthy)
         # Mark the node as UNHEALTHY which means all public addresses
         # will be migrated off the node.
diff --git a/include/ctdb_client.h b/include/ctdb_client.h
index aa9b2c0..3dc115f 100644
--- a/include/ctdb_client.h
+++ b/include/ctdb_client.h
@@ -77,11 +77,6 @@ int ctdb_set_tdb_dir_state(struct ctdb_context *ctdb, const char *dir);
 void ctdb_set_flags(struct ctdb_context *ctdb, unsigned flags);
 
 /*
-  set max acess count before a dmaster migration
-*/
-void ctdb_set_max_lacount(struct ctdb_context *ctdb, unsigned count);
-
-/*
   tell ctdb what address to listen on, in transport specific format
 */
 int ctdb_set_address(struct ctdb_context *ctdb, const char *address);
diff --git a/include/ctdb_private.h b/include/ctdb_private.h
index c189a5f..3f36870 100644
--- a/include/ctdb_private.h
+++ b/include/ctdb_private.h
@@ -82,7 +82,6 @@ struct ctdb_tunable {
 	uint32_t traverse_timeout;
 	uint32_t keepalive_interval;
 	uint32_t keepalive_limit;
-	uint32_t max_lacount;
 	uint32_t recover_timeout;
 	uint32_t recover_interval;
 	uint32_t election_timeout;
@@ -776,8 +775,8 @@ struct ctdb_call_state *ctdb_daemon_call_send_remote(struct ctdb_db_context *ctd
 						     struct ctdb_ltdb_header *header);
 
 int ctdb_call_local(struct ctdb_db_context *ctdb_db, struct ctdb_call *call,
-		    struct ctdb_ltdb_header *header, TALLOC_CTX *mem_ctx, TDB_DATA *data,
-		    uint32_t caller);
+		    struct ctdb_ltdb_header *header, TALLOC_CTX *mem_ctx,
+		    TDB_DATA *data);
 
 #define ctdb_reqid_find(ctdb, reqid, type)	(type *)_ctdb_reqid_find(ctdb, reqid, #type, __location__)
 
diff --git a/include/ctdb_protocol.h b/include/ctdb_protocol.h
index baf1790..b6b753c 100644
--- a/include/ctdb_protocol.h
+++ b/include/ctdb_protocol.h
@@ -479,8 +479,10 @@ enum ctdb_trans2_commit_error {
 struct ctdb_ltdb_header {
 	uint64_t rsn;
 	uint32_t dmaster;
-	uint32_t laccessor;
-	uint32_t lacount;
+	uint32_t reserved1;
+#define CTDB_REC_FLAG_DEFAULT			0x00000000
+#define CTDB_REC_FLAG_MIGRATED_WITH_DATA	0x00010000
+	uint32_t flags;
 };
 
 
diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index c5f7e7d..d6c0866 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -297,7 +297,7 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
 		return;
 	}
 
-	ctdb_call_local(ctdb_db, state->call, &header, state, &data, ctdb->pnn);
+	ctdb_call_local(ctdb_db, state->call, &header, state, &data);
 
 	ret = ctdb_ltdb_unlock(ctdb_db, state->call->key);
 	if (ret != 0) {
@@ -465,14 +465,11 @@ void ctdb_request_call(struct ctdb_context *ctdb, struct ctdb_req_header *hdr)
 
 	CTDB_UPDATE_STAT(ctdb, max_hop_count, c->hopcount);
 
-	/* if this nodes has done enough consecutive calls on the same record
-	   then give them the record
-	   or if the node requested an immediate migration
-	*/
-	if ( c->hdr.srcnode != ctdb->pnn &&
-	     ((header.laccessor == c->hdr.srcnode
-	       && header.lacount >= ctdb->tunable.max_lacount)
-	      || (c->flags & CTDB_IMMEDIATE_MIGRATION)) ) {
+	/* Try if possible to migrate the record off to the caller node.
+	 * From the clients perspective a fetch of the data is just as 
+	 * expensive as a migration.
+	 */
+	if (c->hdr.srcnode != ctdb->pnn) {
 		if (ctdb_db->transaction_active) {
 			DEBUG(DEBUG_INFO, (__location__ " refusing migration"
 			      " of key %s while transaction is active\n",
@@ -491,7 +488,7 @@ void ctdb_request_call(struct ctdb_context *ctdb, struct ctdb_req_header *hdr)
 		}
 	}
 
-	ctdb_call_local(ctdb_db, call, &header, hdr, &data, c->hdr.srcnode);
+	ctdb_call_local(ctdb_db, call, &header, hdr, &data);
 
 	ret = ctdb_ltdb_unlock(ctdb_db, call->key);
 	if (ret != 0) {
@@ -707,7 +704,7 @@ struct ctdb_call_state *ctdb_call_local_send(struct ctdb_db_context *ctdb_db,
 	*(state->call) = *call;
 	state->ctdb_db = ctdb_db;
 
-	ret = ctdb_call_local(ctdb_db, state->call, header, state, data, ctdb->pnn);
+	ret = ctdb_call_local(ctdb_db, state->call, header, state, data);
 
 	event_add_timed(ctdb->ev, state, timeval_zero(), call_local_trigger, state);
 
diff --git a/server/ctdb_tunables.c b/server/ctdb_tunables.c
index 47694b7..4cd1b45 100644
--- a/server/ctdb_tunables.c
+++ b/server/ctdb_tunables.c
@@ -30,7 +30,6 @@ static const struct {
 	{ "TraverseTimeout",     20, offsetof(struct ctdb_tunable, traverse_timeout) },
 	{ "KeepaliveInterval",    5,  offsetof(struct ctdb_tunable, keepalive_interval) },
 	{ "KeepaliveLimit",       5,  offsetof(struct ctdb_tunable, keepalive_limit) },
-	{ "MaxLACount",           7,  offsetof(struct ctdb_tunable, max_lacount) },
 	{ "RecoverTimeout",      20,  offsetof(struct ctdb_tunable, recover_timeout) },
 	{ "RecoverInterval",      1,  offsetof(struct ctdb_tunable, recover_interval) },
 	{ "ElectionTimeout",      3,  offsetof(struct ctdb_tunable, election_timeout) },


-- 
CTDB repository

