[SCM] CTDB repository - branch master updated - c1130e58296e63be3787ec59690941b2677a3378

Mon Mar 31 04:29:00 GMT 2008

The branch, master has been updated
       via  c1130e58296e63be3787ec59690941b2677a3378 (commit)
       via  f8294d103fdd8a720d0b0c337d3973c7fdf76b5c (commit)
       via  be89005bd5d13409e377d425db2aad1c0d5b3826 (commit)
       via  a0c9a451afde0c99efdc92e1fd418991bb81fa2b (commit)
       via  8477f6a079e2beb8c09c19702733c4e17f5032fe (commit)
       via  4becf32aea088a25686e8bc330eb47d85ae0ef8f (commit)
       via  53c4f483bb122e6fa13abcc6d4584130f20af461 (commit)
       via  bc9c4f0d52e9b06aceb08cea99ed3fd20b44616c (commit)
       via  9e625ece19a91f362c9539fa73b6b2108f0d9c53 (commit)
       via  89529ea81379335b3db09774d192fb7cefe37338 (commit)
       via  4f7f8aa6f178115b551ac35f7df2ec5aad054fe2 (commit)
       via  646f4d9a01637685e967fb3ecc042fc97c0b7529 (commit)
       via  446e2f4e650b12d6fce5677a6841006462c23dba (commit)
       via  61fd50e2b3aa9a3ed32bc81a8e28464f267dc490 (commit)
       via  f3648a8a5b3934ea42c7d2550f729a5bd61a4d0f (commit)
       via  ffee062b7e26a6aa6ad254edb58399040ecaa542 (commit)
       via  d3b8a461b15bc584fa1785eb5922de6d49d8f6c4 (commit)
       via  fdaf7cb2d7682507fbf4c6c2b833b327c93fac08 (commit)
       via  06d3ce470766ef0b60d68ccd84de5437146cc147 (commit)
       via  07af425f444531942cce8abff112c1524228d287 (commit)
       via  b7f955338f50c92374b4f559268fb3a1a516aefa (commit)
       via  bf1863cc9e2539b2c3e53c664b493b459ebfcc8b (commit)
       via  8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436 (commit)
       via  21d3319eaf463e2a00637d440ee2d4d15f53bf09 (commit)
       via  9effb22cc1616d684352d7ebabb359e69adb0f52 (commit)
       via  c20293360db67f9876b0c84e5e9e12a5868964cb (commit)
       via  613881a06186dec90fb64a7190ddf4afd7437d67 (commit)
       via  8df75775966ead36e1073896fedeff674a6e0587 (commit)
       via  b93d29f43f5306c244c887b54a77bca8a061daf2 (commit)
       via  35627c7450a03f36a353c3dd7cce31ce3433a7ff (commit)
       via  bb8229d7b479bd486b07fa6cd04100fec02bddee (commit)
       via  6272ad33b4af6ea9d6fd0ac877df3f75be45d665 (commit)
      from  4bc20ac73e9fa94ffd43cccb6eeb438eeff9963c (commit)

http://gitweb.samba.org/?p=tridge/ctdb.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit c1130e58296e63be3787ec59690941b2677a3378
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Mon Mar 31 11:00:08 2008 +1100

    update the iscsi support under RHEL5 to allow one iscsi target to be defined for each public address in the cluster.
    
    update the documentation for iscsi

commit f8294d103fdd8a720d0b0c337d3973c7fdf76b5c
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Thu Mar 27 09:23:27 2008 +1100

    Add two new controls to add/delete public ip address from a node at runtime.
    
    The controls only modify the runtime setting of which public addresses a node
    can serve and do not modify /etc/ctdb/public_addresses.
    To make the change permanent you also need to edit /etc/ctdb/public_addresses
    manually.
    
    After ip addresses have been added/deleted you need to invoke a recovery
    for the ip addresses to be redistributed.

commit be89005bd5d13409e377d425db2aad1c0d5b3826
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 25 11:11:13 2008 +1100

    fix a memory leak
    
    allocate the memory to the 'call' context and not off the 'ctdb' context

commit a0c9a451afde0c99efdc92e1fd418991bb81fa2b
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 25 09:43:47 2008 +1100

    update to version 1.0.31

commit 8477f6a079e2beb8c09c19702733c4e17f5032fe
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 25 08:27:38 2008 +1100

    From M Dietz,
    Add back the controls to enable/disable monitoring we used to have for debugging but removed a while ago

commit 4becf32aea088a25686e8bc330eb47d85ae0ef8f
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Wed Mar 19 13:54:17 2008 +1100

    in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb.
    This can cause a memory leak if the call is terminated before we have managed to respond to the client.
    (and the call is talloc_free()d but the data is still hanging off ctdb)
    
    instead we must talloc_steal() the data and hang it off the call structure to avoid the memory leak.
    
    In order to do this we must also change the call structure that is passed into ctdb_call_local() to be allocated through talloc().
    
    This structure was previously either a static variable, or an element of a larger talloc()ed structure (ctdb_call_state or ctdb_client_call_state) so
    we must change all creations of a ctdb_call into explicitly creating it through talloc()

commit 53c4f483bb122e6fa13abcc6d4584130f20af461
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Wed Mar 19 12:08:29 2008 +1100

    don't steal reply_data.dptr to ctdb if there is no data, since then we would leak
    memory

commit bc9c4f0d52e9b06aceb08cea99ed3fd20b44616c
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Thu Mar 13 07:54:55 2008 +1100

    change the log level for the message when someone connects to a non-public ip

commit 9e625ece19a91f362c9539fa73b6b2108f0d9c53
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Thu Mar 13 07:53:29 2008 +1100

    Redo the vacuuming process to make it scalable.
    
    Vacuuming used to delete one record at a time on all nodes. That was
    m*n behaviour and would require a huge storm of ctdb->ctdb controls and just wouldn't scale at all.
    
    The new vacuuming process collects all records to be deleted locally and then only sends 1 control to the other nodes. This control contains a list of all records to be deleted.
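
A minimal C sketch of the batching idea described above. It is not the code in tools/ctdb_vacuum.c: `struct delete_batch`, `batch_add()` and the two `controls_*()` helpers are hypothetical names, and the length-prefixed packing is an assumed wire layout chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: every key to delete, packed back to back as
 * [uint32_t length][key bytes], so the whole list fits in one control. */
struct delete_batch {
	uint32_t count;   /* number of keys packed so far */
	size_t   len;     /* bytes used in buf */
	uint8_t *buf;     /* packed keys, grown with realloc() */
};

/* Append one key to the batch. Returns 0 on success, -1 if allocation fails. */
int batch_add(struct delete_batch *b, const void *key, uint32_t keylen)
{
	uint8_t *p;

	assert(b != NULL && key != NULL);

	p = realloc(b->buf, b->len + sizeof(keylen) + keylen);
	if (p == NULL) {
		return -1;   /* the old buffer is still valid and owned by b */
	}
	memcpy(p + b->len, &keylen, sizeof(keylen));
	memcpy(p + b->len + sizeof(keylen), key, keylen);
	b->buf = p;
	b->len += sizeof(keylen) + keylen;
	b->count++;
	return 0;
}

/* Old scheme: one control per record per remote node (m*n). */
uint64_t controls_per_record(uint64_t records, uint64_t remote_nodes)
{
	return records * remote_nodes;
}

/* New scheme: one control per remote node, whatever the record count. */
uint64_t controls_batched(uint64_t records, uint64_t remote_nodes)
{
	return records > 0 ? remote_nodes : 0;
}
```

With 1000 records to delete and 3 remote nodes, the per-record scheme needs 3000 controls, while the batched scheme needs 3.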

commit 89529ea81379335b3db09774d192fb7cefe37338
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 13:40:29 2008 +1100

    update to version 1.0.30

commit 4f7f8aa6f178115b551ac35f7df2ec5aad054fe2
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 13:29:48 2008 +1100

    Update ctdb uptime to provide machine-readable output

commit 646f4d9a01637685e967fb3ecc042fc97c0b7529
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 13:23:06 2008 +1100

    provide machine-readable -Y output for 'ctdb getdebug'

commit 446e2f4e650b12d6fce5677a6841006462c23dba
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 13:18:27 2008 +1100

    make 'ctdb ip' provide machine-readable output using '-Y'

commit 61fd50e2b3aa9a3ed32bc81a8e28464f267dc490
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 13:06:46 2008 +1100

    document some public tunables

commit f3648a8a5b3934ea42c7d2550f729a5bd61a4d0f
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 12:37:24 2008 +1100

    document some new ctdb commands

commit ffee062b7e26a6aa6ad254edb58399040ecaa542
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Tue Mar 4 12:20:23 2008 +1100

    A new command to 'ctdb'
    
    ctdb moveip <IPADDRESS> <NODE>
    
    which can be used to manually fail an ip address over to a specific node.
    
    This can only be used if DeterministicIPs is disabled and NoIPFailback is enabled.

commit d3b8a461b15bc584fa1785eb5922de6d49d8f6c4
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Mon Mar 3 12:52:16 2008 +1100

    add a new tunable 'NoIPFailback'
    when this tunable is set, ip addresses will only be failed over when a node
    fails. And only those ip addresses held by the failed node will be reallocated
    in the cluster.
    
    When a node becomes active again, this will not lead to any failback of ip addresses.
    
    This can reduce the number of "ip address movements" in the cluster since we don't automatically fail an ip address back, but can also lead to an unbalanced cluster since we no longer attempt to spread the ip addresses out evenly across the active nodes.
    
    This tunable can NOT be active at the same time as DeterministicIPs is used.
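
A minimal C sketch of the behaviour described above: only addresses whose owner is no longer active are moved, and a returning node gets nothing back. This is an illustration, not the code in server/ctdb_takeover.c. The function name `reassign_failed_ips()`, the array-based model, and the "give it to the active node with the fewest addresses" rule are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: owner[i] is the node hosting public address i
 * (-1 if unassigned); node_active[n] says whether node n can host addresses.
 * Returns the number of addresses that were assigned to a new node. */
int reassign_failed_ips(int *owner, size_t num_ips,
			const bool *node_active, size_t num_nodes)
{
	int moved = 0;

	assert(owner != NULL && node_active != NULL);

	for (size_t i = 0; i < num_ips; i++) {
		if (owner[i] >= 0 && (size_t)owner[i] < num_nodes &&
		    node_active[owner[i]]) {
			continue;   /* owner is still active: never move this address */
		}

		/* pick the active node that currently hosts the fewest addresses */
		int best = -1;
		size_t best_load = 0;
		for (size_t n = 0; n < num_nodes; n++) {
			size_t load = 0;

			if (!node_active[n]) {
				continue;
			}
			for (size_t j = 0; j < num_ips; j++) {
				if (owner[j] == (int)n) {
					load++;
				}
			}
			if (best == -1 || load < best_load) {
				best = (int)n;
				best_load = load;
			}
		}

		owner[i] = best;   /* stays -1 if no node is active */
		if (best != -1) {
			moved++;
		}
	}
	return moved;
}
```

Calling the function again after the failed node has become active moves nothing, which is how the cluster can end up unbalanced.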

commit fdaf7cb2d7682507fbf4c6c2b833b327c93fac08
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Mon Mar 3 10:53:23 2008 +1100

    when we reallocate the ip addresses for nodes, we must make sure that
    a node that has been allocated to serve an ip actually CAN serve that ip
    (if we use differing public_addresses files on each node)

commit 06d3ce470766ef0b60d68ccd84de5437146cc147
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Mon Mar 3 10:24:17 2008 +1100

    add a num_connected field to the rec structure that holds the number
    of connected nodes
    
    num_active only contains the number of active nodes and would thus not count
    banned nodes

commit 07af425f444531942cce8abff112c1524228d287
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Mon Mar 3 09:19:30 2008 +1100

    add a new tunable : reclockpingperiod
    
    once every such interval :
    * the recovery master on each node will update the "connected" count in the
      reclock count file (ctdb getreclock)
    * if the node thinks it is a recovery master but it detects another node
      that is DISCONNECTED but which still holds a lock to the reclock count file
      this may mean that we have a split cluster.
      if that other node, which is DISCONNECTED but still holds the lock on the
      reclock pnn count file, is MORE connected than the local node,
      yield the recmaster role and let the other half of the cluster take over
    
    this adds a second, last chance mechanism to detect split clusters.
    IF the cluster is split but GPFS is not yet split, this mechanism makes
    the largest half of the cluster become the active half.
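
A minimal C sketch of the yield decision described above. `struct peer_view` and `should_yield_recmaster()` are hypothetical names, not taken from server/ctdb_recoverd.c. The -1 value for "byte not locked" follows the convention of ctdb_read_pnn_lock() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one remote node, as seen by the local recovery master. */
struct peer_view {
	bool disconnected;   /* local node considers this peer DISCONNECTED */
	int  lock_count;     /* "connected" count read from the peer's byte in
	                        the reclock count file, -1 if the byte is unlocked */
};

/* Return true if the local node should give up the recmaster role because
 * a peer it cannot reach is still alive and sees more nodes than we do. */
bool should_yield_recmaster(int local_connected,
			    const struct peer_view *peers, int num_peers)
{
	assert(peers != NULL || num_peers == 0);

	for (int i = 0; i < num_peers; i++) {
		if (!peers[i].disconnected) {
			continue;   /* reachable peer: not evidence of a split */
		}
		if (peers[i].lock_count < 0) {
			continue;   /* no lock held: the peer really is gone */
		}
		if (peers[i].lock_count > local_connected) {
			return true;   /* the other half is larger */
		}
	}
	return false;
}
```

The comparison is strict, so in this sketch two halves of equal size both keep their recmaster. The commit message does not say how a tie is handled.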

commit b7f955338f50c92374b4f559268fb3a1a516aefa
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Mon Mar 3 07:53:46 2008 +1100

    change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure

commit bf1863cc9e2539b2c3e53c664b493b459ebfcc8b
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 29 13:14:47 2008 +1100

    update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds

commit 8bb229aa3b4bd41e48d4e4e2e148d8680c8ba436
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 29 12:55:20 2008 +1100

    store the num_active variable (number of connected/active nodes) inside the rec
    structure and avoid passing this as an extra parameter to do_recovery()

commit 21d3319eaf463e2a00637d440ee2d4d15f53bf09
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 29 12:37:42 2008 +1100

    add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate.
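
A minimal C sketch of the writer side of this scheme, using POSIX byte-range locks. `pnn_lock_take()` and `pnn_count_write()` are hypothetical names. The reader side that appears in the diff below is ctdb_read_pnn_lock().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for pread()/pwrite() under strict -std= modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Take a write lock on the single byte at offset pnn. The kernel drops the
 * lock when the process exits or closes a descriptor for the file, so a
 * held lock shows that the owning daemon is alive.
 * Returns 0 on success, -1 if the byte is locked by another process or on error. */
int pnn_lock_take(int fd, int pnn)
{
	struct flock lock;

	assert(fd >= 0 && pnn >= 0);

	lock.l_type   = F_WRLCK;
	lock.l_whence = SEEK_SET;
	lock.l_start  = pnn;
	lock.l_len    = 1;
	lock.l_pid    = 0;

	return fcntl(fd, F_SETLK, &lock) == 0 ? 0 : -1;
}

/* Store this node's "connected" count in its own byte of the file. */
int pnn_count_write(int fd, int pnn, char count)
{
	return pwrite(fd, &count, 1, pnn) == 1 ? 0 : -1;
}
```

A second process can then call fcntl() with F_GETLK on the same byte: if the lock type comes back as F_UNLCK, no daemon holds the byte for that pnn. F_GETLK never reports locks held by the calling process itself, so the check only works from another process.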

commit 9effb22cc1616d684352d7ebabb359e69adb0f52
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 29 10:03:39 2008 +1100

    add a control to get the name of the reclock file from the daemon

commit c20293360db67f9876b0c84e5e9e12a5868964cb
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 22 10:33:09 2008 +1100

    add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY.
    
    Use with caution.

commit 613881a06186dec90fb64a7190ddf4afd7437d67
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 22 10:01:15 2008 +1100

    document the --start-as-disabled argument

commit 8df75775966ead36e1073896fedeff674a6e0587
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 22 09:52:57 2008 +1100

    Add debug output to indicate why a node starts up in DISABLED state

commit b93d29f43f5306c244c887b54a77bca8a061daf2
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Fri Feb 22 09:42:52 2008 +1100

    Add a new parameter to /etc/sysconfig/ctdb
    CTDB_START_AS_DISABLED="yes"
    
    and command line argument
    --start-as-disabled
    
    When set, this makes the ctdb node always start in DISABLED mode, so it will not host any public ip addresses.
    The administrator must manually "ctdb enable" the node after it has started when the administrator wants the node to start hosting public ip addresses.
    
    Using this option it is possible to start ctdb on a node without causing any reallocation of ip addresses when it is starting. The node will still merge with the cluster and there will still be a recovery phase but the ip address allocations will not change in the cluster.

commit 35627c7450a03f36a353c3dd7cce31ce3433a7ff
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Thu Feb 21 13:29:28 2008 +1100

    monitor the amount of free memory and if it drops below a configured threshold, monitoring will log an OOM message in the ctdb log and shut down ctdb on the node.
    
    by default ctdb does not monitor for OOM.
    to enable this you need to uncomment the CTDB_MONITOR_FREE_MEMORY line in /etc/sysconfig/ctdb and specify the amount of free memory, in MByte, below which OOM is triggered and ctdb shuts down the node

commit bb8229d7b479bd486b07fa6cd04100fec02bddee
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Thu Feb 21 08:37:29 2008 +1100

    update version to 1.0.29

commit 6272ad33b4af6ea9d6fd0ac877df3f75be45d665
Author: Ronnie Sahlberg <sahlberg at samba.org>
Date:   Thu Feb 21 08:25:01 2008 +1100

    make the ctdb reloadnodes reload the nodes file on all nodes and restart the transport

-----------------------------------------------------------------------

Summary of changes:
 client/ctdb_client.c     |  197 ++++++++++++++++++++++---
 config/ctdb.init         |    3 +
 config/ctdb.sysconfig    |   14 ++
 config/events.d/00.ctdb  |   12 ++-
 config/events.d/70.iscsi |   53 ++++---
 doc/ctdb.1               |   30 ++++-
 doc/ctdb.1.html          |   73 +++++++---
 doc/ctdb.1.xml           |   53 +++++++
 doc/ctdbd.1              |   88 +++++++++++-
 doc/ctdbd.1.html         |   73 ++++++++-
 doc/ctdbd.1.xml          |  127 ++++++++++++++++
 include/ctdb.h           |   17 ++
 include/ctdb_private.h   |   40 ++++-
 packaging/RPM/ctdb.spec  |   31 ++++-
 server/ctdb_call.c       |   78 ++++++-----
 server/ctdb_control.c    |   28 +++-
 server/ctdb_monitor.c    |    5 +
 server/ctdb_recover.c    |  121 +++++++++++++--
 server/ctdb_recoverd.c   |  229 +++++++++++++++++++++++-----
 server/ctdb_server.c     |    7 +
 server/ctdb_takeover.c   |  127 +++++++++++++++-
 server/ctdb_tunables.c   |    3 +
 server/ctdbd.c           |    4 +
 tcp/tcp_connect.c        |    7 +-
 tools/ctdb.c             |  331 +++++++++++++++++++++++++++++++++++++++--
 tools/ctdb_vacuum.c      |  372 +++++++++++++++++++++++++++++-----------------
 web/iscsi.html           |   48 +++----
 27 files changed, 1814 insertions(+), 357 deletions(-)


Changeset truncated at 500 lines:

diff --git a/client/ctdb_client.c b/client/ctdb_client.c
index df328c0..f852e5f 100644
--- a/client/ctdb_client.c
+++ b/client/ctdb_client.c
@@ -124,7 +124,8 @@ int ctdb_call_local(struct ctdb_db_context *ctdb_db, struct ctdb_call *call,
 
 	if (c->reply_data) {
 		call->reply_data = *c->reply_data;
-		talloc_steal(ctdb, call->reply_data.dptr);
+
+		talloc_steal(call, call->reply_data.dptr);
 		talloc_set_name_const(call->reply_data.dptr, __location__);
 	} else {
 		call->reply_data.dptr = NULL;
@@ -170,9 +171,9 @@ static void ctdb_client_reply_call(struct ctdb_context *ctdb, struct ctdb_req_he
 		return;
 	}
 
-	state->call.reply_data.dptr = c->data;
-	state->call.reply_data.dsize = c->datalen;
-	state->call.status = c->status;
+	state->call->reply_data.dptr = c->data;
+	state->call->reply_data.dsize = c->datalen;
+	state->call->status = c->status;
 
 	talloc_steal(state, c);
 
@@ -308,16 +309,16 @@ int ctdb_call_recv(struct ctdb_client_call_state *state, struct ctdb_call *call)
 		return -1;
 	}
 
-	if (state->call.reply_data.dsize) {
+	if (state->call->reply_data.dsize) {
 		call->reply_data.dptr = talloc_memdup(state->ctdb_db,
-						      state->call.reply_data.dptr,
-						      state->call.reply_data.dsize);
-		call->reply_data.dsize = state->call.reply_data.dsize;
+						      state->call->reply_data.dptr,
+						      state->call->reply_data.dsize);
+		call->reply_data.dsize = state->call->reply_data.dsize;
 	} else {
 		call->reply_data.dptr = NULL;
 		call->reply_data.dsize = 0;
 	}
-	call->status = state->call.status;
+	call->status = state->call->status;
 	talloc_free(state);
 
 	return 0;
@@ -352,14 +353,16 @@ static struct ctdb_client_call_state *ctdb_client_call_local_send(struct ctdb_db
 
 	state = talloc_zero(ctdb_db, struct ctdb_client_call_state);
 	CTDB_NO_MEMORY_NULL(ctdb, state);
+	state->call = talloc_zero(state, struct ctdb_call);
+	CTDB_NO_MEMORY_NULL(ctdb, state->call);
 
 	talloc_steal(state, data->dptr);
 
-	state->state = CTDB_CALL_DONE;
-	state->call = *call;
+	state->state   = CTDB_CALL_DONE;
+	*(state->call) = *call;
 	state->ctdb_db = ctdb_db;
 
-	ret = ctdb_call_local(ctdb_db, &state->call, header, state, data, ctdb->pnn);
+	ret = ctdb_call_local(ctdb_db, state->call, header, state, data, ctdb->pnn);
 
 	return state;
 }
@@ -409,6 +412,11 @@ struct ctdb_client_call_state *ctdb_call_send(struct ctdb_db_context *ctdb_db,
 		DEBUG(DEBUG_ERR, (__location__ " failed to allocate state\n"));
 		return NULL;
 	}
+	state->call = talloc_zero(state, struct ctdb_call);
+	if (state->call == NULL) {
+		DEBUG(DEBUG_ERR, (__location__ " failed to allocate state->call\n"));
+		return NULL;
+	}
 
 	len = offsetof(struct ctdb_req_call, data) + call->key.dsize + call->call_data.dsize;
 	c = ctdbd_allocate_pkt(ctdb, state, CTDB_REQ_CALL, len, struct ctdb_req_call);
@@ -431,9 +439,9 @@ struct ctdb_client_call_state *ctdb_call_send(struct ctdb_db_context *ctdb_db,
 	memcpy(&c->data[0], call->key.dptr, call->key.dsize);
 	memcpy(&c->data[call->key.dsize], 
 	       call->call_data.dptr, call->call_data.dsize);
-	state->call                = *call;
-	state->call.call_data.dptr = &c->data[call->key.dsize];
-	state->call.key.dptr       = &c->data[0];
+	*(state->call)              = *call;
+	state->call->call_data.dptr = &c->data[call->key.dsize];
+	state->call->key.dptr       = &c->data[0];
 
 	state->state  = CTDB_CALL_WAIT;
 
@@ -1215,6 +1223,28 @@ int ctdb_ctrl_getdbmap(struct ctdb_context *ctdb, struct timeval timeout, uint32
 	return 0;
 }
 
+/*
+  get the reclock filename
+ */
+int ctdb_ctrl_getreclock(struct ctdb_context *ctdb, struct timeval timeout, uint32_t destnode, 
+		       TALLOC_CTX *mem_ctx, const char **reclock)
+{
+	int ret;
+	TDB_DATA outdata;
+	int32_t res;
+
+	ret = ctdb_control(ctdb, destnode, 0, 
+			   CTDB_CONTROL_GET_RECLOCK_FILE, 0, tdb_null, 
+			   mem_ctx, &outdata, &res, &timeout, NULL);
+	if (ret != 0 || res != 0) {
+		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for getreclock failed\n"));
+		return -1;
+	}
+
+	*reclock = (const char *)talloc_steal(mem_ctx, outdata.dptr);
+
+	return 0;
+}
 
 /*
   get a list of nodes (vnn and flags ) from a remote node
@@ -1949,7 +1979,7 @@ int ctdb_ctrl_getmonmode(struct ctdb_context *ctdb, struct timeval timeout, uint
 			   CTDB_CONTROL_GET_MONMODE, 0, tdb_null, 
 			   NULL, NULL, &res, &timeout, NULL);
 	if (ret != 0) {
-		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for getrecmode failed\n"));
+		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for getmonmode failed\n"));
 		return -1;
 	}
 
@@ -1958,6 +1988,51 @@ int ctdb_ctrl_getmonmode(struct ctdb_context *ctdb, struct timeval timeout, uint
 	return 0;
 }
 
+
+/*
+ set the monitoring mode of a remote node to active
+ */
+int ctdb_ctrl_enable_monmode(struct ctdb_context *ctdb, struct timeval timeout, uint32_t destnode)
+{
+	int ret;
+	
+
+	ret = ctdb_control(ctdb, destnode, 0, 
+			   CTDB_CONTROL_ENABLE_MONITOR, 0, tdb_null, 
+			   NULL, NULL,NULL, &timeout, NULL);
+	if (ret != 0) {
+		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for enable_monitor failed\n"));
+		return -1;
+	}
+
+	
+
+	return 0;
+}
+
+/*
+  set the monitoring mode of a remote node to disable
+ */
+int ctdb_ctrl_disable_monmode(struct ctdb_context *ctdb, struct timeval timeout, uint32_t destnode)
+{
+	int ret;
+	
+
+	ret = ctdb_control(ctdb, destnode, 0, 
+			   CTDB_CONTROL_DISABLE_MONITOR, 0, tdb_null, 
+			   NULL, NULL, NULL, &timeout, NULL);
+	if (ret != 0) {
+		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for disable_monitor failed\n"));
+		return -1;
+	}
+
+	
+
+	return 0;
+}
+
+
+
 /* 
   sent to a node to make it take over an ip address
 */
@@ -2215,6 +2290,55 @@ int ctdb_ctrl_get_all_tunables(struct ctdb_context *ctdb,
 	return 0;
 }
 
+/*
+  add a public address to a node
+ */
+int ctdb_ctrl_add_public_ip(struct ctdb_context *ctdb, 
+		      struct timeval timeout, 
+		      uint32_t destnode,
+		      struct ctdb_control_ip_iface *pub)
+{
+	TDB_DATA data;
+	int32_t res;
+	int ret;
+
+	data.dsize = offsetof(struct ctdb_control_ip_iface, iface) + pub->len;
+	data.dptr  = (unsigned char *)pub;
+
+	ret = ctdb_control(ctdb, destnode, 0, CTDB_CONTROL_ADD_PUBLIC_IP, 0, data, NULL,
+			   NULL, &res, &timeout, NULL);
+	if (ret != 0 || res != 0) {
+		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for add_public_ip failed\n"));
+		return -1;
+	}
+
+	return 0;
+}
+
+/*
+  delete a public address from a node
+ */
+int ctdb_ctrl_del_public_ip(struct ctdb_context *ctdb, 
+		      struct timeval timeout, 
+		      uint32_t destnode,
+		      struct ctdb_control_ip_iface *pub)
+{
+	TDB_DATA data;
+	int32_t res;
+	int ret;
+
+	data.dsize = offsetof(struct ctdb_control_ip_iface, iface) + pub->len;
+	data.dptr  = (unsigned char *)pub;
+
+	ret = ctdb_control(ctdb, destnode, 0, CTDB_CONTROL_DEL_PUBLIC_IP, 0, data, NULL,
+			   NULL, &res, &timeout, NULL);
+	if (ret != 0 || res != 0) {
+		DEBUG(DEBUG_ERR,(__location__ " ctdb_control for del_public_ip failed\n"));
+		return -1;
+	}
+
+	return 0;
+}
 
 /*
   kill a tcp connection
@@ -2253,13 +2377,13 @@ int ctdb_ctrl_gratious_arp(struct ctdb_context *ctdb,
 	TDB_DATA data;
 	int32_t res;
 	int ret, len;
-	struct ctdb_control_gratious_arp *gratious_arp;
+	struct ctdb_control_ip_iface *gratious_arp;
 	TALLOC_CTX *tmp_ctx = talloc_new(ctdb);
 
 
 	len = strlen(ifname)+1;
 	gratious_arp = talloc_size(tmp_ctx, 
-		offsetof(struct ctdb_control_gratious_arp, iface) + len);
+		offsetof(struct ctdb_control_ip_iface, iface) + len);
 	CTDB_NO_MEMORY(ctdb, gratious_arp);
 
 	gratious_arp->sin = *sin;
@@ -2267,7 +2391,7 @@ int ctdb_ctrl_gratious_arp(struct ctdb_context *ctdb,
 	memcpy(&gratious_arp->iface[0], ifname, len);
 
 
-	data.dsize = offsetof(struct ctdb_control_gratious_arp, iface) + len;
+	data.dsize = offsetof(struct ctdb_control_ip_iface, iface) + len;
 	data.dptr  = (unsigned char *)gratious_arp;
 
 	ret = ctdb_control(ctdb, destnode, 0, CTDB_CONTROL_SEND_GRATIOUS_ARP, 0, data, NULL,
@@ -2698,3 +2822,38 @@ uint32_t *list_of_active_nodes(struct ctdb_context *ctdb,
 
 	return nodes;
 }
+
+/* 
+  this is used to test if a pnn lock exists and if it exists will return
+  the number of connections that pnn has reported or -1 if that recovery
+  daemon is not running.
+*/
+int
+ctdb_read_pnn_lock(int fd, int32_t pnn)
+{
+	struct flock lock;
+	char c;
+
+	lock.l_type = F_WRLCK;
+	lock.l_whence = SEEK_SET;
+	lock.l_start = pnn;
+	lock.l_len = 1;
+	lock.l_pid = 0;
+
+	if (fcntl(fd, F_GETLK, &lock) != 0) {
+		DEBUG(DEBUG_ERR, (__location__ " F_GETLK failed with %s\n", strerror(errno)));
+		return -1;
+	}
+
+	if (lock.l_type == F_UNLCK) {
+		return -1;
+	}
+
+	if (pread(fd, &c, 1, pnn) == -1) {
+		DEBUG(DEBUG_CRIT,(__location__ " failed read pnn count - %s\n", strerror(errno)));
+		return -1;
+	}
+
+	return c;
+}
+
diff --git a/config/ctdb.init b/config/ctdb.init
index b01d8ee..bae52c2 100755
--- a/config/ctdb.init
+++ b/config/ctdb.init
@@ -63,6 +63,9 @@ CTDB_OPTIONS="$CTDB_OPTIONS --reclock=$CTDB_RECOVERY_LOCK"
 [ -z "$CTDB_EVENT_SCRIPT_DIR" ] || CTDB_OPTIONS="$CTDB_OPTIONS --event-script-dir $CTDB_EVENT_SCRIPT_DIR"
 [ -z "$CTDB_TRANSPORT" ]        || CTDB_OPTIONS="$CTDB_OPTIONS --transport $CTDB_TRANSPORT"
 [ -z "$CTDB_DEBUGLEVEL" ]       || CTDB_OPTIONS="$CTDB_OPTIONS -d $CTDB_DEBUGLEVEL"
+[ -z "$CTDB_START_AS_DISABLED" ] || [ "$CTDB_START_AS_DISABLED" != "yes" ] || {
+	CTDB_OPTIONS="$CTDB_OPTIONS --start-as-disabled"
+}
 
 if [ -x /sbin/startproc ]; then
     init_style="suse"
diff --git a/config/ctdb.sysconfig b/config/ctdb.sysconfig
index 9306884..9d1e434 100644
--- a/config/ctdb.sysconfig
+++ b/config/ctdb.sysconfig
@@ -77,6 +77,20 @@
 # defaults to tcp
 # CTDB_TRANSPORT="tcp"
 
+# When set, this variable makes ctdb monitor the amount of free memory
+# in the system (the second number in the buffers/cache output from free -m).
+# If the amount of free memory drops below this treshold the node will become
+# unhealthy and ctdb and all managed services will be shutdown.
+# Once this occurs, the administrator needs to find the reason for the OOM
+# situation, rectify it and restart ctdb with "service ctdb start"
+# The unit is MByte
+# CTDB_MONITOR_FREE_MEMORY=100
+
+# When set to yes, the CTDB node will start in DISABLED mode and not host
+# any public ip addresses. The administrator needs to explicitely enable
+# the node with "ctdb enable"
+# CTDB_START_AS_DISABLED="yes"
+
 # where to log messages
 # the default is /var/log/log.ctdb
 # CTDB_LOGFILE=/var/log/log.ctdb
diff --git a/config/events.d/00.ctdb b/config/events.d/00.ctdb
index 283ff44..a248b4e 100755
--- a/config/events.d/00.ctdb
+++ b/config/events.d/00.ctdb
@@ -57,7 +57,17 @@ case $cmd in
 		touch $CTDB_BASE/state/periodic_vacuum
 	    	periodic_vacuum
 	}
-	
+
+	# monitor that we are not running out of memory
+	[ -z "$CTDB_MONITOR_FREE_MEMORY" ] || {
+		FREE_MEM=`free -m | grep "buffers/cache" | while read A B C D ;do /bin/echo -n $D ; done`
+		[ `expr "$FREE_MEM" "<" "$CTDB_MONITOR_FREE_MEMORY"` != "0" ] && {
+			echo "OOM. Free:$FREE_MEM while CTDB treshold is $CTDB_MONITOR_FREE_MEMORY"
+			ctdb disable
+			sleep 3
+			ctdb shutdown
+		}
+	}
 esac
 
 # all OK
diff --git a/config/events.d/70.iscsi b/config/events.d/70.iscsi
index 734e398..0c05bdc 100755
--- a/config/events.d/70.iscsi
+++ b/config/events.d/70.iscsi
@@ -11,40 +11,49 @@ shift
 
 [ "$CTDB_MANAGES_ISCSI" = "yes" ] || exit 0
 
-[ -z "$CTDB_ISCSI_PUBLIC_IP" ] && {
-	echo "No public ip set for iscsi. iscsi disabled"
-	exit 1
-}
-
-[ -z "$CTDB_START_ISCSI_SCRIPT" ] && {
-	echo "No iscsi start script found"
-	exit 1
-}
-
-[ ! -x "$CTDB_START_ISCSI_SCRIPT" ] && {
-	echo "iscsi start script is not executable"
+[ -z "$CTDB_START_ISCSI_SCRIPTS" ] && {
+	echo "No iscsi start script directory found"
 	exit 1
 }
 
 case $cmd in 
      startup)
-	/bin/mkdir -p $CTDB_BASE/state/iscsi
 	;;
 
      takeip)
-	# when we takeover this ip we must start iscsi
-	[ $2 == "$CTDB_ISCSI_PUBLIC_IP" ] && {
-		$CTDB_START_ISCSI_SCRIPT
-		touch $CTDB_BASE/state/iscsi/iscsi_active
-	}
 	;;
 
      releaseip)
-	# when we release this ip we must stop iscsi
-	[ $2 == "$CTDB_ISCSI_PUBLIC_IP" ] && {
-		killall -9 tgtd >/dev/null 2>/dev/null
-		rm -rf $CTDB_BASE/state/iscsi/iscsi_active >/dev/null 2>/dev/null
+	;;
+
+     recovered)
+	# block the iscsi port
+	iptables -I INPUT 1 -p tcp --dport 3260 -j DROP
+	
+	# shut down the iscsi service
+	killall -9 tgtd >/dev/null 2>/dev/null
+
+	THIS_NODE=`ctdb status | grep "THIS NODE" | sed -e "s/pnn://" -e "s/ .*//"`
+	[ -z $THIS_NODE ] && {
+		echo "70.iscsi: Failed to get node pnn"
+		exit 0
 	}
+
+	# start the iscsi daemon
+	tgtd >/dev/null 2>/dev/null
+
+	for NODE in `ctdb ip | grep -v "Public" | egrep " ${THIS_NODE}$" | sed -e "s/ .*//"`; do
+		[ -f $CTDB_START_ISCSI_SCRIPTS/${NODE}.sh ] && {
+			echo Starting iscsi service for public address $NODE
+			$CTDB_START_ISCSI_SCRIPTS/${NODE}.sh
+		}
+	done
+
+	# remove all iptables rules
+	while `iptables -D INPUT -p tcp --dport 3260 -j DROP 2>/dev/null >/dev/null` ;  do
+		true;
+	done
+
 	;;
 
      shutdown)
diff --git a/doc/ctdb.1 b/doc/ctdb.1
index 498636b..9dcf495 100644
--- a/doc/ctdb.1
+++ b/doc/ctdb.1
@@ -1,11 +1,11 @@
 .\"     Title: ctdb
 .\"    Author: 
 .\" Generator: DocBook XSL Stylesheets v1.71.0 <http://docbook.sf.net/>
-.\"      Date: 02/05/2008
+.\"      Date: 03/04/2008
 .\"    Manual: 
 .\"    Source: 
 .\"
-.TH "CTDB" "1" "02/05/2008" "" ""
+.TH "CTDB" "1" "03/04/2008" "" ""
 .\" disable hyphenation
 .nh
 .\" disable justification (adjust text to left margin only)
@@ -333,6 +333,15 @@ Administratively ban a node for bantime seconds. A bantime of 0 means that the n
 A banned node does not participate in the cluster and does not host any records for the clustered TDB. Its ip address has been taken over by an other node and no services are hosted.
 .PP
 Nodes are automatically banned if they are the cause of too many cluster recoveries.
+.SS "moveip <public_ip> <node>"
+.PP
+This command can be used to manually fail a public ip address to a specific node.
+.PP
+In order to manually override the "automatic" distribution of public ip addresses that ctdb normally provides, this command only works when you have changed the tunables for the daemon to:
+.PP
+DeterministicIPs = 0
+.PP
+NoIPFailback = 1
 .SS "unban"
 .PP
 This command is used to unban a node that has either been administratively banned using the ban command or has been automatically banned by the recovery daemon.
@@ -345,6 +354,23 @@ This command will trigger the recovery daemon to do a cluster recovery.
 .SS "killtcp <srcip:port> <dstip:port>"
 .PP
 This command will kill the specified TCP connection by issuing a TCP RST to the srcip:port endpoint.
+.SS "reloadnodes"
+.PP
+This command is used when adding new nodes to an existing cluster and to reduce the disruption of this operation. This command should never be used except when expanding an existing cluster. This can only be used to expand a cluster. To remove a node from the cluster you still need to shut down ctdb on all nodes, edit the nodes file and restart ctdb.
+.PP
+Procedure:
+.PP
+1, To expand an existing cluster, first ensure with 'ctdb status' that all nodes are up and running and that they are all healthy. Do not try to expand a cluster unless it is completely healthy!
+.PP
+2, On all nodes, edit /etc/ctdb/nodes and add the new node as the last entry to the file. The new node MUST be added to the end of this file!
+.PP
+3, Verify that all the nodes have identical /etc/ctdb/nodes files after you edited them and added the new node!
+.PP
+4, Run 'ctdb reloadnodes' to force all nodes to reaload the nodesfile.
+.PP
+5, Use 'ctdb status' on all nodes and verify that they now show the additional node.
+.PP
+6, Install and configure the new node and bring it online.
 .SS "tickle <srcip:port> <dstip:port>"
 .PP
 This command will will send a TCP tickle to the source host for the specified TCP connection. A TCP tickle is a TCP ACK packet with an invalid sequence and acknowledge number and will when received by the source host result in it sending an immediate correct ACK back to the other end.


-- 
CTDB repository