[SCM] CTDB repository - branch master updated - ctdb-1.13-221-g624f467
Amitay Isaacs
amitay at samba.org
Thu Jul 26 05:21:08 MDT 2012
The branch, master has been updated
via 624f4677e99ed1710a0ace76201150349b1a0335 (commit)
via 5d713d5e5be67f5914a661694c15d938bd67dea3 (commit)
via 630cfe6451ba23d959fa4907fbba42702337ed3b (commit)
via 34f58a0773618c4508a55ad75fc4602dad5a5f4c (commit)
via f6e421e8bf935cae790a6dc2b861eb9c7f8610b4 (commit)
via 07149edaecb3caa672163e5a3b89715557d5205a (commit)
via e20fdb974158061f4627d6f360c168d764690e6f (commit)
from b3e798f357606648f04d8a67ffee775b34fdede7 (commit)
http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit 624f4677e99ed1710a0ace76201150349b1a0335
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 24 11:23:09 2012 +1000
Eventscripts: Default route on NAT gateway should have a metric of 10
At the moment routes from 11.routing can fail to be added because they
conflict with the default route added by 11.natgw.
The NAT gateway is meant to be a last resort, so routes from
11.routing should override it.
Signed-off-by: Martin Schwenke <martin at meltin.net>
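The idea behind this commit can be sketched with illustrative `ip route` commands (not part of the commit; gateway addresses are made up, and the commands need root to actually run):

```shell
# NAT gateway default route, deliberately weak (metric 10) so it only
# wins when nothing better is configured.
ip route add 0.0.0.0/0 via 10.0.0.254 metric 10

# A default route added by 11.routing with a lower (default) metric can
# now be added alongside it instead of failing, and takes precedence.
ip route add 0.0.0.0/0 via 192.168.1.1
```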
commit 5d713d5e5be67f5914a661694c15d938bd67dea3
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 20:10:11 2012 +1000
Eventscripts: Update/remove stale comments in 11.natgw
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 630cfe6451ba23d959fa4907fbba42702337ed3b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:39:50 2012 +1000
Eventscripts: Retrieve and build NAT gateway details better in 11.natgw
* "ctdb natgw" is run twice when it doesn't need to be.
* Tweak the parsing of "ctdb natgw" output so that it is done by the
shell instead of a bunch of external processes.
* Default the NAT gateway master to -1, even on error.  Previously,
  if the command failed entirely, the value could be empty.
* Streamline the error handling using die() for when there is no NAT
gateway.
* Downcase script-local variable names.
Signed-off-by: Martin Schwenke <martin at meltin.net>
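The "parse with the shell" technique this commit describes can be sketched as follows. The `fake_natgwlist` function is a hypothetical stand-in for `ctdb natgwlist`, whose output begins with the master PNN followed by its IP address:

```shell
# Hypothetical stand-in for "ctdb natgwlist"; the real command's output
# starts with the NAT gateway master PNN and its IP address.
fake_natgwlist () { echo "0 10.0.0.1"; }

# Word-split the output with the shell itself instead of a
# head | sed / head | cut pipeline of external processes.
set -- $(fake_natgwlist)
natgwmaster="${1:--1}"   # defaults to -1 if the command produced no output
natgwip="$2"

echo "$natgwmaster $natgwip"
```

If the command fails and prints nothing, `set --` clears the positional parameters and `${1:--1}` yields `-1`, which is exactly the sentinel the script's error check looks for.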
commit 34f58a0773618c4508a55ad75fc4602dad5a5f4c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:37:14 2012 +1000
Eventscripts: Optimise building the host address in 11.natgw
It can be built without forking unnecessary processes.
Also downcase the variable name because it is local to the script.
Signed-off-by: Martin Schwenke <martin at meltin.net>
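The fork-free construction of the host address works as in this sketch (the example address value is assumed; in the script it comes from configuration):

```shell
# Assumed example value; the script reads this from its configuration.
CTDB_NATGW_PUBLIC_IP="10.0.0.1/24"

# The old code forked echo + sed; POSIX parameter expansion strips the
# prefix length and appends /32 entirely in-process.
ctdb_natgw_public_ip_host="${CTDB_NATGW_PUBLIC_IP%/*}/32"

echo "$ctdb_natgw_public_ip_host"
```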
commit f6e421e8bf935cae790a6dc2b861eb9c7f8610b4
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:32:38 2012 +1000
Eventscripts: Clean up startup sanity check in 11.natgw
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 07149edaecb3caa672163e5a3b89715557d5205a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:26:16 2012 +1000
Eventscripts: remove redundant firewall rules from 11.natgw
aeb70c7e7822854eb87873a5c7783e27e6e72318 said it moved these rules but
actually duplicated them instead.  That commit did still fix the
problem, because it moved the rules to after delete_all(), not out of
the "startup" event as claimed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e20fdb974158061f4627d6f360c168d764690e6f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:21:10 2012 +1000
Eventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation
$CTDB_NATGW_PUBLIC_IP can be split into $_ip and $_maskbits without
forking lots of processes.
Also "local" isn't supported by POSIX.
Signed-off-by: Martin Schwenke <martin at meltin.net>
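The splitting this commit describes uses POSIX parameter expansion in place of `echo | cut` pipelines; a minimal sketch (example value assumed):

```shell
# Assumed example value of the address/prefix pair.
CTDB_NATGW_PUBLIC_IP="10.0.0.1/24"

# One expansion each instead of forking echo and cut twice:
_ip="${CTDB_NATGW_PUBLIC_IP%/*}"        # strip shortest "/..." suffix
_maskbits="${CTDB_NATGW_PUBLIC_IP#*/}"  # strip shortest ".../" prefix

echo "$_ip $_maskbits"
```

Since `${var%word}` and `${var#word}` are specified by POSIX (unlike `local`), this is portable across /bin/sh implementations.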
-----------------------------------------------------------------------
Summary of changes:
config/events.d/11.natgw | 54 +++++----------
doc/ctdbd.1 | 8 +-
doc/ctdbd.1.html | 168 +++++++++++++++++++++++----------------------
doc/ctdbd.1.xml | 12 ++-
4 files changed, 116 insertions(+), 126 deletions(-)
Changeset truncated at 500 lines:
diff --git a/config/events.d/11.natgw b/config/events.d/11.natgw
index 5093752..3eb3dad 100755
--- a/config/events.d/11.natgw
+++ b/config/events.d/11.natgw
@@ -18,8 +18,8 @@ else
fi
delete_all() {
- local _ip=`echo $CTDB_NATGW_PUBLIC_IP | cut -d '/' -f1`
- local _maskbits=`echo $CTDB_NATGW_PUBLIC_IP | cut -d '/' -f2`
+ _ip="${CTDB_NATGW_PUBLIC_IP%/*}"
+ _maskbits="${CTDB_NATGW_PUBLIC_IP#*/}"
[ -z "$CTDB_NATGW_PUBLIC_IFACE" ] || {
delete_ip_from_iface $CTDB_NATGW_PUBLIC_IFACE $_ip $_maskbits 2>/dev/null
@@ -36,58 +36,40 @@ delete_all() {
case "$1" in
startup)
- [ -z "$CTDB_PUBLIC_ADDRESSES" ] && {
- CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
- }
- egrep "^$CTDB_NATGW_PUBLIC_IP[ \t]" $CTDB_PUBLIC_ADDRESSES >/dev/null
- [ "$?" = "0" ] && {
- echo ERROR: NATGW configured to use a public address. NATGW must not use a public address.
- exit 1
- }
+ # Error if CTDB_NATGW_PUBLIC_IP is listed in public addresses
+ grep -q "^$CTDB_NATGW_PUBLIC_IP[[:space:]]" "${CTDB_PUBLIC_ADDRESSES:-/etc/ctdb/public_addresses}" && \
+ die "ERROR: NATGW configured to use a public address. NATGW must not use a public address."
# do not send out arp requests from loopback addresses
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
;;
recovered|updatenatgw|ipreallocated)
- MYPNN=`ctdb pnn | cut -d: -f2`
- NATGWMASTER=`ctdb natgwlist | head -1 | sed -e "s/ .*//"`
- NATGWIP=`ctdb natgwlist | head -1 | sed -e "s/^[^ ]* *//"`
-
- CTDB_NATGW_PUBLIC_IP_HOST=`echo $CTDB_NATGW_PUBLIC_IP | sed -e "s/\/.*/\/32/"`
+ mypnn=$(ctdb pnn | cut -d: -f2)
- # block all incoming connections to the natgw address
- iptables -D INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
- iptables -I INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
+ set -- $(ctdb natgwlist)
+ natgwmaster="${1:--1}" # Default is -1 if natgwlist fails
+ natgwip="$2"
-
- if [ "$NATGWMASTER" = "-1" ]; then
- echo "There is no NATGW master node"
- exit 1
- fi
+ [ "$natgwmaster" = "-1" ] && die "There is no NATGW master node"
delete_all
- if [ "$MYPNN" = "$NATGWMASTER" ]; then
- # This is the first node, set it up as the NAT GW
+ if [ "$mypnn" = "$natgwmaster" ]; then
+ # This is the NAT GW
echo 1 >/proc/sys/net/ipv4/ip_forward
iptables -A POSTROUTING -t nat -s $CTDB_NATGW_PRIVATE_NETWORK ! -d $CTDB_NATGW_PRIVATE_NETWORK -j MASQUERADE
# block all incoming connections to the natgw address
- CTDB_NATGW_PUBLIC_IP_HOST=`echo $CTDB_NATGW_PUBLIC_IP | sed -e "s/\/.*/\/32/"`
- iptables -D INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
- iptables -I INPUT -p tcp --syn -d $CTDB_NATGW_PUBLIC_IP_HOST -j REJECT 2>/dev/null
+ ctdb_natgw_public_ip_host="${CTDB_NATGW_PUBLIC_IP%/*}/32"
+ iptables -D INPUT -p tcp --syn -d $ctdb_natgw_public_ip_host -j REJECT 2>/dev/null
+ iptables -I INPUT -p tcp --syn -d $ctdb_natgw_public_ip_host -j REJECT 2>/dev/null
ip addr add $CTDB_NATGW_PUBLIC_IP dev $CTDB_NATGW_PUBLIC_IFACE
- ip route add 0.0.0.0/0 via $CTDB_NATGW_DEFAULT_GATEWAY >/dev/null 2>/dev/null
+ ip route add 0.0.0.0/0 metric 10 via $CTDB_NATGW_DEFAULT_GATEWAY >/dev/null 2>/dev/null
else
- # This is not the NAT-GW
- # Assign the public ip to the private interface and make
- # sure we dont respond to ARPs.
- # We do this so that the ip address will exist on a
- # non-loopback interface so that samba may send it along in the
- # KDC requests.
- ip route add 0.0.0.0/0 via $NATGWIP metric 10
+ # This is NOT the NAT GW
+ ip route add 0.0.0.0/0 via $natgwip metric 10
# Make sure winbindd does not stay bound to this address
# if we are no longer natgwmaster
smbcontrol winbindd ip-dropped $CTDB_NATGW_PUBLIC_IP >/dev/null 2>/dev/null
diff --git a/doc/ctdbd.1 b/doc/ctdbd.1
index 60abf03..e4ea114 100644
--- a/doc/ctdbd.1
+++ b/doc/ctdbd.1
@@ -1,13 +1,13 @@
'\" t
.\" Title: ctdbd
.\" Author: [FIXME: author] [see http://docbook.sf.net/el/author]
-.\" Generator: DocBook XSL Stylesheets v1.75.2 <http://docbook.sf.net/>
-.\" Date: 05/21/2012
+.\" Generator: DocBook XSL Stylesheets v1.76.1 <http://docbook.sf.net/>
+.\" Date: 07/26/2012
.\" Manual: CTDB - clustered TDB database
.\" Source: ctdb
.\" Language: English
.\"
-.TH "CTDBD" "1" "05/21/2012" "ctdb" "CTDB \- clustered TDB database"
+.TH "CTDBD" "1" "07/26/2012" "ctdb" "CTDB \- clustered TDB database"
.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
@@ -763,7 +763,7 @@ This is the list of all nodes that belong to the same NATGW group as this node\&
.PP
When the NAT\-GW functionality is used, one of the nodes is elected to act as a NAT router for all the other nodes in the group when they need to originate traffic to the external public network\&.
.PP
-The NAT\-GW node is assigned the CTDB_NATGW_PUBLIC_IP to the designated interface and the provided default route\&. The NAT\-GW is configured to act as a router and to masquerade all traffic it receives from the internal private network and which is destined to the external network(s)\&.
+The NAT\-GW node is assigned the CTDB_NATGW_PUBLIC_IP to the specified interface and the provided default route\&. Given that the NAT\-GW mechanism acts as a last resort, its default route is added with a metric of 10 so that it can coexist with other configured static routes\&. The NAT\-GW is configured to act as a router and to masquerade all traffic it receives from the internal private network and which is destined to the external network(s)\&.
.PP
All other nodes in the group are configured with a default route of metric 10 pointing to the designated NAT GW node\&.
.PP
diff --git a/doc/ctdbd.1.html b/doc/ctdbd.1.html
index 0530b22..a2e6bc8 100644
--- a/doc/ctdbd.1.html
+++ b/doc/ctdbd.1.html
@@ -1,4 +1,4 @@
-<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>ctdbd</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="refentry" title="ctdbd"><a name="ctdbd.1"></a><div class="titlepage"></div><div class="refnamediv"><h2>Name</h2><p>ctdbd — The CTDB cluster daemon</p></div><div class="refsynopsisdiv" title="Synopsis"><h2>Synopsis</h2><div class="cmdsynopsis"><p><code class="command">ctdbd</code> </p></div><div class="cmdsynopsis"><p><code class="command">ctdbd</code> [-? --help] [-d --debug=<INTEGER>] {--dbdir=<directory>} {--dbdir-persistent=<directory>} [--event-script-dir=<directory>] [-i --interactive] [--listen=<address>] [--logfile=<filename>] [--lvs] {--nlist=<filename>} [--no-lmaster] [--no-recmaster] [--nosetsched] {--notification-script=<filename>} [--public-add
resses=<filename>] [--public-interface=<interface>] {--reclock=<filename>} [--single-public-ip=<address>] [--socket=<filename>] [--start-as-disabled] [--start-as-stopped] [--syslog] [--log-ringbuf-size=<num-entries>] [--torture] [--transport=<STRING>] [--usage]</p></div></div><div class="refsect1" title="DESCRIPTION"><a name="id530187"></a><h2>DESCRIPTION</h2><p>
+<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>ctdbd</title><meta name="generator" content="DocBook XSL Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="refentry" title="ctdbd"><a name="ctdbd.1"></a><div class="titlepage"></div><div class="refnamediv"><h2>Name</h2><p>ctdbd — The CTDB cluster daemon</p></div><div class="refsynopsisdiv" title="Synopsis"><h2>Synopsis</h2><div class="cmdsynopsis"><p><code class="command">ctdbd</code> </p></div><div class="cmdsynopsis"><p><code class="command">ctdbd</code> [-? --help] [-d --debug=<INTEGER>] {--dbdir=<directory>} {--dbdir-persistent=<directory>} [--event-script-dir=<directory>] [-i --interactive] [--listen=<address>] [--logfile=<filename>] [--lvs] {--nlist=<filename>} [--no-lmaster] [--no-recmaster] [--nosetsched] {--notification-script=<filename>} [--public-add
resses=<filename>] [--public-interface=<interface>] {--reclock=<filename>} [--single-public-ip=<address>] [--socket=<filename>] [--start-as-disabled] [--start-as-stopped] [--syslog] [--log-ringbuf-size=<num-entries>] [--torture] [--transport=<STRING>] [--usage]</p></div></div><div class="refsect1" title="DESCRIPTION"><a name="idp199104"></a><h2>DESCRIPTION</h2><p>
ctdbd is the main ctdb daemon.
</p><p>
ctdbd provides a clustered version of the TDB database with automatic rebuild/recovery of the databases upon nodefailures.
@@ -8,7 +8,7 @@
ctdbd provides monitoring of all nodes in the cluster and automatically reconfigures the cluster and recovers upon node failures.
</p><p>
ctdbd is the main component in clustered Samba that provides a high-availability load-sharing CIFS server cluster.
- </p></div><div class="refsect1" title="OPTIONS"><a name="id530215"></a><h2>OPTIONS</h2><div class="variablelist"><dl><dt><span class="term">-? --help</span></dt><dd><p>
+ </p></div><div class="refsect1" title="OPTIONS"><a name="idp201064"></a><h2>OPTIONS</h2><div class="variablelist"><dl><dt><span class="term">-? --help</span></dt><dd><p>
Print some help text to the screen.
</p></dd><dt><span class="term">-d --debug=<DEBUGLEVEL></span></dt><dd><p>
This option sets the debuglevel on the ctdbd daemon which controls what will be written to the logfile. The default is 0 which will only log important events and errors. A larger number will provide additional logging.
@@ -154,10 +154,10 @@
implemented in the future.
</p></dd><dt><span class="term">--usage</span></dt><dd><p>
Print useage information to the screen.
- </p></dd></dl></div></div><div class="refsect1" title="Private vs Public addresses"><a name="id487322"></a><h2>Private vs Public addresses</h2><p>
+ </p></dd></dl></div></div><div class="refsect1" title="Private vs Public addresses"><a name="idp90512"></a><h2>Private vs Public addresses</h2><p>
When used for ip takeover in a HA environment, each node in a ctdb
cluster has multiple ip addresses assigned to it. One private and one or more public.
- </p><div class="refsect2" title="Private address"><a name="id487332"></a><h3>Private address</h3><p>
+ </p><div class="refsect2" title="Private address"><a name="idp91136"></a><h3>Private address</h3><p>
This is the physical ip address of the node which is configured in
linux and attached to a physical interface. This address uniquely
identifies a physical node in the cluster and is the ip addresses
@@ -187,7 +187,7 @@
10.1.1.2
10.1.1.3
10.1.1.4
- </pre></div><div class="refsect2" title="Public address"><a name="id487367"></a><h3>Public address</h3><p>
+ </pre></div><div class="refsect2" title="Public address"><a name="idp94040"></a><h3>Public address</h3><p>
A public address on the other hand is not attached to an interface.
This address is managed by ctdbd itself and is attached/detached to
a physical node at runtime.
@@ -248,7 +248,7 @@
unavailable. 10.1.1.1 can not be failed over to node 2 or node 3 since
these nodes do not have this ip address listed in their public
addresses file.
- </p></div></div><div class="refsect1" title="Node status"><a name="id487428"></a><h2>Node status</h2><p>
+ </p></div></div><div class="refsect1" title="Node status"><a name="idp98936"></a><h2>Node status</h2><p>
The current status of each node in the cluster can be viewed by the
'ctdb status' command.
</p><p>
@@ -285,9 +285,9 @@
RECMASTER or NATGW.
This node does not perticipate in the CTDB cluster but can still be
communicated with. I.e. ctdb commands can be sent to it.
- </p></div><div class="refsect1" title="PUBLIC TUNABLES"><a name="id487477"></a><h2>PUBLIC TUNABLES</h2><p>
+ </p></div><div class="refsect1" title="PUBLIC TUNABLES"><a name="idp102960"></a><h2>PUBLIC TUNABLES</h2><p>
These are the public tuneables that can be used to control how ctdb behaves.
- </p><div class="refsect2" title="MaxRedirectCount"><a name="id487486"></a><h3>MaxRedirectCount</h3><p>Default: 3</p><p>
+ </p><div class="refsect2" title="MaxRedirectCount"><a name="idp103592"></a><h3>MaxRedirectCount</h3><p>Default: 3</p><p>
If we are not the DMASTER and need to fetch a record across the network
we first send the request to the LMASTER after which the record
is passed onto the current DMASTER. If the DMASTER changes before
@@ -301,7 +301,7 @@
</p><p>
When chasing a record, this is how many hops we will chase the record
for before going back to the LMASTER to ask for new guidance.
- </p></div><div class="refsect2" title="SeqnumInterval"><a name="id487508"></a><h3>SeqnumInterval</h3><p>Default: 1000</p><p>
+ </p></div><div class="refsect2" title="SeqnumInterval"><a name="idp105312"></a><h3>SeqnumInterval</h3><p>Default: 1000</p><p>
Some databases have seqnum tracking enabled, so that samba will be able
to detect asynchronously when there has been updates to the database.
Everytime a database is updated its sequence number is increased.
@@ -309,17 +309,17 @@
This tunable is used to specify in 'ms' how frequently ctdb will
send out updates to remote nodes to inform them that the sequence
number is increased.
- </p></div><div class="refsect2" title="ControlTimeout"><a name="id487527"></a><h3>ControlTimeout</h3><p>Default: 60</p><p>
+ </p></div><div class="refsect2" title="ControlTimeout"><a name="idp106664"></a><h3>ControlTimeout</h3><p>Default: 60</p><p>
This is the default
setting for timeout for when sending a control message to either the
local or a remote ctdb daemon.
- </p></div><div class="refsect2" title="TraverseTimeout"><a name="id487540"></a><h3>TraverseTimeout</h3><p>Default: 20</p><p>
+ </p></div><div class="refsect2" title="TraverseTimeout"><a name="idp107552"></a><h3>TraverseTimeout</h3><p>Default: 20</p><p>
This setting controls how long we allow a traverse process to run.
After this timeout triggers, the main ctdb daemon will abort the
traverse if it has not yet finished.
- </p></div><div class="refsect2" title="KeepaliveInterval"><a name="id487554"></a><h3>KeepaliveInterval</h3><p>Default: 5</p><p>
+ </p></div><div class="refsect2" title="KeepaliveInterval"><a name="idp108488"></a><h3>KeepaliveInterval</h3><p>Default: 5</p><p>
How often in seconds should the nodes send keepalives to eachother.
- </p></div><div class="refsect2" title="KeepaliveLimit"><a name="id487567"></a><h3>KeepaliveLimit</h3><p>Default: 5</p><p>
+ </p></div><div class="refsect2" title="KeepaliveLimit"><a name="idp109320"></a><h3>KeepaliveLimit</h3><p>Default: 5</p><p>
After how many keepalive intervals without any traffic should a node
wait until marking the peer as DISCONNECTED.
</p><p>
@@ -328,60 +328,60 @@
require a recovery. This limitshould not be set too high since we want
a hung node to be detectec, and expunged from the cluster well before
common CIFS timeouts (45-90 seconds) kick in.
- </p></div><div class="refsect2" title="RecoverTimeout"><a name="id487586"></a><h3>RecoverTimeout</h3><p>Default: 20</p><p>
+ </p></div><div class="refsect2" title="RecoverTimeout"><a name="idp110760"></a><h3>RecoverTimeout</h3><p>Default: 20</p><p>
This is the default setting for timeouts for controls when sent from the
recovery daemon. We allow longer control timeouts from the recovery daemon
than from normal use since the recovery dameon often use controls that
can take a lot longer than normal controls.
- </p></div><div class="refsect2" title="RecoverInterval"><a name="id487601"></a><h3>RecoverInterval</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="RecoverInterval"><a name="idp111800"></a><h3>RecoverInterval</h3><p>Default: 1</p><p>
How frequently in seconds should the recovery daemon perform the
consistency checks that determine if we need to perform a recovery or not.
- </p></div><div class="refsect2" title="ElectionTimeout"><a name="id487615"></a><h3>ElectionTimeout</h3><p>Default: 3</p><p>
+ </p></div><div class="refsect2" title="ElectionTimeout"><a name="idp112704"></a><h3>ElectionTimeout</h3><p>Default: 3</p><p>
When electing a new recovery master, this is how many seconds we allow
the election to take before we either deem the election finished
or we fail the election and start a new one.
- </p></div><div class="refsect2" title="TakeoverTimeout"><a name="id487629"></a><h3>TakeoverTimeout</h3><p>Default: 9</p><p>
+ </p></div><div class="refsect2" title="TakeoverTimeout"><a name="idp113656"></a><h3>TakeoverTimeout</h3><p>Default: 9</p><p>
This is how many seconds we allow controls to take for IP failover events.
- </p></div><div class="refsect2" title="MonitorInterval"><a name="id487641"></a><h3>MonitorInterval</h3><p>Default: 15</p><p>
+ </p></div><div class="refsect2" title="MonitorInterval"><a name="idp114496"></a><h3>MonitorInterval</h3><p>Default: 15</p><p>
How often should ctdb run the event scripts to check for a nodes health.
- </p></div><div class="refsect2" title="TickleUpdateInterval"><a name="id487654"></a><h3>TickleUpdateInterval</h3><p>Default: 20</p><p>
+ </p></div><div class="refsect2" title="TickleUpdateInterval"><a name="idp115328"></a><h3>TickleUpdateInterval</h3><p>Default: 20</p><p>
How often will ctdb record and store the "tickle" information used to
kickstart stalled tcp connections after a recovery.
- </p></div><div class="refsect2" title="EventScriptTimeout"><a name="id487667"></a><h3>EventScriptTimeout</h3><p>Default: 20</p><p>
+ </p></div><div class="refsect2" title="EventScriptTimeout"><a name="idp116192"></a><h3>EventScriptTimeout</h3><p>Default: 20</p><p>
How long should ctdb let an event script run before aborting it and
marking the node unhealthy.
- </p></div><div class="refsect2" title="EventScriptTimeoutCount"><a name="id487680"></a><h3>EventScriptTimeoutCount</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="EventScriptTimeoutCount"><a name="idp117056"></a><h3>EventScriptTimeoutCount</h3><p>Default: 1</p><p>
How many events in a row needs to timeout before we flag the node UNHEALTHY.
This setting is useful if your scripts can not be written so that they
do not hang for benign reasons.
- </p></div><div class="refsect2" title="EventScriptUnhealthyOnTimeout"><a name="id487694"></a><h3>EventScriptUnhealthyOnTimeout</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="EventScriptUnhealthyOnTimeout"><a name="idp117984"></a><h3>EventScriptUnhealthyOnTimeout</h3><p>Default: 0</p><p>
This setting can be be used to make ctdb never become UNHEALTHY if your
eventscripts keep hanging/timing out.
- </p></div><div class="refsect2" title="RecoveryGracePeriod"><a name="id487708"></a><h3>RecoveryGracePeriod</h3><p>Default: 120</p><p>
+ </p></div><div class="refsect2" title="RecoveryGracePeriod"><a name="idp118832"></a><h3>RecoveryGracePeriod</h3><p>Default: 120</p><p>
During recoveries, if a node has not caused recovery failures during the
last grace period, any records of transgressions that the node has caused
recovery failures will be forgiven. This resets the ban-counter back to
zero for that node.
- </p></div><div class="refsect2" title="RecoveryBanPeriod"><a name="id487722"></a><h3>RecoveryBanPeriod</h3><p>Default: 300</p><p>
+ </p></div><div class="refsect2" title="RecoveryBanPeriod"><a name="idp119856"></a><h3>RecoveryBanPeriod</h3><p>Default: 300</p><p>
If a node becomes banned causing repetitive recovery failures. The node will
eventually become banned from the cluster.
This controls how long the culprit node will be banned from the cluster
before it is allowed to try to join the cluster again.
Don't set to small. A node gets banned for a reason and it is usually due
to real problems with the node.
- </p></div><div class="refsect2" title="DatabaseHashSize"><a name="id487741"></a><h3>DatabaseHashSize</h3><p>Default: 100001</p><p>
+ </p></div><div class="refsect2" title="DatabaseHashSize"><a name="idp121384"></a><h3>DatabaseHashSize</h3><p>Default: 100001</p><p>
Size of the hash chains for the local store of the tdbs that ctdb manages.
- </p></div><div class="refsect2" title="DatabaseMaxDead"><a name="id487754"></a><h3>DatabaseMaxDead</h3><p>Default: 5</p><p>
+ </p></div><div class="refsect2" title="DatabaseMaxDead"><a name="idp122232"></a><h3>DatabaseMaxDead</h3><p>Default: 5</p><p>
How many dead records per hashchain in the TDB database do we allow before
the freelist needs to be processed.
- </p></div><div class="refsect2" title="RerecoveryTimeout"><a name="id531336"></a><h3>RerecoveryTimeout</h3><p>Default: 10</p><p>
+ </p></div><div class="refsect2" title="RerecoveryTimeout"><a name="idp123112"></a><h3>RerecoveryTimeout</h3><p>Default: 10</p><p>
Once a recovery has completed, no additional recoveries are permitted
until this timeout has expired.
- </p></div><div class="refsect2" title="EnableBans"><a name="id531349"></a><h3>EnableBans</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="EnableBans"><a name="idp123976"></a><h3>EnableBans</h3><p>Default: 1</p><p>
When set to 0, this disables BANNING completely in the cluster and thus
nodes can not get banned, even it they break. Don't set to 0 unless you
know what you are doing.
- </p></div><div class="refsect2" title="DeterministicIPs"><a name="id531362"></a><h3>DeterministicIPs</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="DeterministicIPs"><a name="idp124904"></a><h3>DeterministicIPs</h3><p>Default: 0</p><p>
When enabled, this tunable makes ctdb try to keep public IP addresses
locked to specific nodes as far as possible. This makes it easier for
debugging since you can know that as long as all nodes are healthy
@@ -392,12 +392,12 @@
public IP assignment changes in the cluster. This tunable may increase
the number of IP failover/failbacks that are performed on the cluster
by a small margin.
- </p></div><div class="refsect2" title="LCP2PublicIPs"><a name="id531383"></a><h3>LCP2PublicIPs</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="LCP2PublicIPs"><a name="idp126448"></a><h3>LCP2PublicIPs</h3><p>Default: 1</p><p>
When enabled this switches ctdb to use the LCP2 ip allocation
algorithm.
- </p></div><div class="refsect2" title="ReclockPingPeriod"><a name="id531394"></a><h3>ReclockPingPeriod</h3><p>Default: x</p><p>
+ </p></div><div class="refsect2" title="ReclockPingPeriod"><a name="idp127288"></a><h3>ReclockPingPeriod</h3><p>Default: x</p><p>
Obsolete
- </p></div><div class="refsect2" title="NoIPFailback"><a name="id531406"></a><h3>NoIPFailback</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="NoIPFailback"><a name="idp128056"></a><h3>NoIPFailback</h3><p>Default: 0</p><p>
When set to 1, ctdb will not perform failback of IP addresses when a node
becomes healthy. Ctdb WILL perform failover of public IP addresses when a
node becomes UNHEALTHY, but when the node becomes HEALTHY again, ctdb
@@ -415,7 +415,7 @@
intervention from the administrator. When this parameter is set, you can
manually fail public IP addresses over to the new node(s) using the
'ctdb moveip' command.
- </p></div><div class="refsect2" title="DisableIPFailover"><a name="id531433"></a><h3>DisableIPFailover</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="DisableIPFailover"><a name="idp130224"></a><h3>DisableIPFailover</h3><p>Default: 0</p><p>
When enabled, ctdb will not perform failover or failback. Even if a
node fails while holding public IPs, ctdb will not recover the IPs or
assign them to another node.
@@ -424,52 +424,52 @@
the cluster by failing IP addresses over to other nodes. This leads to
a service outage until the administrator has manually performed failover
to replacement nodes using the 'ctdb moveip' command.
- </p></div><div class="refsect2" title="NoIPTakeover"><a name="id531452"></a><h3>NoIPTakeover</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="NoIPTakeover"><a name="idp131648"></a><h3>NoIPTakeover</h3><p>Default: 0</p><p>
When set to 1, ctdb will allow ip addresses to be failed over onto this
node. Any ip addresses that the node currently hosts will remain on the
node but no new ip addresses can be failed over onto the node.
- </p></div><div class="refsect2" title="DBRecordCountWarn"><a name="id531466"></a><h3>DBRecordCountWarn</h3><p>Default: 100000</p><p>
+ </p></div><div class="refsect2" title="DBRecordCountWarn"><a name="idp132624"></a><h3>DBRecordCountWarn</h3><p>Default: 100000</p><p>
When set to non-zero, ctdb will log a warning when we try to recover a
database with more than this many records. This will produce a warning
if a database grows uncontrollably with orphaned records.
- </p></div><div class="refsect2" title="DBRecordSizeWarn"><a name="id531480"></a><h3>DBRecordSizeWarn</h3><p>Default: 10000000</p><p>
+ </p></div><div class="refsect2" title="DBRecordSizeWarn"><a name="idp133600"></a><h3>DBRecordSizeWarn</h3><p>Default: 10000000</p><p>
When set to non-zero, ctdb will log a warning when we try to recover a
database where a single record is bigger than this. This will produce
a warning if a database record grows uncontrollably with orphaned
sub-records.
- </p></div><div class="refsect2" title="DBSizeWarn"><a name="id531494"></a><h3>DBSizeWarn</h3><p>Default: 1000000000</p><p>
+ </p></div><div class="refsect2" title="DBSizeWarn"><a name="idp134600"></a><h3>DBSizeWarn</h3><p>Default: 1000000000</p><p>
When set to non-zero, ctdb will log a warning when we try to recover a
database bigger than this. This will produce
a warning if a database grows uncontrollably.
- </p></div><div class="refsect2" title="VerboseMemoryNames"><a name="id531507"></a><h3>VerboseMemoryNames</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="VerboseMemoryNames"><a name="idp135528"></a><h3>VerboseMemoryNames</h3><p>Default: 0</p><p>
This feature consumes additional memory. when used the talloc library
will create more verbose names for all talloc allocated objects.
- </p></div><div class="refsect2" title="RecdPingTimeout"><a name="id531520"></a><h3>RecdPingTimeout</h3><p>Default: 60</p><p>
+ </p></div><div class="refsect2" title="RecdPingTimeout"><a name="idp136432"></a><h3>RecdPingTimeout</h3><p>Default: 60</p><p>
If the main dameon has not heard a "ping" from the recovery dameon for
this many seconds, the main dameon will log a message that the recovery
daemon is potentially hung.
- </p></div><div class="refsect2" title="RecdFailCount"><a name="id531533"></a><h3>RecdFailCount</h3><p>Default: 10</p><p>
+ </p></div><div class="refsect2" title="RecdFailCount"><a name="idp137376"></a><h3>RecdFailCount</h3><p>Default: 10</p><p>
If the recovery daemon has failed to ping the main dameon for this many
consecutive intervals, the main daemon will consider the recovery daemon
as hung and will try to restart it to recover.
- </p></div><div class="refsect2" title="LogLatencyMs"><a name="id531547"></a><h3>LogLatencyMs</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="LogLatencyMs"><a name="idp138336"></a><h3>LogLatencyMs</h3><p>Default: 0</p><p>
When set to non-zero, this will make the main daemon log any operation that
took longer than this value, in 'ms', to complete.
These include "how long time a lockwait child process needed",
"how long time to write to a persistent database" but also
"how long did it take to get a response to a CALL from a remote node".
- </p></div><div class="refsect2" title="RecLockLatencyMs"><a name="id531562"></a><h3>RecLockLatencyMs</h3><p>Default: 1000</p><p>
+ </p></div><div class="refsect2" title="RecLockLatencyMs"><a name="idp139432"></a><h3>RecLockLatencyMs</h3><p>Default: 1000</p><p>
When using a reclock file for split brain prevention, if set to non-zero
this tunable will make the recovery dameon log a message if the fcntl()
call to lock/testlock the recovery file takes longer than this number of
ms.
- </p></div><div class="refsect2" title="RecoveryDropAllIPs"><a name="id531576"></a><h3>RecoveryDropAllIPs</h3><p>Default: 120</p><p>
+ </p></div><div class="refsect2" title="RecoveryDropAllIPs"><a name="idp140440"></a><h3>RecoveryDropAllIPs</h3><p>Default: 120</p><p>
If we have been stuck in recovery, or stopped, or banned, mode for
this many seconds we will force drop all held public addresses.
- </p></div><div class="refsect2" title="verifyRecoveryLock"><a name="id531589"></a><h3>verifyRecoveryLock</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="verifyRecoveryLock"><a name="idp141344"></a><h3>verifyRecoveryLock</h3><p>Default: 1</p><p>
Whether we take an fcntl() lock on the reclock file to verify that we are
the sole recovery master node in the cluster.
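Tunables such as this one can be inspected and adjusted at runtime with the ctdb tool. A minimal sketch, assuming a running cluster (check the exact tunable names with "ctdb listvars"):

```shell
# List all tunables and their current values
ctdb listvars

# Read a single tunable
ctdb getvar VerifyRecoveryLock

# Disable reclock verification (not recommended outside testing)
ctdb setvar VerifyRecoveryLock 0
```

These commands operate on a live ctdbd, so they are shown here only as an operational fragment.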
- </p></div><div class="refsect2" title="DeferredAttachTO"><a name="id531602"></a><h3>DeferredAttachTO</h3><p>Default: 120</p><p>
+ </p></div><div class="refsect2" title="DeferredAttachTO"><a name="idp142232"></a><h3>DeferredAttachTO</h3><p>Default: 120</p><p>
When databases are frozen we do not allow clients to attach to the
databases. Instead of immediately returning an error to the application,
the attach request from the client is deferred until the database
@@ -477,7 +477,7 @@
</p><p>
This timeout controls how long we will defer the request from the client
before timing it out and returning an error to the client.
- </p></div><div class="refsect2" title="HopcountMakeSticky"><a name="id531621"></a><h3>HopcountMakeSticky</h3><p>Default: 50</p><p>
+ </p></div><div class="refsect2" title="HopcountMakeSticky"><a name="idp3179992"></a><h3>HopcountMakeSticky</h3><p>Default: 50</p><p>
If the database is set to 'STICKY' mode, using the 'ctdb setdbsticky'
command, any record that is seen as very hot and migrating so fast that
hopcount surpasses 50 is set to become a STICKY record for StickyDuration
@@ -488,15 +488,15 @@
migrating across the cluster so fast. This will improve performance for
certain workloads, such as locking.tdb if many clients are opening/closing
the same file concurrently.
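Putting this together, a contended database can be switched to STICKY mode with the "ctdb setdbsticky" command named above, and the hopcount threshold lowered so records become sticky sooner. A sketch, assuming a running cluster:

```shell
# Enable STICKY mode for a heavily contended database
ctdb setdbsticky locking.tdb

# Lower the hopcount threshold (default 50) so hot records
# are flagged STICKY earlier
ctdb setvar HopcountMakeSticky 30
```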
- </p></div><div class="refsect2" title="StickyDuration"><a name="id531641"></a><h3>StickyDuration</h3><p>Default: 600</p><p>
+ </p></div><div class="refsect2" title="StickyDuration"><a name="idp3181552"></a><h3>StickyDuration</h3><p>Default: 600</p><p>
Once a record has been found to be fetch-lock hot and has been flagged to
become STICKY, this is how long, in seconds, the record will remain
flagged as a STICKY record.
- </p></div><div class="refsect2" title="StickyPindown"><a name="id531655"></a><h3>StickyPindown</h3><p>Default: 200</p><p>
+ </p></div><div class="refsect2" title="StickyPindown"><a name="idp3182456"></a><h3>StickyPindown</h3><p>Default: 200</p><p>
Once a STICKY record has been migrated onto a node, it will be pinned down
on that node for this number of milliseconds. Any request from other nodes
to migrate the record off the node will be deferred until the pindown timer
expires.
- </p></div><div class="refsect2" title="MaxLACount"><a name="id531668"></a><h3>MaxLACount</h3><p>Default: 20</p><p>
+ </p></div><div class="refsect2" title="MaxLACount"><a name="idp3183408"></a><h3>MaxLACount</h3><p>Default: 20</p><p>
When record content is fetched from a remote node, if it is only for
reading the record, pass back the content of the record but do not yet
migrate the record. Once MaxLACount identical requests from the
@@ -504,13 +504,13 @@
onto the requesting node. This reduces the amount of migration for a
read-mostly database workload at the expense of more frequent network
roundtrips.
- </p></div><div class="refsect2" title="StatHistoryInterval"><a name="id531684"></a><h3>StatHistoryInterval</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="StatHistoryInterval"><a name="idp3184584"></a><h3>StatHistoryInterval</h3><p>Default: 1</p><p>
Granularity of the statistics collected in the statistics history.
- </p></div><div class="refsect2" title="AllowClientDBAttach"><a name="id531697"></a><h3>AllowClientDBAttach</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="AllowClientDBAttach"><a name="idp3185376"></a><h3>AllowClientDBAttach</h3><p>Default: 1</p><p>
When set to 0, clients are not allowed to attach to any databases.
This can be used to temporarily block any new processes from attaching
to and accessing the databases.
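One use of this tunable is a short maintenance window during which no new clients may attach. A sketch, assuming a running cluster:

```shell
# Block new database attaches during maintenance
ctdb setvar AllowClientDBAttach 0

# ... perform maintenance work here ...

# Allow clients to attach again
ctdb setvar AllowClientDBAttach 1
```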
- </p></div><div class="refsect2" title="RecoverPDBBySeqNum"><a name="id531710"></a><h3>RecoverPDBBySeqNum</h3><p>Default: 0</p><p>
+ </p></div><div class="refsect2" title="RecoverPDBBySeqNum"><a name="idp3186272"></a><h3>RecoverPDBBySeqNum</h3><p>Default: 0</p><p>
When set to non-zero, this will change how the recovery process for
persistent databases is performed. By default, when performing a database
recovery, for both normal and persistent databases, recovery is
@@ -521,7 +521,7 @@
a whole db and not by individual records. The node that contains the
highest value stored in the record "__db_sequence_number__" is selected
and that node's copy of the database is used as the recovered database.
- </p></div><div class="refsect2" title="FetchCollapse"><a name="id531731"></a><h3>FetchCollapse</h3><p>Default: 1</p><p>
+ </p></div><div class="refsect2" title="FetchCollapse"><a name="idp3187824"></a><h3>FetchCollapse</h3><p>Default: 1</p><p>
When many clients across many nodes try to access the same record at the
same time this can lead to a fetch storm where the record becomes very
active and bounces between nodes very fast. This leads to high CPU
@@ -537,7 +537,7 @@
</p><p>
This tunable controls whether we collapse multiple fetch operations
of the same record into a single request and defer all duplicates.
- </p></div></div><div class="refsect1" title="LVS"><a name="id531761"></a><h2>LVS</h2><p>
+ </p></div></div><div class="refsect1" title="LVS"><a name="idp3190272"></a><h2>LVS</h2><p>
LVS is a mode where CTDB presents a single IP address for the entire
cluster. This is an alternative to using public IP addresses and round-robin
DNS to load-balance clients across the cluster.
@@ -578,7 +578,7 @@
the processing node back to the clients. For read-intensive I/O patterns you can achieve very high throughput rates in this mode.
</p><p>
Note: you can use LVS and public addresses at the same time.
- </p><div class="refsect2" title="Configuration"><a name="id531812"></a><h3>Configuration</h3><p>
+ </p><div class="refsect2" title="Configuration"><a name="idp3194584"></a><h3>Configuration</h3><p>
To activate LVS on a CTDB node you must specify CTDB_PUBLIC_INTERFACE and
CTDB_LVS_PUBLIC_ADDRESS in /etc/sysconfig/ctdb.
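A minimal /etc/sysconfig/ctdb fragment for LVS, using the two variables just named (interface and address values are illustrative only):

```shell
# /etc/sysconfig/ctdb -- LVS configuration (example values)
CTDB_PUBLIC_INTERFACE=eth0
CTDB_LVS_PUBLIC_ADDRESS=10.1.1.100
```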
</p><p>
@@ -601,7 +601,7 @@ You must also specify the "--lvs" command line argument to ctdbd to activate LVS
all of the clients from the node BEFORE you enable LVS. Also make sure
that when you ping these hosts, the traffic is routed out through the
eth0 interface.
- </p></div><div class="refsect1" title="REMOTE CLUSTER NODES"><a name="id531849"></a><h2>REMOTE CLUSTER NODES</h2><p>
+ </p></div><div class="refsect1" title="REMOTE CLUSTER NODES"><a name="idp3197376"></a><h2>REMOTE CLUSTER NODES</h2><p>
It is possible to have a CTDB cluster that spans a WAN link.
For example, you might have a CTDB cluster in your datacentre but also
want one additional CTDB node located at a remote branch site.
@@ -630,7 +630,7 @@ CTDB_CAPABILITY_RECMASTER=no
</p><p>
Verify with the command "ctdb getcapabilities" that the node no longer
has the recmaster or the lmaster capabilities.
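For a remote WAN node, the capability settings might be sketched as follows. CTDB_CAPABILITY_RECMASTER appears in the diff context above; CTDB_CAPABILITY_LMASTER is assumed by analogy for the lmaster capability:

```shell
# /etc/sysconfig/ctdb on the remote (WAN) node
CTDB_CAPABILITY_RECMASTER=no
CTDB_CAPABILITY_LMASTER=no
```

After restarting ctdbd, "ctdb getcapabilities" should report both capabilities as disabled on that node.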
- </p></div><div class="refsect1" title="NAT-GW"><a name="id531887"></a><h2>NAT-GW</h2><p>
+ </p></div><div class="refsect1" title="NAT-GW"><a name="idp3200392"></a><h2>NAT-GW</h2><p>
Sometimes it is desirable to run services on the CTDB node which
need to originate outgoing traffic to external servers, for example
contacting NIS servers, LDAP servers, etc.
@@ -653,7 +653,7 @@ CTDB_CAPABILITY_RECMASTER=no
if there are no public addresses assigned to the node.
This is the simplest way, but it uses up a lot of IP addresses since you
have to assign both static and public addresses to each node.
- </p><div class="refsect2" title="NAT-GW"><a name="id531916"></a><h3>NAT-GW</h3><p>
+ </p><div class="refsect2" title="NAT-GW"><a name="idp3202792"></a><h3>NAT-GW</h3><p>
A second way is to use the built-in NAT-GW feature in CTDB.
With NAT-GW you assign one public NATGW address for each NATGW group.
Each NATGW group is a set of nodes in the cluster that shares the same
@@ -668,7 +668,7 @@ CTDB_CAPABILITY_RECMASTER=no
In each NATGW group, one of the nodes is designated the NAT Gateway,
through which all traffic originated by nodes in this group
will be routed if public addresses are not available.
- </p></div><div class="refsect2" title="Configuration"><a name="id531938"></a><h3>Configuration</h3><p>
+ </p></div><div class="refsect2" title="Configuration"><a name="idp3204560"></a><h3>Configuration</h3><p>
NAT-GW is configured in /etc/sysconfig/ctdb by setting the following
variables:
</p><pre class="screen">
@@ -716,46 +716,50 @@ CTDB_CAPABILITY_RECMASTER=no
# become natgw master.
#
# CTDB_NATGW_SLAVE_ONLY=yes
- </pre></div><div class="refsect2" title="CTDB_NATGW_PUBLIC_IP"><a name="id531970"></a><h3>CTDB_NATGW_PUBLIC_IP</h3><p>
+ </pre></div><div class="refsect2" title="CTDB_NATGW_PUBLIC_IP"><a name="idp3207544"></a><h3>CTDB_NATGW_PUBLIC_IP</h3><p>
This is an IP address on the public network that is used for all outgoing
traffic when the public addresses are not assigned.
This address will be assigned to one of the nodes in the cluster, which
will masquerade all traffic for the other nodes.
</p><p>
Format of this parameter is IPADDRESS/NETMASK
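For example, with an illustrative address:

```shell
# CTDB_NATGW_PUBLIC_IP uses IPADDRESS/NETMASK notation (example value)
CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
```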