[SCM] CTDB repository - branch 2.5 created - ctdb-2.5-4-g9381c33
Michael Adam
obnox at samba.org
Thu Nov 14 03:37:41 MST 2013
The branch, 2.5 has been created
at 9381c33dfd40192b7532d942059c2959dfae059d (commit)
- Log -----------------------------------------------------------------
commit 9381c33dfd40192b7532d942059c2959dfae059d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Nov 7 16:01:49 2013 +1100
tests: Fix calling of ctdb tool from test
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 46615c8e0e63291605d76a6d35f1a93180718c36
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Nov 7 15:54:28 2013 +1100
Revert "tests: If transaction_start fails, try again"
This reverts commit ed7d999214ee009e480c26410a04fa105028cb8e.
This is not necessary since ctdb_transaction_start() now will return NULL
only when there is a failure and not when another transaction is currently
active.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 59489019ad15a5ad6b0f295e742fc9832745a842
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Nov 7 15:54:20 2013 +1100
client: Make g_lock_lock() wait till lock is obtained
This makes the behaviour of g_lock_lock() similar to that implemented in
Samba. Now ctdb_transaction_start() will return NULL only when there are
failures and not when another transaction is active.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 370022e1ff654db99d0c3ce0c49914c249e57289
Author: Srikrishan Malik <srimalik at in.ibm.com>
Date: Thu Oct 31 11:54:58 2013 +0530
eventscript: Fix link creation failure if the link already exists but the target path is missing
Signed-off-by: Srikrishan Malik <srimalik at in.ibm.com>
commit 30a6565a7b476516f3daed0669b5650e1be3cd18
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 16 11:46:54 2013 +1100
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit a7a844e7600b59d876de94ec5bf7bd1647508cdf
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 30 13:22:21 2013 +1100
web: Add links to new manpages
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 15b5c6c00c248bc1a8364a6da103296a55d7bfb6
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:26:16 2013 +1000
doc: Major updates to manual pages
This includes new manpages for ctdb.7, ctdb.conf.5 and ctdb-tunables.7.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit ca5fc3431573c44d55d09d987c715fb53756fc1f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 30 12:37:15 2013 +1100
tunables: Remove obsolete tunables
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit afd9b51644af074752d74c412cb4e7ec2eba2c69
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 30 12:17:37 2013 +1100
recoverd: Rebalancing should be done regardless of tunable
Rebalance target nodes should be set even if a deferred rebalance is
not configured. The user can explicitly cause a takeover run.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 275ed9ebe287e39d891888c13810c70f347af8ac
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 30 11:32:28 2013 +1100
recoverd: Improve an error message in the election code
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c8b542e059a54b8d524bd430cad9d82e5edd864d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 29 16:38:42 2013 +1100
Revert "if a new node enters the cluster, that node will already be frozen at start"
This is unnecessary due to 03e2e436db5cfd29a56d13f5d2101e42389bfc94.
Furthermore, if a node doesn't force an election but wins it then it
can fail to record that it is the new recovery master. This can lead
to a reverse split brain where there is no recovery master.
This reverts commit c5035657606283d2e35bea40992505e84ca8e7be.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Conflicts:
server/ctdb_recoverd.c
commit eb8ec5681bfccb26c8ffae72952d54bb0ba46249
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 29 14:05:41 2013 +1100
ctdbd: When a node is connected, log at DEBUG_NOTICE not DEBUG_INFO
This is important enough that we should see it when the log level is
DEBUG_NOTICE.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d1674aad224f8f0c9a03c3cd38a647318ba0f03e
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 28 16:20:44 2013 +1100
tests/complex: Remove CTDB_NFS_SKIP_SHARE_CHECK test
This is a needlessly complex way of testing the same thing as the
eventscripts unit tests 60.nfs.monitor.161.sh and
60.nfs.monitor.162.sh.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 81b94fbb7495ac3204f1a84c673c8babf04663bc
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 28 16:14:40 2013 +1100
tests/complex: Remove CTDB_SAMBA_SKIP_SHARE_CHECK test
This is adequately covered by eventscripts unit tests
50.samba.monitor.105.sh and 50.samba.monitor.106.sh.
This test is broken if CTDB_SAMBA_CHECK_PORTS is not specified in the
CTDB configuration. Fixing it is hard and involves adding a more
complex stub for testparm. We already have that in the eventscript
unit tests above.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8c6f511254ecb0381a609b37e3a0ee6e5ec5d562
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 28 16:00:54 2013 +1100
eventscripts: Rewrite the smb.conf cache file handling
The background update is never guaranteed to complete before the cache
is used, so don't bother trying it at the beginning. Instead, put a
timeout on a foreground update.
If the foreground update fails:
* If there's no available cache file then die.
* If there is a previous cache file then use it and log a warning.
* Do a background update at the end of the monitor event.
Also remove commas in the "smb ports" list before use, since (newer?)
testparm seems to insert commas into the default value. Update the
associated test to add a comma.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit c072eb1f6488f94f83a6d3a81d88bf29ad866943
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 25 16:25:25 2013 +1100
tools/ctdb: Fix documentation string for ban command
Ban time of 0 is not supported.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3e41170c78fc7a2bf526129c9b7db3739b61c6bf
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 24 11:13:16 2013 +1100
Revert "recoverd: Disable takeover runs on other nodes for 5 minutes"
5 minutes is too long to leave the cluster in limbo if the recovery
daemon dies during a takeover run, even though this is quite unlikely.
We need a new recovery master to be able to do takeover runs fairly
quickly.
This reverts commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f.
commit 01a46205c3a3d6609dc0b0324319b89667dffa32
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 24 14:15:53 2013 +1100
tools/onnode: Fix healthy/ok node handling
This bit-rotted a long time ago when the "ThisNode" column was added
to "ctdb -Y status" output. The fake "ctdb -Y status" output in the
test was never updated to reflect this change.
Instead of making sure that all columns are "0", just check that
they're not "1". This implicitly ignores "Y" and "N" in this
"ThisNode" column without having to do anything else clever.
Also update associated tests. The main "ctdb ok" test had a duplicate
opening line for a here document, which was tickled by this change.
This fixes samba bz#8122.
Signed-off-by: Martin Schwenke <martin at meltin.net>
onnode test fixup
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 56486d1c01cc8ad0e4b8cee7a22429e72e50f03d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 28 18:49:51 2013 +1100
daemon: Change the default recovery method for persistent databases
Use sequence numbers to do recovery for persistent databases instead of
RSNs. This fixes the problem of registry corruption during recovery.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c7450f9e22133333bf82c88a17ac25990ebc77ab
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 23 15:37:41 2013 +1100
packaging: Create runtime directories for CTDB
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit b63f6fd2d295c8e18cbf3420ab05fce07b727f31
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 23 11:28:26 2013 +1100
initscript: Update systemd configuration to put PID file in /run/ctdb
Elsewhere we're moving the socket to /var/run/ctdb. We might end up
with PID files and sockets for other daemons later, so let's call the
directory "ctdb" instead of "ctdbd".
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit dc67a4e24af9d07aead2a1710eeaf5d6cc409201
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Oct 3 15:19:05 2013 +1000
build: Move the default CTDB socket from /tmp to /var/run/ctdb
Use /var/run/ctdb/ctdbd.socket because there might be other daemons
that need sockets in the future.
The local daemons test code to create a link for the default
convenience socket has to be removed because the link can't be created
as a regular user in the new location. This should be OK since all
calls to the ctdb tool in the test code should be wrapped in onnode.
When debugging tests, a developer will have to set CTDB_SOCKET by
hand.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-programmed-with: Martin Schwenke <martin at meltin.net>
commit 2c09aac71188f43cd592572b10ea30b7a2969678
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Oct 3 15:47:30 2013 +1000
packaging: Move ctdb/ directory from /var to /var/lib
Introduce CTDB_VARDIR variable that points to /var/lib/ctdb by default.
This makes CTDB_VARDIR consistent across C code and scripts.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1fe82f3d7b610547ff4945887f15dd6c5798a49b
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:36:36 2013 +1100
ctdbd: Simplify database directory setting logic
No need to check if the options are set. The options are always set
via static defaults.
No need to talloc_strdup() the values via wrapper functions. The
options aren't going away. Remove now unused ctdb_set_tdb_dir() and
similar functions.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit d73d84346488a2ed54e6a86f9d7ec641c8e33ace
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:36:36 2013 +1100
ctdbd: Remove duplicate database directory setting logic
Defaults for ctdb->db_directory and similar variables are currently
set in 2 places.
Change this to set them in only 1 place and make the directories at
initialisation time instead of waiting until later.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 7b971df79b0b63f83555205eacf48d49ca3a273a
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:29:39 2013 +1100
common: New function ctdb_mkdir_p_or_die()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit afe2145d91725daf1399f0a24f1cddcf65f0ec31
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:08:52 2013 +1100
common: New function mkdir_p()
Behaves like mkdir -p.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit b9b9f6738fba5c32e87cb9c36b358355b444fb9b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Oct 3 15:13:41 2013 +1000
tcp: Create socket lock in /var/run/ctdb instead of /tmp
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-programmed-with: Martin Schwenke <martin at meltin.net>
commit 6a5469a63547029f4fc704a4d4075543e06c36d1
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Oct 24 14:26:12 2013 +1100
doc/examples: Add CTDB configuration examples
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a0b965bb73777dde7a4abf80c5c4742581bce520
Author: Mathieu Parent <math.parent at gmail.com>
Date: Thu Aug 29 08:20:05 2013 +0200
Add missing $remote_fs LSB dependency
commit cea81bdd503f6ef8b5bbd3582a8e0085bb02bc9f
Author: Mathieu Parent <math.parent at gmail.com>
Date: Thu Aug 29 07:42:12 2013 +0200
Improved check_ctdb
- increase verbosity with "-v"
- concat error messages (if there are several)
- handle 255 return code as warning (as it is the return code when any of the nodes is missing)
- read /etc/ctdb/nodes remotely (check_ctdb can be run on a non-ctdb host)
commit 1f6cc8764e28058c56d0350147032b6e30cb355d
Author: Mathieu Parent <math.parent at gmail.com>
Date: Thu Aug 15 20:23:57 2013 +0200
Add missing events.d/99.timeout
commit 58ca2c3e7e3a27023ad86660f01a2052e2a19635
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Oct 24 14:37:41 2013 +1100
eventscripts: Instead of listing all tunables, query EventScriptTimeout
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1f327401f2e181780937aa3f6c479376ff787f3f
Author: Michael Adam <obnox at samba.org>
Date: Wed Oct 23 00:46:34 2013 +0200
ctdb_client.h: fix build on AIX by removing C++-style comments
Reported by John P Janosik <jpjanosi at us.ibm.com>
Signed-off-by: Michael Adam <obnox at samba.org>
commit a3d63a9db89d08bb284b3b3a6db773422f21b477
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:52:01 2013 +1100
ctdbd: Pass the public address file location in ctdb context
No need to pass it as an extra argument to ctdb_start_daemon.
Also ensure options.public_address_list gets a nice static default.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c11803e3dcc905a45a08d743595e63f9ca445f0d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 1 15:13:29 2013 +1000
ctdbd: Debug locks by default with override from environment variable
Default is debug_locks.sh, relative to CTDB_BASE.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 21b4d1aba00902f1eee0cbf4f082b0794fd5b738
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 14:10:58 2013 +1100
ctdbd: Default for event_script_dir should use CTDB_BASE
Also get rid of ctdb_set_event_script_dir(). It creates an
unnecessary copy of something that will be around for the lifetime of
the process.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 20e705e63bd3b20837cc3ac92fdcf2a9650ccfc8
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:33:10 2013 +1100
ctdbd: Add nodes_file member to struct ctdb_context
This allows ctdb_load_nodes_file() to move to ctdb_server.c and
ctdb_set_nlist() to become static.
Setting ctdb->nodes_file needs to be done early, before the nodes file
is loaded. It is now set from CTDB_BASE instead of ETCDIR, so setting
CTDB_BASE also needs to be done earlier.
Unhack ctdbd_test.c - it no longer needs to define
ctdb_load_nodes_file().
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 2b6dc0d2799f3563b767622b6f9246450aa4036b
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:43:47 2013 +1100
tools/ctdb: CTDB_BASE is the default location of configuration files
Ensure that environment variable CTDB_BASE is set.
Update defaults for nodes and natgw_nodes to use CTDB_BASE.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 30ca419aa1c78008f81839497921bbfba480e7fc
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 14:02:31 2013 +1100
ctdbd: Don't check CTDB_BASE before setting it, just don't override
That's what the 3rd argument to setenv(3) is for... :-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 913f229508302378212678d98c22606a4954b09c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 22 15:36:30 2013 +1100
tests/integration: Pass --valgrinding option when running under valgrind
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1c0a627df1b510f49c65ffeb4474240c8856cdf2
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 21 19:42:32 2013 +1100
ctdbd: Fix some errors in the popt configuration
That 4th argument isn't a default or similar, so consistently make it 0.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 30d9b634b16c3cc740e5e453ea5c21012b1fde88
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 18 16:43:26 2013 +1100
initscript: New configuration variable CTDB_DBDIR_STATE
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 516cdea0e73cf3f63b3303e22809834c8cbc64e4
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 18 13:24:03 2013 +1100
scripts: Make detect_init_style() more readable
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 45e2bc66abf9fcfeadcc279a656ed7fd1838920a
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 16:44:24 2013 +1100
eventscripts: Rework the iSCSI eventscript
* It should run on "ipreallocated" instead of "recovered"
* Variable name NODE -> ip since that's what it is
* Simplify some logic
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1152215fc69217e4292762e28d193b7ea0e06ee3
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 16:20:18 2013 +1100
eventscripts: Don't update static routes on "recovered" event
Routes only need to be updated when IPs have moved. IP takeover runs
will generate "ipreallocated", which is enough. "recovered" always
follows "ipreallocated" anyway, so avoid the redundancy.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 542c70d6281d636ecd51502fbbf219f418bfac66
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 16:17:26 2013 +1100
eventscripts: NAT gateway script doesn't need to handle "recovered" event
Any time a node changes flags in any significant way there will be a
takeover run, which will generate an "ipreallocated" event. The
"recovered" event always happens straight after a takeover run, so
handling it as well would update the NAT gateway twice.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 00736a21fc268c10b6a718731e56b3dbb7e60554
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 16:14:14 2013 +1100
eventscripts: Delete placeholder "recovered" and "shutdown" events
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2ea9d3acfe7e8665685f54294f5edc9b8ffc2f3f
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 16:13:21 2013 +1100
eventscripts: Clean up comment at the top of 00.ctdb
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 41df1637c1d8a7b2f5a9974408db71b1f74cb2f2
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 16:00:39 2013 +1100
eventscripts: Remove reconfigure check from samba and winbind eventscripts
There is no reconfigure code for these scripts so no need to check for
reconfiguration.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5b77fd95bda5f1960aca952e1b759231890b56f3
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 15:58:25 2013 +1100
eventscripts: Remove reconfigure code from httpd eventscript
Nothing ever (or has ever) set the "needs reconfigure" flag, so this
code is unnecessary.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 044d302b41a2040642355401e3236fcecc3a620a
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 15:23:35 2013 +1100
eventscripts: Fold ctdb_check_tcp_ports_ctdb() into ctdb_check_tcp_ports()
A generic framework is no longer needed now that the "ctdb" checker is
the only one left. Simplify the code.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 50e330d0679614bee2e7bab028436e929f74ca50
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 11:02:54 2013 +1100
eventscripts: Remove TCP port checks other than the built-in CTDB one
"ctdb checktcpport" is no longer experimental so the other checkers
are no longer required.
Remove tests related to the removed checkers.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cfbff39e22e42f3997f637290748290833525714
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 10:52:00 2013 +1100
scripts: Remove setting of PATH from functions file
The current setting is inconsistent with settings on most systems,
putting /bin before /sbin. Use of /usr/local/bin, which may be
required on some systems, is also overridden. This can make it
difficult to do interactive debugging of script problems.
Rely on the system PATH instead.
If system-specific changes need to be made then this can be done in a
configuration file.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9437d4809bfbbb5c6a32a610665333d2f641881d
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 17 10:39:09 2013 +1100
tests/eventscripts: Run scripts under sh by default
Some scripts are disabled by default so are not executable. Explicitly
running them under sh allows them to be run without having to mess
around and make them executable or similar.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 212d4b201c30804f69cffe4b7150d4b74bf2e54f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 16:44:45 2013 +1100
tests/eventscripts: New tests for 20.multipathd
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 49f077c475b078889ff0492fe7d567a64d6cb87c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 16:42:45 2013 +1100
eventscripts: Clean up 20.multipathd
Reduce the complexity, including the depth of background processes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e574b30257126679704b088c4334a8e7a53a9c3f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 12:00:13 2013 +1100
eventscripts: NAT gateway script should export CTDB_NATGW_NODES
Otherwise calls to "ctdb natgwlist" will not behave as expected if a
non-standard file is used, since that command will use the default
file location.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 79e2029f9bc078126e865aa715100a3870c7604b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 11:57:28 2013 +1100
scripts: Simplify script_log() to just look at CTDB_SYSLOG variable
The old logic was actually wrong. If CTDB_LOGFILE is unset then a
default is used, not syslog.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e55f3a1577eff0182802b0341d865d961aeae1c7
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 11:54:58 2013 +1100
scripts: Remove support for CTDB_OPTIONS configuration variable
Allowing people to put random options in CTDB_OPTIONS complicates some
logic (particularly around use of syslog). If we're going to have
variables for options then let's make sure we have a variable for each
option and make people use them.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit bda0da41aaf629a252cc361b73ebc5328f26ed04
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 11:31:12 2013 +1100
scripts: Remove unused configuration variable CTDB_MANAGES_SCP
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f12658aff125996ae45eea23241d8c3d0567b893
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 15 11:29:23 2013 +1100
eventscripts: Deprecate NFS_SERVER_MODE, use CTDB_NFS_SERVER_MODE instead
All CTDB configuration variables should start with CTDB_.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4a5d5935f4410a93a3343d85a24dbcddae2c4c20
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 14 13:54:39 2013 +1100
recoverd: Remove function reload_nodes_file()
It is a 1 line wrapper around ctdb_load_nodes_file(), so use that
instead. We need less code... :-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 25fd05505f61dc595c0ef25bb6e332274d5530e8
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 14 12:50:08 2013 +1100
Revert "null out the pointer before we reload the nodes file"
This reverts commit 4b0f32047e8bece0a052bdbe2209afe91b7e8ce3.
This is not necessary. It just causes a memory leak.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f3413fb8b90c4d9f0c2c2a69825c66d080117193
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 11 15:53:40 2013 +1100
client: Fix a format string argument compiler warning
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 484c46eaae056480baf050fd91868f2fd0537985
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Sep 27 18:02:39 2013 +1000
recoverd: Ignore failed flag updates on inactive nodes
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-programmed-with: Martin Schwenke <martin at meltin.net>
commit 7764cf67a61bbf1caad5aa8e2d75a262b9da654c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Sep 26 18:47:27 2013 +1000
common/util: Use AIX specific code for setting high priority for CTDB daemon
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit b9af66032f3d96f2fe12b7a4fcc5e71d4a282365
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 11 15:09:11 2013 +1100
git: Ignore generated documentation files
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 63924ff372b066cd878b79e71f06de4c24c814a2
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 11 15:05:54 2013 +1100
tests: When running local tests with run_tests.sh, use fixed TEST_VAR_DIR
Otherwise we end up with lots of useless temporary directories.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0a79ba2f1277a776347e2c3f04ce8419e0be62de
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Sep 26 20:58:50 2013 +1000
eventscripts: Fix comment - CTDB_TCP_PORT_CHECKS -> CTDB_TCP_PORT_CHECKERS
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d0dec5b8e60316701fdd02150c4dd8f01aacbfda
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:24:46 2013 +1000
tests/integration: Tweak ctdbd startup options
* --public-interface is not needed
* Add --sloppy-start to speed up restarts
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 588172bcb6bf267339e2bd09e23d2c4904a27a41
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Sep 26 13:11:04 2013 +1000
recoverd: Fix the VNN lmaster consistency check
It doesn't cope with nodes that don't have the lmaster capability.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit ed7d999214ee009e480c26410a04fa105028cb8e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 1 11:54:35 2013 +1000
tests: If transaction_start fails, try again
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit af4b6b8b3222d2a3c425fcc6833db579d0cd7ffa
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 1 11:53:57 2013 +1000
tests: Make sure test exits with zero status on successful completion
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 929045335212e825deb645cc6c7f97b8a40fdbb3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Sep 27 11:26:27 2013 +1000
tests: Re-enable transaction test code
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 14bfd22fad1a5fd27eede1be7fccbaed9466e13e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Sep 24 13:10:31 2013 +1000
tools/ctdb: Remove setdbseqnum command
This command was added to test persistent database recovery with sequence
numbers. With the new persistent transaction code, sequence numbers get
updated automatically, so there is no need for this command.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 961dd5d0acbb971756944ea9f69992020ea7d9fc
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Sep 24 13:08:48 2013 +1000
tests: No need to set sequence number when modifying persistent database
With the new persistent transaction code, sequence numbers will be
automatically updated whenever a record is updated.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 41bdbcfd72092cdd25da87e60689c087bca97933
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Sep 25 19:16:53 2013 +1000
client: Remove old persistent transaction code
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4e0f1971792c9431d8d51dc57d54ecc9e4576dd5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Sep 23 18:30:04 2013 +1000
client: Reimplement persistent transaction code using TRANS3_COMMIT
This implements the persistent transaction code from Samba.
Persistent transaction code was reimplemented in Samba using g_lock.tdb
to hold transaction locks and using TRANS3_COMMIT control.
Implementation details:
1. When starting a transaction, create a record with "transaction-<dbid>"
as key and store the current server_id in it.
2. If a record already exists, some other client has already started a
transaction. Verify that the process corresponding to the server_id stored
in the record really exists; if it does not, the record is stale and can
be overwritten.
3. All modifications to the actual persistent database are stored in a
marshal buffer.
4. When the transaction is committed, read the sequence number of the
persistent database and increment it. The sequence number record is also
stored in the marshal buffer.
5. Send the changed records (marshal buffer) in a TRANS3_COMMIT control
to all the active nodes.
6. If all controls succeed, verify that the sequence number has been
incremented; if it has, the commit is successful. If any of the controls
fail, abort the transaction.
7. If the sequence number has not been incremented, a database recovery
has been triggered, so repeat from step 5.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 40589ae5259880431f358250c1f0d07bcaa21d1f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Oct 4 15:38:04 2013 +1000
client: Add functions to parse g_lock.tdb records
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 55f91ea4373c54ddb5faad87fa2826d86a4b6172
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Oct 4 15:37:24 2013 +1000
client: Add functions to handle server_id structure
server_id records are stored in g_lock.tdb for persistent transactions.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 22a253b7ccf1ff854cddf0b67969dc84d7d6a654
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Sep 12 16:43:43 2013 +1000
ctdbd: Remove transaction code related to TRANS2 commits
This removes data types and structure elements related to TRANS2
persistent transaction code.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 7d176352986317e63696d74252ff5d8eccb2fee5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Sep 12 16:27:39 2013 +1000
ctdbd: Deprecate TRANS2 commit controls
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 3c892ea1b5aa42686adb82ce29b9fcfdf9d204a1
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Sep 12 16:36:09 2013 +1000
ctdbd: Create a utility function to log error for "not implemented" controls
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2ce3a48cc969d563c26dd295723416c0d7b077a2
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Sep 12 16:35:17 2013 +1000
include: Remove unused set_dmaster structure
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 6182bd0c19f215a997efe5272e633b1b1bd0c882
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 14:27:03 2013 +1000
tests/tool: Remove references to libctdb in file and function names
Main changes are:
libctdb_test.c -> ctdb_test_stubs.c
ctdb_tool_libctdb.c -> ctdb_functest.c
ctdb_tool_stubby.c is gone, replaced with existing ctdb_test.c.
Functions starting with "libctdb_test_" now start with
"ctdb_test_stubs_".
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 10aac42f30cc0d56dca42ece17d04ccbc321056d
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 14:01:00 2013 +1000
tests/tool: Rework test programs so they no longer expect libctdb
Instead, override controls using preprocessor magic.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 59bd4ede15a5958b87e0d253461eb9111885bd2f
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 13:43:53 2013 +1000
tests/tool: Fix some comment typos
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3296559c43e70f755fcf2c06677891e0319c8142
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 13:40:52 2013 +1000
tools/ctdb: Stop return value from being clobbered in control_lvsmaster()
ret is initialised too early and is clobbered by the call to
ctdb_ctrl_getcapabilities(). Initialising it later means that the
function returns -1 when no LVS master is found.
Signed-off-by: Martin Schwenke <martin at meltin.net>
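The following is a minimal C sketch of the ordering problem described above. It is not the actual control_lvsmaster() code. It uses a hypothetical `get_node_capabilities()` stand-in for `ctdb_ctrl_getcapabilities()` and a hypothetical `CAP_LVS_SKETCH` bit. The "not found" value is assigned only after the loop, so a successful capability call, which returns 0, cannot overwrite it.

```c
#include <stddef.h>
#include <stdint.h>

#define CAP_LVS_SKETCH 0x1u /* hypothetical capability bit */

/* Hypothetical stand-in for ctdb_ctrl_getcapabilities(): 0 on success. */
static int get_node_capabilities(const uint32_t *caps_table, size_t pnn,
				 uint32_t *caps)
{
	*caps = caps_table[pnn];
	return 0;
}

/* Return the PNN of the first node with the LVS capability, or -1. */
static int find_lvs_master(const uint32_t *caps_table, size_t num_nodes)
{
	size_t pnn;
	int ret;

	for (pnn = 0; pnn < num_nodes; pnn++) {
		uint32_t caps;

		ret = get_node_capabilities(caps_table, pnn, &caps);
		if (ret != 0) {
			return ret;
		}
		if (caps & CAP_LVS_SKETCH) {
			return (int)pnn;
		}
	}

	/* Set the failure value here, after the loop. Setting it before
	 * the loop would let get_node_capabilities() clobber it with 0. */
	ret = -1;
	return ret;
}
```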
commit 5619754343003016ede27014567dbb4701f97928
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 13:40:10 2013 +1000
client: Fix some format string compiler warnings
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 299fa487549e36572b757852d21471f9e23f6e8f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 30 23:38:15 2013 +1000
common: Fix setting of debug level in the client code
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c5a7f2b4ff011e1393c4ff34864f85e6b472ff07
Author: Amitay Isaacs <amitay at gmail.com>
Date: Sun Aug 25 21:44:59 2013 +1000
libctdb: Remove incomplete libctdb
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1585a8e275b0143e5e46311b3d5e9785119f735f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Aug 27 14:46:08 2013 +1000
tools/ctdb: Pass memory context for returning nodes in parse_nodestring
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ae0d8f432ef98a72c85a6cd42c503b718bef0e4e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Sun Aug 25 21:43:29 2013 +1000
tests: Do not use libctdb code in tests
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit cd66282c635cf53386d8970b89c895076ea21cbd
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Aug 29 17:22:38 2013 +1000
tools/ctdb: Do not use libctdb for commandline tool
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 8cb1fbbfe88327c9c7ab68e8eded586dff611e57
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 23 16:52:24 2013 +1000
client: Add ctdb_ctrl_getdbseqnum() function
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1e7fca5cdc1d7205cf084e35aace1a5dc46ea294
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 23 16:52:02 2013 +1000
client: Add ctdb_ctrl_getdbstatistics() function
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c9a9d14c91f203ce964a426a8a1e2c1715af2098
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 23 16:51:26 2013 +1000
client: Add ctdb_client_check_message_handlers() function
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 962eb63c6d500e29a03ae087757d81be449888c6
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 23 16:49:46 2013 +1000
client: Remove extra whitespaces
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 873b9cadbcc363a9e5f450b0a1feb1cf2ce1e6c9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 23 17:21:24 2013 +1000
tests: Remove unused test program ctdb_fetch_lock_once
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit d94a10f93a0925b17458d009e604966666b3d880
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Aug 29 16:58:47 2013 +1000
tools/ctdb: When printing TDB data as a string, use correct length of the string
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 8b238852884004a56f76a1762199c338864d1249
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 23 16:57:40 2013 +1000
tools/ctdb: Remove un-implemented ctdb vacuum command
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 713c9ecc791e3319a2d109838471833de5a158c8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Sep 25 19:10:13 2013 +1000
tests: Add a simple test to test cluster wide database traverse
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 37e22fc3ac3eb64732f2e67058f5b7b06c093fbf
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Sep 9 12:46:26 2013 +1000
traverse: Send traverse end record from traverse child process
Traverse records are sent directly from the traverse child process, but
the last empty record signalling the end of traverse is sent from ctdbd.
This creates a race condition between ctdbd and the traverse child.
There are two fds from the traverse child to ctdbd - a pipe to track the
status of the child process and a unix socket connection for sending
records. It's possible that the last few records are still sitting in the
unix socket buffer when ctdbd reads the status written by the traverse
child. This will be interpreted as the end of traverse, and ctdbd will
send the last empty record to the originating node before it has
processed the pending packets in the unix socket connection.
The race is avoided by sending the last empty record marking the end of
traverse from the child process.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
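The following minimal POSIX sketch shows why sending the end marker from the child removes the race. The records and the terminating empty record travel over one ordered stream, so the reader cannot see the end marker before the records that precede it. The framing (a 4-byte length followed by a payload, with length 0 meaning "end") and all function names are assumptions for illustration, not CTDB's wire format.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes. */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);
		if (n <= 0) {
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Read exactly len bytes; -1 on error or EOF. */
static int read_all(int fd, void *buf, size_t len)
{
	char *p = buf;

	while (len > 0) {
		ssize_t n = read(fd, p, len);
		if (n <= 0) {
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Child: send nrecs length-prefixed records, then a zero-length "end of
 * traverse" record on the SAME stream. Never returns. */
static void child_traverse(int fd, uint32_t nrecs)
{
	uint32_t i, len = sizeof(uint32_t), end = 0;

	for (i = 0; i < nrecs; i++) {
		if (write_all(fd, &len, sizeof(len)) != 0 ||
		    write_all(fd, &i, sizeof(i)) != 0) {
			_exit(1);
		}
	}
	_exit(write_all(fd, &end, sizeof(end)) == 0 ? 0 : 1);
}

/* Parent: count records until the end marker arrives; -1 on error. */
static long run_traverse(uint32_t nrecs)
{
	int sv[2];
	long count = 0;
	pid_t pid;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
		return -1;
	}
	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (pid == 0) {
		close(sv[0]);
		child_traverse(sv[1], nrecs); /* exits the child */
	}
	close(sv[1]);
	for (;;) {
		uint32_t len, payload;

		if (read_all(sv[0], &len, sizeof(len)) != 0) {
			count = -1;
			break;
		}
		if (len == 0) {
			break; /* end marker: all earlier records were read */
		}
		if (len != sizeof(payload) ||
		    read_all(sv[0], &payload, len) != 0) {
			count = -1;
			break;
		}
		count++;
	}
	close(sv[0]);
	waitpid(pid, NULL, 0);
	return count;
}
```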
commit 482ac708cb79cb6378d814a79c2cf13f88435bc4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Sep 10 17:52:26 2013 +1000
traverse: Wait till all data has been flushed from output queue
To improve traverse performance, records are sent directly from the
traverse child process to the originating node. Make sure that all the
data has been sent via the socket before informing ctdbd that the
traverse is complete.
Without waiting for all the packets to be flushed from the queue, the
child process can incorrectly signal ctdbd that the traverse has ended.
The pending records in the queue would then never reach the originating
node, and the traverse information would be incomplete.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
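The following is a minimal sketch of the "flush first, then signal" ordering, assuming a hypothetical fixed-size output queue. The `struct out_queue`, `queue_flush()` and `finish_traverse()` names are illustrative and do not come from CTDB. Completion is written to the status fd only after every queued byte has been written to the data fd.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size output queue; real CTDB queues differ. */
struct out_queue {
	char buf[4096];
	size_t len;
};

static int queue_push(struct out_queue *q, const void *data, size_t n)
{
	if (n > sizeof(q->buf) - q->len) {
		return -1; /* queue full */
	}
	memcpy(q->buf + q->len, data, n);
	q->len += n;
	return 0;
}

/* Write everything still queued; the queue is empty only on success. */
static int queue_flush(struct out_queue *q, int fd)
{
	size_t off = 0;

	while (off < q->len) {
		ssize_t n = write(fd, q->buf + off, q->len - off);
		if (n <= 0) {
			/* keep the unsent tail queued */
			memmove(q->buf, q->buf + off, q->len - off);
			q->len -= off;
			return -1;
		}
		off += (size_t)n;
	}
	q->len = 0;
	return 0;
}

/* Report completion on status_fd only once the data queue has drained. */
static int finish_traverse(struct out_queue *q, int data_fd, int status_fd)
{
	char status = 0;

	if (queue_flush(q, data_fd) != 0) {
		return -1; /* do not claim the traverse has ended */
	}
	return write(status_fd, &status, 1) == 1 ? 0 : -1;
}
```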
commit 25e9cf86328252f96215b54b94551dd7bbdd2db4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Sep 13 13:28:31 2013 +1000
traverse: Use ctdb local variable for convenience
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit abd51a9f41ebb178c4ea4491bdedf9a9433e7232
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Sep 6 18:11:40 2013 +1000
traverse: Check if local traverse failed or succeeded
Passing the result of tdb_traverse_read() allows ctdbd to determine
whether the local traverse succeeded. If there is a problem with the
local traverse, ctdbd can log an error.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit e4aba8598b00a810e721de64ac44dccc9af04ab6
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Sep 6 14:51:54 2013 +1000
traverse: Log information when traverse starts and ends
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9e18f3c173863919587e25d704f66372624ed8ed
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:23:36 2013 +1000
tool/ltdbtool: -h option does not require an argument
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8f660d0dd52013e5876806be908e8e603aa6e968
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:22:36 2013 +1000
scripts: Add support for optional ctdbd.conf configuration file
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit c700dd0c7b6b43b61b3e231643b5d7cbe2f9592a
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:21:30 2013 +1000
utils: Make debug level strings case-insensitive
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 49c87699fad151933a0aefebfee968fc850e6383
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:20:42 2013 +1000
tools/ctdb: Fix help messages for ctdb commands
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c8a6e5ce579e2fe320c40268e7e9ddfe68b8cd30
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 23 16:19:52 2013 +1000
tools/ctdb: Ban time of 0 is invalid
Apparently it used to mean a permanent ban but it is unclear if this
was ever supported.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ff41ce5ef202f8f6342e285d195bb5df61d848ce
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Sep 16 14:35:13 2013 +1000
eventscripts: Load CTDB configuration settings in 70.iscsi
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 71080676bb4acbd0d9b595a30cf7fe6dddbf426f
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 17:07:32 2013 +1000
recoverd: Disable takeover runs on other nodes for 5 minutes
60 seconds might not be long enough to kill all connections and
release IPs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b39aa2e401fbb581207d986bac93778e9c01acdc
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 17:06:16 2013 +1000
recoverd: Improve logging for takeover runs
Takeover runs are currently silent when they succeed. However, they
are important, so log something by default.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 6d44657a5e5b0df22bab2d487a503dd1c5ba79b4
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 18 16:35:18 2013 +1000
tools/ctdb: Use the standard long timeout when disabling takeover runs
This means that takeover runs will be disabled for about as long as the
reloadips control can take to complete.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0846c00597adb66bba8c9dbf63443d0c2f91a7d1
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 13:20:26 2013 +1000
tools/ctdb: Fix arguments/semantics of rebalance node
There's no reason why specifying a node should be compulsory. This is
a cluster-wide operation because it is implemented by the recovery
master so multiple nodes should not be specified using -n. However,
the command should be able to specify multiple nodes so let it have
its own nodestring argument.
This change should be backward compatible with the old requirement of
specifying a single node via -n.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ac946ee4ad01b1e5cd1006930b9f8a190a0a58ba
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 13:19:09 2013 +1000
tools/ctdb: Make rebalancenode more robust
Use a broadcast instead of trying to win the race of determining the
recovery master and then sending the message before the recovery
master changes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d921b2756d5f1c4ad7a35fe120f6fda9f5bf5686
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 11:29:14 2013 +1000
tests/simple: Fix the reloadips test to cope with changes to reloadips
Specifying nodes to reload no longer uses -n.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e81589b7084c661adf617e166cc2c25b4939f841
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 11:23:07 2013 +1000
recoverd: Be careful about freeing the list of IP rebalance target nodes
It can change during a takeover run. If it does then don't free it.
There are potentially fancier solutions (e.g. check what PNNs are new
to the list) to this issue but this is the simplest.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ceb30432a9a550778aed0b422a654fc5287b82a3
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 11:21:10 2013 +1000
recoverd: reloadips should rebalance target nodes for new IPs
Otherwise, if existing IPs are added to extra nodes (that have,
perhaps, been disconnected) then those IPs will not be rebalanced
across the extra nodes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 85a5b544ec032173e98c9cc3b5402a76b961aa3b
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Sep 5 15:56:51 2013 +1000
ctdbd: Make ctdb_reloadips_child send controls asynchronously
Deleting IPs can take a while because IPs are released and connections
are killed, so do the deletions in parallel. In fact, since the sets of
IPs being added and deleted are disjoint, send all the adds/deletes at
the same time and then wait.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c51c1efe5fc7fa668597f2acd435dee16e410fc9
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 4 14:30:04 2013 +1000
recoverd: Fix the implementation of CTDB_SRVID_REBALANCE_NODE
The current implementation has a few flaws:
* A takeover run is called unconditionally when the timer fires, even if
the recovery master role has moved. This means a node other than
the recovery master can incorrectly do a takeover run.
* The rebalancing target nodes are cleared in the setup for a takeover
run, regardless of whether the takeover run succeeds.
* The timer to force a rebalance isn't cleared if another takeover run
occurs before the deadline. Any forced rebalancing will happen in
the first takeover run, and when the timer expires some time later an
unnecessary takeover run will occur.
* If the recovery master role moves then the rebalancing data will
stay on the original node and affect the next takeover run to occur
if the recovery master role should come back to the original node.
Instead, store an array of rebalance target nodes in the recovery
master context. This is passed as an extra argument to
ctdb_takeover_run() each time it is called and is cleared when a
takeover run succeeds. The timer hangs off the array of rebalance
target nodes, which is cleared if the node isn't the recovery master.
This means that it is possible to lose rebalance data if the recovery
master role moves. However, that's a difficult problem to solve. The
best way of approaching it is probably to try to stop the recovery
master role from jumping around unnecessarily when inactive nodes join
the cluster.
The long term solution is to avoid this nonsense completely. The IP
allocation algorithm needs to cache state between runs so that it
knows which nodes have just become healthy. This also needs recovery
master stability.
Signed-off-by: Martin Schwenke <martin at meltin.net>
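The following minimal sketch models the new bookkeeping: rebalance targets live in the recovery master context, are passed to each takeover run, and are cleared only when a run succeeds or when the node is not the recovery master. `struct rec_ctx`, `MAX_TARGETS` and the stub takeover functions are hypothetical, and the rebalance timer is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TARGETS 32 /* hypothetical limit */

/* Hypothetical recovery master context holding rebalance targets. */
struct rec_ctx {
	uint32_t targets[MAX_TARGETS];
	size_t num_targets;
	bool is_recmaster;
};

/* Record a PNN as a rebalance target, ignoring duplicates. */
static int add_rebalance_target(struct rec_ctx *rec, uint32_t pnn)
{
	size_t i;

	for (i = 0; i < rec->num_targets; i++) {
		if (rec->targets[i] == pnn) {
			return 0;
		}
	}
	if (rec->num_targets == MAX_TARGETS) {
		return -1;
	}
	rec->targets[rec->num_targets++] = pnn;
	return 0;
}

typedef int (*takeover_fn)(const uint32_t *targets, size_t n);

/* Pass the targets to the takeover run; clear them only on success. */
static int do_takeover_run(struct rec_ctx *rec, takeover_fn run)
{
	int ret;

	if (!rec->is_recmaster) {
		rec->num_targets = 0; /* drop data left from a former role */
		return -1;
	}
	ret = run(rec->targets, rec->num_targets);
	if (ret == 0) {
		rec->num_targets = 0;
	}
	return ret;
}

/* Stub takeover runs used for demonstration only. */
static int takeover_ok(const uint32_t *targets, size_t n)
{
	(void)targets;
	(void)n;
	return 0;
}

static int takeover_fail(const uint32_t *targets, size_t n)
{
	(void)targets;
	(void)n;
	return -1;
}
```

A failed run keeps the targets for the next attempt, which addresses the second flaw listed above.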
commit 4cd727439a0824ebb8dbcf737d9888ffc3c41184
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 28 15:46:27 2013 +1000
recoverd: Remove unused CTDB_SRVID_RELOAD_ALL_IPS and handler
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d66a072d9b120c78c47e726e9f29a3c1cfdd87ce
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 28 15:38:48 2013 +1000
tools/ctdb: Reimplement reloadips
This implementation disables takeover runs on all nodes before trying
to reload IPs. It also takes "all" or the list of PNNs as an argument
to the command instead of to -n. -n can still be specified with a
single node, indicating that node should be considered the current node
- that might be confusing, so it could be removed.
This implementation does not use CTDB_SRVID_RELOAD_ALL_IPS, so it can
be removed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
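The reimplemented command takes either "all" or a list of PNNs as its argument. The following sketch shows one way to parse such an argument and reject out-of-range PNNs. `parse_pnn_list()` is hypothetical. It is not ctdb's parse_nodestring(), and the comma-separated syntax is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: "all" or comma-separated PNNs, each < num_nodes.
 * Returns the number of PNNs written to out, or -1 on invalid input. */
static int parse_pnn_list(const char *s, uint32_t num_nodes,
			  uint32_t *out, size_t max_out)
{
	size_t n = 0;

	if (strcmp(s, "all") == 0) {
		uint32_t i;

		if (num_nodes > max_out) {
			return -1;
		}
		for (i = 0; i < num_nodes; i++) {
			out[n++] = i;
		}
		return (int)n;
	}
	while (*s != '\0') {
		char *end;
		unsigned long v;

		/* require a digit so strtoul() never sees signs or spaces */
		if (*s < '0' || *s > '9') {
			return -1;
		}
		errno = 0;
		v = strtoul(s, &end, 10);
		if (errno != 0 || v >= num_nodes || n == max_out) {
			return -1;
		}
		out[n++] = (uint32_t)v;
		if (*end == ',') {
			s = end + 1;
			if (*s == '\0') {
				return -1; /* trailing comma */
			}
		} else if (*end == '\0') {
			s = end;
		} else {
			return -1;
		}
	}
	return n == 0 ? -1 : (int)n;
}
```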
commit 428f800bcdf3dbfe19de8bb36099fbf01ebeaab4
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 28 11:50:23 2013 +1000
recoverd: Defer ipreallocated requests when takeover runs are disabled
The takeover run will fail anyway but deferring seems like a cleaner
option.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0a51a85915486b2a8fded7ba6444b18c6c1ee8e8
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 28 11:32:54 2013 +1000
recoverd: Reimplement CTDB_SRVID_DISABLE_IP_CHECK
Use disable_takeover_runs_handler() instead of maintaining duplicate
logic.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 00db4de53a0d86013e79e6577e7e6cf3ef864e56
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 15:04:40 2013 +1000
recoverd: New SRVID message CTDB_SRVID_DISABLE_TAKEOVER_RUNS
This implements a superset of CTDB_SRVID_DISABLE_IP_CHECK. It stops
the IP checks but also causes any attempted takeover runs to fail and
be rescheduled.
This is meant to completely stop IP movements.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 52050e1c75b21961dafe2bc410268b44240ab24e
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 18:47:51 2013 +1000
tools/ctdb: Add a wait_for_all option to srvid_broadcast()
This will be useful for other SRVIDs.
The error checking in the handler depends on the SRVID responding with
a uint32_t where <0 indicates an error and >=0 is a PNN that
succeeded.
Signed-off-by: Martin Schwenke <martin at meltin.net>
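The following minimal sketch shows reply handling with and without wait_for_all. The 32-bit reply is read as a signed value, so a negative value is an error and a non-negative value is the PNN that succeeded. `struct bcast_state`, `handle_reply()` and `bcast_done()` are hypothetical names, not the srvid_broadcast() internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES_SKETCH 16 /* hypothetical limit */

/* Hypothetical reply-collection state for one SRVID broadcast. */
struct bcast_state {
	bool replied[MAX_NODES_SKETCH];
	size_t num_nodes;   /* nodes the broadcast was sent to */
	size_t num_replies; /* distinct nodes that succeeded */
	bool failed;
};

/* Handle one reply: negative means error, otherwise it is a PNN. */
static void handle_reply(struct bcast_state *st, uint32_t payload)
{
	int32_t result = (int32_t)payload;

	if (result < 0 || (size_t)result >= st->num_nodes) {
		st->failed = true;
		return;
	}
	if (!st->replied[result]) {
		st->replied[result] = true;
		st->num_replies++;
	}
}

/* With wait_for_all, wait for every node; otherwise one reply is enough. */
static bool bcast_done(const struct bcast_state *st, bool wait_for_all)
{
	if (st->failed) {
		return true; /* stop waiting and report the error */
	}
	return wait_for_all ? st->num_replies == st->num_nodes
			    : st->num_replies > 0;
}
```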
commit a566fb5e70282c4e9f76654b1be4dc80829dced0
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 17:06:23 2013 +1000
tools/ctdb: Factor out SRVID broadcast code from ipreallocate()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c58ee0eddf7ae3283e3ca8bd25575e6e677e1b17
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 16:25:28 2013 +1000
tools/ctdb: Change ipreallocate() to use a local done flag
Instead of the current global variable. This is in anticipation of
abstracting the code.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e4eae6e3291baa299a1d0f733ab11b138ee699a3
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 20:02:34 2013 +1000
recoverd: Factor out the SRVID handling code
The code that handles IP reallocate requests can be reused.
This also changes the result returned to an SRVID caller to the PNN on
success or a negative error code on failure. None of the callers
currently look at the result so this is harmless... but it will be
useful later.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d9c22b04d5aa7938a3965bd3144568664eb772ce
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 20:10:10 2013 +1000
recoverd: Make the SRVID request structure generic
No need for a separate one for each SRVID.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 48b603fbf16311daa47b01e7a33d477ed51da56d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Sep 3 11:21:09 2013 +1000
recoverd: Move disabling of IP checks into do_takeover_run()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8ed29c60c0a7dd29f2a6efdf694d38e94281e1c4
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Sep 3 11:20:01 2013 +1000
recoverd: do_takeover_run() should mark when a takeover run is in progress
Nested takeover runs should never happen, so they should fail.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e5f94c7857405bdeac233069003c3769b3dc3616
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 12:19:18 2013 +1000
recoverd: takeover_fail_callback() doesn't need to set rec->need_takeover_run
It is set on every failure anyway.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 53722430ad35f80935aabd12fa07654126443b8b
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 9 12:13:11 2013 +1000
recoverd: Fail takeover run if "ipreallocated" fails
Previously, flagging a failure was probably avoided because of attempts
to run "ipreallocated" events on stopped and banned nodes, which would
fail because they are in recovery. Given the change to a new control
and that the fallback only retries the old method on active nodes, this
should never fail in reasonable circumstances.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9a3f0c0e61ca5c17e020c6e0463d73c7cf4f7c09
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 12:14:34 2013 +1000
recoverd: New function do_takeover_run()
Factor the calling sequence for ctdb_takeover_run() into a new
function and call it instead. This changes rec->need_takeover_run to
false for each successful takeover run and that seems to be the right
thing to do.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f0f48f22f45e4c82eba2582efae307e25385de81
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Sep 17 12:00:26 2013 +1000
recoverd: Stabilise the recovery master role
On rare occasions, a node that has been inactive will trigger an
election when it becomes active again. If that node has been up for
the longest, then it will win the election and the recovery master
role will spuriously move.
While a node remains inactive we reset the priority time to discourage
it from winning elections. The priority time will now reflect roughly
how long the node has been active rather than how long it has been up.
That means the most stable node is more likely to win elections.
Having a stable recovery master means that disabling takeover runs
while reloading IPs is more likely to succeed. It also improves the
chances of being able to cache information in the recovery master -
for example, between takeover runs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
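The following minimal sketch models only the priority-time criterion described above. The real election compares other fields too. `struct node_state`, `update_priority_time()` and `election_winner()` are hypothetical. While a node is inactive its priority time keeps being reset, so a node that has been up longer but only just became active loses to a node that has been active for longer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-node election state. */
struct node_state {
	bool inactive;
	time_t priority_time; /* earlier = stronger election candidate */
};

/* Called periodically: while inactive, keep resetting priority_time so
 * it tracks how long the node has been active, not how long it is up. */
static void update_priority_time(struct node_state *n, time_t now)
{
	if (n->inactive) {
		n->priority_time = now;
	}
}

/* Return the index of the active node with the earliest priority_time,
 * or -1 if no node is active. */
static int election_winner(const struct node_state *nodes, size_t count)
{
	int best = -1;
	size_t i;

	for (i = 0; i < count; i++) {
		if (nodes[i].inactive) {
			continue;
		}
		if (best < 0 ||
		    nodes[i].priority_time < nodes[best].priority_time) {
			best = (int)i;
		}
	}
	return best;
}
```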
commit 403938804caf1322f9773d63197e4303a7b2a788
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 4 13:54:23 2013 +1000
recoverd: Banned nodes should not be told to run "ipreallocated" event
They will reject it because they are in recovery. This can result in
extra banning credits being applied to banned nodes.
This corresponds to commit 9132e6814ed927fa317f333f03dedb18f75d0e5b
from the 1.2.40 branch.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c0bb147ca09e82019b05ec22995623cffc3184e2
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 9 16:16:24 2013 +1000
common: Make parse_ip() valgrind-clean
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 36de63843de10a1f2a9ccdbbee24cc1d08542984
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 15:27:30 2013 +1000
recoverd: Remove an orphaned comment
This should have been removed with the associated code in commit
14bd0b6961ef1294e9cba74ce875386b7dfbf446.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ea5576071b22e1877903ec0921d375626a23e13b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 15:24:17 2013 +1000
recoverd: Update a comment to use current terminology
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d8a76cf79f07dfb5a93c6c9a13f16e3268c7dd57
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 15:16:51 2013 +1000
client: Remove unused function list_of_active_nodes_except_pnn()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d4e206fb818048b7fab4797c877b854bdbb1ab70
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 27 15:14:10 2013 +1000
tools/ctdb: list_of_active_nodes_except_pnn() -> list_of_nodes()
list_of_active_nodes_except_pnn() is only used here and can be removed
if we remove this call. Less is more...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8753a094b97340deb26dd44f6ea345ca0a642a95
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 28 15:36:27 2013 +1000
tools/ctdb: Fix a memory leak in parse_nodestring()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4a388fc6bf54636b7e1f6da8e6aa451cddd574f7
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 16:37:52 2013 +1000
tests/eventscripts: Tests for memory checking in 00.ctdb
... plus updates to test infrastructure to support.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 16fcff0d1993b7a0479341862ea44d10bd5c6d6d
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 6 12:13:31 2013 +1000
eventscripts: Clean up monitoring of system memory in 00.ctdb
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 09940255011b119dc6af3304f5d3e9568e6006fd
Author: Michael Adam <obnox at samba.org>
Date: Thu Aug 22 16:17:09 2013 +0200
server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it...
This was the comment block I was touching and meant to adapt in
commit 00d3bf092e2f72eda330978c75ec85f17e870553.
My search was apparently not unique...
Signed-off-by: Michael Adam <obnox at samba.org>
commit c446579fc442955ecc74f5566eaa0635c3171498
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 21 14:01:25 2013 +1000
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin at meltin.net>
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit eb8575718400c45626cd1b2e0fd247bc3ebff655
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Aug 22 17:59:31 2013 +1000
build: Fix build dependencies for ctdb_lock_tdb
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 618ea3660e36e7bd92b686e1ca8728cf63c3c068
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Aug 22 14:04:59 2013 +1000
tests/simple: Minimise the chance of a monitor event being cancelled
A monitor event following a "ctdb delip" might reconfigure services.
If the monitor event is cancelled then a service might be stopped but
not yet restarted and this could result in the subsequent monitor
events failing.
This obviously needs to be fixed in CTDB itself. This will happen by
making "ctdb reloadips" the supported way of reconfiguring IPs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3ffca990a18cbd31c8bd3ae01c6671d60da58f58
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 21 17:24:03 2013 +1000
packaging: Remove pushd/popd from maketarball.sh, don't need bash
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f0d69a9079b7aecc68f1d2d8510702046b618b19
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 21 16:48:21 2013 +1000
tools/ctdb_diagnostics: Add output of "ctdb getdbmap"
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 406e1cb1fdd17ddd239774d0228e3657b73ae68f
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 21 16:38:17 2013 +1000
tools/ctdb_diagnostics: Safer temporary file creation
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 81833052d7ee8f76b1e98376a0273448640cfa8e
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 21 14:34:49 2013 +1000
eventscripts: Avoid using a temporary file in 62.cnfs
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4b914d7e217202f3d11a8e95f9f74bc17869475b
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 21 14:27:39 2013 +1000
scripts: Remove gdb_backtrace
This uses potentially insecure temporary files and is not referenced
anywhere else.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b1d8732b5da18ae80aea1df0e66b0b5cdcd919bc
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Aug 19 14:40:52 2013 +1000
tools/ctdb: Make most non-auto-all commands abort if run with -n all
Or if run with -n A,B,...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7b3f7eea2465efb099a2faf3e42174bc97b13a16
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Aug 15 05:02:37 2013 +1000
tools/ctdb: Remove more non-essential fetching of PNN from daemon
The useful cases are either CTDB_CURRENT_NODE, in which case
ctdb_get_pnn() does the job, or a PNN, which is... ummm... a PNN! :-)
This works because parse_nodestring() validates PNNs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 123a4677528cb46bee1c6dad8a5162eba9880bc1
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Aug 19 13:54:49 2013 +1000
tools/ctdb: Improve auto-all settings for some commands
* ipreallocate is cluster-wide so should not be auto-all
* enablescript, disablescript, getreclock, setreclock, natgwlist can
all be auto-all without issues
* xpnn and ipiface are local-only and don't work with -n, so they might
as well not be auto-all
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit da22d5e60dc023009854025cc9e6bc4b0a84c60e
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 20:27:25 2013 +1000
recoverd: Remove an unused temporary talloc context
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit db57261d7dc264e161659a8c547f44fbd9e88eeb
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 16 14:10:57 2013 +1000
recoverd: Move struct ctdb_public_ip_list back into ctdb_takeover.c
This is an internal structure. It was moved into ctdb_private.h a
long time ago to allow unit testing. Unit test compilation was
changed shortly afterwards to make this unnecessary.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3ef93a1a3e60cdf5d8954e7a16a988ea6126916b
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Aug 15 17:04:01 2013 +1000
recoverd: Log more information when interfaces change
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 256b157232c60bc432c94e54b1fae9699f737557
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 11 16:00:30 2013 +1000
traverse: Log when database traverse is started
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4ed2efb838d2ac97746666f614ebef5fdf3cdd5e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Aug 22 15:12:17 2013 +1000
ctdbd: Finish eventscript callback processing before debugging hung script
This ensures that the result of eventscripts is updated and the callback
is processed before debugging a hung script, so that "ctdb scriptstatus"
output is useful from the debug hung script.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit 7677fb263f06a97398e2c546e32273fb96edca69
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 23 16:00:15 2013 +1000
ctdbd: Make sure call data is freed if doing an early return
This should avoid memory bloat when a request bounces between nodes.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 92939c1178d04116d842708bc2d6a9c2950e36cc
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Aug 21 14:42:06 2013 +1000
common/io: Limit the queue buffer size for fair scheduling via tevent
If we process all the data available in a socket buffer, CTDB can stay busy
processing lots of packets via the immediate event mechanism in tevent. After
processing an immediate event, tevent returns without calling epoll_wait. So
as long as there are immediate events, tevent will never poll other FDs. CTDB
will report this as an "Event handling took xx seconds" warning. This is
misleading, since CTDB is very busy processing packets but never gets to the
point of polling FDs.
The improvement in socket handling made this worse when handling the traverse
control. Lots of packets filled the socket buffer quickly, and CTDB stayed
busy processing those packets instead of polling other FDs and timer events.
This can lead to controls timing out and, in the worst case, other nodes
marking the busy node as disconnected.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
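The following sketch illustrates the fairness idea in simplified form. It is a swapped-in model, not the actual change. Instead of limiting the queue buffer size, it caps how many bytes of complete packets are consumed per wakeup, so the event loop regains control and can poll other FDs. It assumes illustrative framing where each packet starts with a 32-bit length covering the whole packet. `consume_bounded()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Consume complete packets from buf, stopping once `budget` bytes have
 * been used so the caller can return to the event loop. At least one
 * available packet is always consumed, so progress is guaranteed.
 * Returns bytes consumed; *npkts receives the packet count. */
static size_t consume_bounded(const uint8_t *buf, size_t len, size_t budget,
			      size_t *npkts)
{
	size_t off = 0;

	*npkts = 0;
	while (len - off >= sizeof(uint32_t)) {
		uint32_t plen;

		memcpy(&plen, buf + off, sizeof(plen));
		if (plen < sizeof(plen) || plen > len - off) {
			break; /* malformed or incomplete: wait for more data */
		}
		if (off > 0 && off + plen > budget) {
			break; /* budget used up: yield to the event loop */
		}
		off += plen;
		(*npkts)++;
	}
	return off;
}
```

Unconsumed bytes stay buffered and are processed on the next wakeup, after other FDs and timers have had a chance to run.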
commit d8b094e804efc53fae9f44c6ef961b7b5797d290
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Aug 20 14:20:09 2013 +1000
Revert "common/io: Keep queue buffer size multiple of 4K"
This reverts commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9.
This is not the best approach. Allowing the queue buffer size to grow
indefinitely causes a large number of CTDB packets to be queued up very
quickly, which, when processed via immediate events, will block CTDB
from processing events from other FDs. If there are immediate events
queued up, tevent will not process any of the FDs until all immediate
events are processed.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ac417b0003f0116f116834ad2ac51482d25cfa0d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Aug 19 15:04:46 2013 +1000
Revert "LACOUNT: Add back lacount mechanism to defer migrating a fetched/read copy until after default of 20 consecutive requests from the same node"
This reverts commit 035c0d981bde8c0eee8b3f24ba8e2dc817e5b504.
This is a premature optimization. Record can bounce between nodes
very quickly if it is a contended record. There is no need to hold a
record on a node unnecessarily. In case record contention becomes bad,
enabling sticky records on a database is a better idea.
Conflicts:
include/ctdb_private.h
server/ctdb_tunables.c
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 48f40985f4592c28402303ccbb458756f4914f75
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 15 15:39:47 2013 +1000
ctdbd: Print a log message when a key becomes hot
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit df83ae7a047dab4803e0d94b1c11df48ae17ca96
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 9 17:22:55 2013 +1000
ctdbd: For volatile databases, write an empty record with rsn=0 only on dmaster
An empty record with rsn=0 should not be written on any node other than
the dmaster. This is, however, not true for persistent databases, so for
now apply the check only to volatile databases.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
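The rule above can be restated as a small predicate. The following is a hypothetical sketch for clarity, not the ctdbd code. It treats a record as "empty" when it has no data (`data_len == 0`), which is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: an empty record with rsn == 0 may be written on any
 * node for persistent databases, but only on the dmaster for volatile
 * databases. Other records are not restricted by this rule. */
static bool may_write_empty_record(bool persistent, bool is_dmaster,
				   uint64_t rsn, uint32_t data_len)
{
	if (persistent) {
		return true; /* check applied to volatile databases only */
	}
	if (data_len == 0 && rsn == 0) {
		return is_dmaster;
	}
	return true;
}
```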
commit 5cdad2b8ebd71a5e458c301d00eac00a211feeb3
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 9 17:00:10 2013 +1000
tools/ctdb: Fix message in showban when node is banned
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0fe79662e20e347d9e1cb12a42cd356e33572402
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 9 16:58:42 2013 +1000
tools/ctdb: Reimplement ban/unban using update_flags_wait_and_ipreallocate()
This has the side effect of making these commands more resilient to
control timeouts.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 444521c852749558f39dc6131acce9e47eefd489
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 9 16:34:59 2013 +1000
tools/ctdb: Factor out common pattern used in disable/enable/stop/continue
Now we will only have one set of bugs. :-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 4bf0b1c9d21986eecb7682f935bd6154c65533cc
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 9 15:41:37 2013 +1000
tools/ctdb: Factor, simplify and improve robustness of ipreallocate code
Having other functions call control_ipreallocate() suggests that it
might look at the argc/argv arguments that are passed. This is not
the case. Change the callers so they call the new ipreallocate()
function instead.
Broadcast CTDB_SRVID_TAKEOVER_RUN to all connected nodes. Inactive
nodes will ignore it. This is safe since we only want 1 reply. If we
don't get a response, we don't actually care whether there is an active
recovery master: just fire, wait, retry, ...
Ignore some failures on the basis that they might be transient, so it
is probably worth retrying.
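The "fire, wait, retry" pattern can be sketched in Python as below. The real code is C in the ctdb tool. The callback names, the fixed 1-second pause and the bounded retry count are all assumptions made for illustration.

```python
import time


def takeover_run_with_retry(broadcast, wait_for_reply, timeout, retries,
                            sleep=time.sleep):
    """Fire a takeover-run request, wait for one reply, retry on silence.

    broadcast() sends the request to all connected nodes and returns False
    on a (possibly transient) send failure. wait_for_reply(timeout) returns
    True once any single node has replied. Returns the attempt number that
    succeeded, or None if every attempt failed.
    """
    for attempt in range(1, retries + 1):
        if broadcast() and wait_for_reply(timeout):
            return attempt
        sleep(1)  # brief pause before firing again
    return None
```

Because only one reply is needed, the caller does not have to know which node is the recovery master; any failure simply leads to another attempt.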
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d8eb2e7fdd7645719370dad4f2faa5c3fffa8249
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Aug 15 04:38:02 2013 +1000
tools/ctdb: Use ctdb_get_pnn() to get PNN of the current node
This has already been stored at connect time and can't fail.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f9556a6f1fe0046308c8b363e6dcaf3f7ce6f2b7
Author: Michael Adam <obnox at samba.org>
Date: Mon Aug 19 16:54:06 2013 +0200
util: In passing the code, fix a space vs. tab in set_close_on_exec().
Signed-off-by: Michael Adam <obnox at samba.org>
commit 00d3bf092e2f72eda330978c75ec85f17e870553
Author: Michael Adam <obnox at samba.org>
Date: Mon Aug 19 17:07:19 2013 +0200
server: standardize formatting of comment block for ctdb_reply_dmaster() while I'm at it..
Signed-off-by: Michael Adam <obnox at samba.org>
commit cb3a1c5af3b796dba30cae07118670d3c9e57df7
Author: Michael Adam <obnox at samba.org>
Date: Tue Aug 13 10:17:45 2013 +0200
server: fix wording and punctuation in comment block for ctdb_reply_dmaster().
Signed-off-by: Michael Adam <obnox at samba.org>
commit 7b7aa7b599536cd60ebb84d363607bb4e953248a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Aug 14 11:44:12 2013 +1000
recoverd: Improve log message when nodes disagree on recmaster
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1c9025fdd08d1cea342af7487d0123015e08831b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 2 11:05:08 2013 +1000
common: Null terminate process name string so valgrind doesn't complain
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f0853013655ac3bedf1b793de128fb679c6db6c6
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Aug 12 15:50:30 2013 +1000
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 2)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.
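A toy Python model of why this corruption makes requests bounce forever. It does not reflect how ctdbd actually routes requests. Following dmaster pointers normally ends at the node whose header names itself as dmaster; if two headers point at each other, that node is never reached. The function name and map layout are hypothetical.

```python
def find_dmaster(start, dmaster_of):
    """Follow dmaster pointers from `start` to the node holding the record.

    dmaster_of maps each node's PNN to the PNN its record header names as
    dmaster. Raises RuntimeError if the pointers form a cycle, where a
    request would otherwise bounce between nodes forever.
    """
    node = start
    visited = set()
    while dmaster_of[node] != node:
        if node in visited:
            raise RuntimeError(f"dmaster cycle through node {node}")
        visited.add(node)
        node = dmaster_of[node]
    return node
```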
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a610bc351f0754c84c78c27d02f9a695e60c5b0f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Aug 12 15:51:00 2013 +1000
vacuuming: Fix vacuuming bug where requests keep bouncing between nodes (part 1)
This is caused by corruption of a record header such that the records
on two nodes point to each other as dmaster. This makes a request for
that record bounce between nodes endlessly.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 60cb40d090e45ff6134c098a238fac7ad854f134
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Aug 6 14:37:13 2013 +1000
db_wrap: Make sure tdb messages are logged correctly
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit e9ef93f7b6dad59eabaa32124df81f3e74c651ef
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Aug 12 11:36:25 2013 +1000
eventscripts: Become unhealthy faster on nfsd failure
Anecdotal evidence suggests that most nfsd RPC check failures are due
to cluster filesystem or storage problems. Apparently these are rarely
helped by attempting to restart the NFS service because the restart
tends to hang.
Fail after 2 nfsd RPC check failures, instead of waiting for 6
failures. Restart on every 10th failure to try to bring the node back
to good health.
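The new policy as a Python sketch. The eventscripts implement this in shell. Reading "fail after 2 failures" as "unhealthy from the 2nd consecutive failure" is an assumption, and the function name is hypothetical. The restart test uses the modulo operator, the same idea that the ctdb_check_counter() change further down adds.

```python
FAIL_AFTER = 2      # unhealthy from the 2nd consecutive failure (assumed)
RESTART_EVERY = 10  # attempt a restart on every 10th consecutive failure


def nfsd_check_action(consecutive_failures):
    """Return (unhealthy, restart) for a count of consecutive failures."""
    if consecutive_failures == 0:
        return (False, False)       # healthy; also avoids 0 % 10 == 0
    unhealthy = consecutive_failures >= FAIL_AFTER
    restart = consecutive_failures % RESTART_EVERY == 0
    return (unhealthy, restart)
```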
Update unit tests to match.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b49c4f39666d5b1596213bf41bcdc47ed3c327ae
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 9 11:56:29 2013 +1000
tools/ctdb: Increase default control timeout to 10 seconds
The current 3 second timeout is arbitrary and users trip over it
sometimes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ff5f0d1e29af2b293e30cdc54bed03a644be7038
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Aug 8 16:02:44 2013 +1000
eventscripts: Improve message logged when a counter hits a limit
It should print the actual number of consecutive failures rather than
the limit.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 11fbf4789d783dd0bac22754b374dd9ea4b03bad
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 6 12:42:13 2013 +1000
eventscripts: Print a message when waiting for TCP connections to be killed
This makes the gaps in the logs more obvious.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1d61988af9e4fa3621a3e2d06a859bcb53df2d67
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Aug 5 15:12:14 2013 +1000
eventscripts: New configuration variable $CTDB_RPCINFO_LOCALHOST
Passing "localhost" to the rpcinfo command causes overhead, such as
reading /etc/services multiple times.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit f4ef83a256f59eeb00b9a5bc10c28347e1ad1031
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 2 15:18:47 2013 +1000
eventscripts: Add modulo (%) operator to ctdb_check_counter()
Also add it to the corresponding eventscript unit test infrastructure.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e8b531405665885196c95fe1608db33a255bf761
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 2 16:05:46 2013 +1000
eventscripts: Separate out RPC service restart code
While doing this:
* Explicitly assign RPC program and version information in
  _nfs_check_rpc_common(). This takes more lines of code but is easier
  to read.
* Don't print the options when starting a service. Trying to print them
  makes the code messy for little benefit.
Update the eventscript unit testing code and a Ganesha test to
reflect this.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3ba933d806106d12bc48b83b22d0f314d9d1e5e5
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 2 16:03:42 2013 +1000
tests/eventscripts: Override background_with_logging(), just prepend "&"
That is, output that goes through background_with_logging() just gets
"&" prepended to each line. This is cleaner than having the tests
grovel through logs.
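A Python sketch of the test stub's behaviour. The real override is a shell function in the eventscript test harness; this name is hypothetical, and the exact prefix, a bare "&" with no space, follows the commit title.

```python
def background_with_logging_stub(output):
    """Test stub: prefix every line of captured output with "&"."""
    return "".join("&" + line for line in output.splitlines(keepends=True))
```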
Update some 49.winbind/50.samba tests to deal with this.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1a1be43f8466d46913dcdfe6dcedb94316cd28ad
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 30 16:24:24 2013 +1000
eventscripts: Remove support for RPC service 'q' and 's' restart flags
They're hard to maintain and provide very little benefit.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c7332526b1b488abefeb4be78a7cd3f2f9abc451
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 30 16:21:36 2013 +1000
eventscripts: When restarting the nfslock service only show output of start
That is, /dev/null the "stop" output. This is consistent with the way
CTDB generally deals with the output when stopping a service.
It also makes updating the eventscript unit tests easier.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 63be516673c5d9c0d543617bf1bb8bca919956a8
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 29 15:27:24 2013 +1000
tests/simple: Unreachable node test should wait for recovery to complete
This should minimise the chances of a control timing out.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 4e3bd06916bd3adac213fb18c7c2a24854b02d45
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 29 15:09:23 2013 +1000
tests/simple: Fix the missing IP test
Update the missing IP test to wait until restarts are complete.
Otherwise a service restart can collide with the following monitor
event and cause chaos.
Also, do not disable 10.interface until it matters. Disabling it too
early can cause even more chaos if something goes wrong with the
monitor step.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 2fc6b6403707a292d134140fc0b9145b454992c5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Aug 13 14:02:46 2013 +1000
recoverd: Use TDB_INCOMPATIBLE_HASH when creating volatile databases
When creating missing databases either locally or remotely, the recovery
master calls ctdb_ctrl_createdb() and always passes 0 for tdb_flags. For
volatile databases, if TDB_INCOMPATIBLE_HASH is not specified, they will
be attached without the Jenkins hash, causing database corruption.
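A Python sketch of choosing create-time flags so that volatile databases never fall back to the default hash. The flag values are assumed to mirror tdb.h; check them against your tdb version. The helper name is hypothetical.

```python
# Values assumed to mirror tdb.h; verify against the tdb in use.
TDB_DEFAULT = 0
TDB_INCOMPATIBLE_HASH = 2048


def createdb_flags(persistent, requested=TDB_DEFAULT):
    """tdb flags for the recovery master to use when creating a database."""
    if persistent:
        return requested
    # Volatile databases must use the Jenkins hash, so never pass plain 0.
    return requested | TDB_INCOMPATIBLE_HASH
```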
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ca61eb776ab862bd269e45ee0f9f96e7e1e0e001
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Aug 13 13:55:47 2013 +1000
Revert "recoverd: Use correct tdb flags when creating missing databases"
This reverts commit 10a057d8e15c8c18e540598a940d3548c731b0b4.
This approach would not work when creating local databases since currently
there is no control to receive TDB flags for remote databases.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 5e9b1a7e24d058ff88aaa0563db36a804e866fa9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Aug 5 17:28:47 2013 +1000
common/io: Keep queue buffer size multiple of 4K
Currently the queue buffer is realloc'd every time we need to extend it.
Small increments can cause memory fragmentation. Instead, always extend
the buffer in multiples of 4K. This should reduce the number of
talloc_realloc calls when there are lots of packets in the socket buffer.
Also, if the queue buffer has grown larger than 64K, throw it away once
all the requests in the queue have been processed. That way the queue
does not hold on to large buffers.
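The sizing arithmetic from this commit, sketched in Python. Note that this commit was later reverted, as the log shows. The helper names are hypothetical; the 4K and 64K constants come from the message.

```python
CHUNK = 4096              # grow the buffer in 4K steps
DISCARD_ABOVE = 64 * 1024 # drop buffers larger than 64K once drained


def new_buffer_size(needed):
    """Round the required size up to the next multiple of 4K."""
    return (needed + CHUNK - 1) // CHUNK * CHUNK


def keep_buffer_after_drain(current_size):
    """After all queued requests are processed, keep only small buffers."""
    return current_size <= DISCARD_ABOVE
```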
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 867afb247bd8cc86c8d738f051a44cc534cafacf
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 26 13:57:03 2013 +1000
packaging: Allow setting custom release number in RPM spec file
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay at gmail.com>
commit 44a64d1c388bfe3c3388b191edfaedecfb7bb831
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jul 31 15:59:11 2013 +1000
ctdbd: When a record is made sticky, log only once
Instead of logging from ctdb_request_call(), log the message from
ctdb_make_record_sticky(). That way if the record is already sticky, the
message is not repeated unnecessarily.
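The log-once pattern as a Python sketch. ctdbd does this in C inside ctdb_make_record_sticky(); the set-based bookkeeping and logger name here are hypothetical.

```python
import logging

log = logging.getLogger("ctdbd-sketch")


def make_record_sticky(sticky_keys, key):
    """Mark `key` sticky, logging only when it was not sticky already.

    Returns True if the record was newly made sticky.
    """
    if key in sticky_keys:
        return False          # already sticky: stay quiet
    sticky_keys.add(key)
    log.info("Making record sticky: %r", key)
    return True
```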
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9cde47e1a5bf1b9ca3b4da8c2db94caac2b1aa5e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 15 17:34:31 2013 +1000
ctdbd: Improve high hopcount log messages when request is redirected
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 81d7ce03b28d592a1337639e14d9ea141e20bfff
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 6 16:11:40 2013 +1000
scripts: Do not run ctdb tool commands when debugging hung "init" event
The CTDB daemon is not ready to accept clients in the INIT runstate (init
event). It starts accepting connections in the SETUP runstate (setup
event) and later.
Also, minor log formatting changes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d7f6bc3fed2dc61e6e587b4c0ec0ac27d533bbbe
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Aug 5 17:38:42 2013 +1000
ctdbd: Avoid leaking file descriptor if talloc fails
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9e99e0eb072e2b845914ee3896acbc66b96138d7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Aug 5 14:08:28 2013 +1000
eventscript: Wait for debug hung script to finish or timeout before continuing
Currently, if the debug hung script takes a long time to finish, the
subsequent monitor event can collide with the previous event, which has
not yet finished.
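A Python sketch of "wait for the script or time out". ctdbd does this in C; here subprocess.run() blocks until the child exits, and on timeout it kills and reaps the child before raising TimeoutExpired. The function name is hypothetical.

```python
import subprocess


def run_debug_script(argv, timeout):
    """Run the debug script, blocking until it exits or `timeout` expires.

    Returns the exit status, or None if the script timed out and was killed.
    """
    try:
        return subprocess.run(argv, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None  # subprocess.run() has already killed the child
```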
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 44eb86e6042adb6efe75d2a5528b82a0f21d496d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 2 15:49:06 2013 +1000
eventscripts: Use configured RECLOCK file instead of asking CTDB
On a cluster where the recovery lock file is not being used, asking the
CTDB daemon is unnecessary overhead. And if CTDB is using a recovery lock
file, then changing the configuration without restarting is *stupid*.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit ebecc3a18f1cb397a78b56eaf8f752dd5495bcc9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 2 10:54:38 2013 +1000
locking: Do not create multiple lock processes for the same key
If there are multiple lock helper processes waiting for the same record,
they will cause a thundering herd when that record is unlocked. So avoid
scheduling more than one lock context for the same record. This also
means that multiple requests get queued up behind the same lock context
and can be processed quickly once the lock has been obtained.
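A Python sketch of the idea: at most one lock context (and helper process) per key, with later requests queued behind it. The class and method names are hypothetical; the real logic lives in ctdbd's C locking code around find_lock_context() and ctdb_lock_schedule().

```python
class LockScheduler:
    """One lock context per key; later requests queue behind it."""

    def __init__(self, spawn_helper):
        self._spawn_helper = spawn_helper  # starts a lock helper for a key
        self._active = {}                  # key -> list of waiting callbacks

    def lock(self, key, callback):
        waiters = self._active.get(key)
        if waiters is not None:
            waiters.append(callback)       # reuse the existing lock context
            return
        self._active[key] = [callback]
        self._spawn_helper(key)            # only one helper per key

    def lock_obtained(self, key):
        """Called once the helper holds the lock: run all queued requests."""
        for cb in self._active.pop(key, []):
            cb(key)
```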
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 68af5405acc123b5a90decd2123e2a02961a8fcf
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 2 10:51:45 2013 +1000
locking: Move function find_lock_context() before ctdb_lock_schedule()
So that ctdb_lock_schedule() can call this function without requiring extra
prototype declaration.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 824dcec35ec461d78e22b2ea109473b32bfe3972
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 30 14:17:55 2013 +1000
ctdbd: Print set db sticky message after it's set
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f6b066a23610fb0092298861c21a9b354b91e2f1
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Dec 4 18:27:10 2012 +1100
tests: Add a test program to hold a lock on a database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 10a057d8e15c8c18e540598a940d3548c731b0b4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 30 12:45:01 2013 +1000
recoverd: Use correct tdb flags when creating missing databases
When creating missing databases either locally or remotely, make sure
to use the correct tdb flags from other nodes. Without this, volatile
databases can get attached without TDB_INCOMPATIBLE_HASH flag.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 7e7e59c4047c78159387089eca65d90037bcf722
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Aug 1 11:07:59 2013 +1000
client: Always use jenkins hash when attaching volatile databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 32c83e209823e9a4d6306bb7fd63d4500f3e2668
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 29 13:50:44 2013 +1000
recoverd: Make sure to use jenkins hash for recovery databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fcf77dec5af973a0e32f3999bc012053a6f47a96
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 22 17:26:28 2013 +1000
recoverd: Assemble up-to-date node flags information from remote nodes
Currently the nodemap used by the recovery master is the one obtained
from the local node. This information may have been updated while
processing the main loop. Before comparing node flags on all the nodes,
create up-to-date node flags information based on the information
received from all the nodes.
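A Python sketch of assembling fresh flags, assuming each node is authoritative for its own flags and nodes that did not report keep the locally known value. Both points are assumptions about the approach, and the function name and data layout are hypothetical.

```python
def assemble_node_flags(local_flags, remote_nodemaps):
    """Return a fresh pnn -> flags map.

    local_flags: pnn -> flags from the local (possibly stale) nodemap.
    remote_nodemaps: pnn -> (pnn -> flags) as reported by each node.
    """
    flags = dict(local_flags)
    for pnn, nodemap in remote_nodemaps.items():
        if pnn in nodemap:
            flags[pnn] = nodemap[pnn]  # trust each node about itself
    return flags
```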
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 049d9beb3783482490e6273a434ccbad23f85f0a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 15 16:35:30 2013 +1000
tools/ctdb: Only print the hot records with non-zero hopcount
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ab35773518ad15588013f4d859f7bee790437450
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 15 16:32:40 2013 +1000
ctdbd: Don't consider a hot record if the hopcount is zero
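A Python sketch of a bounded hot-key table that ignores zero hopcounts. The capacity and the "evict the coldest key" replacement policy are assumptions, as are the names; ctdbd keeps this in C in its database statistics.

```python
MAX_HOT_KEYS = 10  # assumed capacity of the hot-key table


def update_hot_keys(hot, key, hopcount, max_keys=MAX_HOT_KEYS):
    """Record `key` in `hot` (key -> highest hopcount) if it is hot enough."""
    if hopcount == 0:
        return                         # a zero hopcount is never "hot"
    if key in hot:
        hot[key] = max(hot[key], hopcount)
        return
    if len(hot) < max_keys:
        hot[key] = hopcount
        return
    coldest = min(hot, key=hot.get)    # replace the least hot entry
    if hopcount > hot[coldest]:
        del hot[coldest]
        hot[key] = hopcount
```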
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fde4b4db5a57f75c5efa5647c309f33e0d5a68f3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jul 12 17:33:13 2013 +1000
ctdbd: Fix updating of hot keys in database statistics
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit e73b2e12adc9db1dedb48d32bba3a8406a80f4cd
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 15 15:24:11 2013 +1000
ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Instead of maintaining another structure, add an element as a placeholder
for the marshalling buffer of hot keys. This avoids duplicating the structure.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 023ca2e84f5ed064a288526b9c2bc7e06674dd81
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 15 14:52:07 2013 +1000
Revert "ctdbd: Remove incomplete ctdb_db_statistics_wire structure"
The structure cannot be removed without adding support for marshalling keys
for hot records.
This reverts commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 57aa2dffea60abd73a95233f8b761cc676adebb6
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 26 15:09:24 2013 +1000
doc: Update XML files to use standard DocBook DTD
This simplifies building since we don't use any of the Samba
extensions.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 37ccc7c6cc43a80aaa92291aea7a438f4225488a
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 26 11:20:47 2013 +1000
initscript: The wrapper script should export CTDB_SOCKET
This ensures that any invocation of the ctdb tool (within the wrapper)
gets the desired value. This at least ensures that ctdbd will be
started.
If a non-standard value is set for CTDB_SOCKET then command-line users
will still need the variable in their environment.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 782814288bb560099ee44b607bf35f3eddf37f82
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 25 16:17:07 2013 +1000
ctdbd: Kill client process without checking for tracked child
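Commit f73a4b1495830bcdd094a93732a89dd53b3c2f78 added a safety check
to ensure that CTDB never kills unrelated processes. However, client
processes are unrelated in this sense (they are not tracked children of
ctdbd), so the check would stop them from being killed.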
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a20d94717d2e4ab866d8a002cdf39c0669b74c6a
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 25 13:40:43 2013 +1000
eventscripts: kill_tcp_connections() should send connections to stdin
This avoids issuing one "ctdb killtcp" command per TCP connection, which
considerably reduces the time taken when there is a large number of TCP
connections. It also makes it possible to avoid calling "ctdb killtcp"
when there are no connections.
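A Python sketch of batching connections into one stdin payload and skipping the command when the payload is empty. The "SRC DST" one-pair-per-line format and the function name are assumptions; consult the ctdb manual for the format killtcp actually reads.

```python
def killtcp_input(connections):
    """Build the stdin payload for a single "ctdb killtcp" invocation.

    connections: iterable of (src, dst) "ip:port" strings.
    Returns None when there is nothing to kill, so the caller can skip
    running the command at all.
    """
    lines = [f"{src} {dst}\n" for src, dst in connections]
    return "".join(lines) if lines else None
```

A caller would then run something like subprocess.run(["ctdb", "killtcp"], input=payload, text=True) only when the payload is not None.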
Add a couple of unit tests for killtcp and update the eventscript unit
test infrastructure to support them.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit af5aa369c266430fe912df0c26116b68bac3572e
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 25 13:28:26 2013 +1000
tools/ctdb: Allow killtcp to read connections from standard input
This allows eventscripts to send information about multiple TCP
connections to a single "ctdb killtcp" command, saving the overhead of
setting up a client connection per TCP connection.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit a69e03a5e4671e998d45b4fef8611a421bbdb3e1
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 22 20:11:58 2013 +1000
tests: Always tally the number of passed/failed tests
Regardless of whether a summary is being printed!
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit bf4a7c1ad87e0e848296d15d63eb8cd901ca5335
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 22 16:39:46 2013 +1000
recoverd: Call takeover fail callback only once per node
Currently the fail callback is called once per (takeip/releaseip) control
failure. This is overkill and can get a node banned much too quickly.
Instead, keep track of control failures per node and only call fail
callback once per failed node.
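A Python sketch of calling the fail callback once per failed node instead of once per failed control. The function name and the (pnn, ok) result format are hypothetical.

```python
def report_failed_nodes(control_results, fail_callback):
    """Call fail_callback once per node that had any failed control.

    control_results: iterable of (pnn, ok) pairs, one per takeip/releaseip.
    Returns the set of failed nodes.
    """
    failed = set()
    for pnn, ok in control_results:
        if not ok and pnn not in failed:
            failed.add(pnn)
            fail_callback(pnn)
    return failed
```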
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 1b016b2dfc5d7d3f2a42ce4dfe569608e90eb714
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 22 15:08:32 2013 +1000
scripts: Run scriptstatus for hung event
The timeout information printed by ctdbd is less than useful because
it refers to the cumulative time taken by the eventscripts run so far.
Adding scriptstatus output indicates where time was actually spent.
Since there is now quite a bit of output, serialise the calls to this
script using flock.
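The script itself uses the flock utility. Below is a Python analogue of the same serialisation using fcntl.flock (Unix only). The function name and lock path handling are hypothetical.

```python
import fcntl


def run_serialised(lock_path, func):
    """Run func() while holding an exclusive flock on lock_path."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            return func()
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```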
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit e0f3fa1020e13b84bdd672538168d148f1847d57
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 22 15:06:52 2013 +1000
ctdbd: Pass event name to hung script debugger
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 29e98017221326bdc9b1c4f7c05b3b495c1de29b
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 22 14:32:13 2013 +1000
tests/complex: Fix NFS tests to work with root_squash
Refactor the NFS test setup/cleanup code into new common functions.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 9d6e1c147bd036d832b98c155f405ee2a5d6f57f
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 19 19:59:43 2013 +1000
tests: Fix exit status of run_tests when a single test is run with -H
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ae3c03d80264e997b7da9f3279d7810e18b8a1df
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 19 15:33:38 2013 +1000
tests/simple: Add -p in onnode test to help show groups of connections
Change the command from "true" to "hostname" since the former won't
produce any output when used in combination with "onnode -p". This
could just be changed to "echo" but the hostname might actually be
useful.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 90d792cf28d6a823141e4c417b6978f02a9cf596
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 17 11:14:37 2013 +1000
ctdbd: Sleep at exit to allow time for log messages to flush
Register print_exit_message() earlier so that it covers most of the
early exits.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 3dd5b925dcf0e9a5b877638e471c5ecf36b46c58
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 19 15:36:29 2013 +1000
ctdbd: Exit if something is already listening on CTDB socket
Don't blindly remove the socket.
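A Python analogue of "check before removing". ctdbd does this in C. The probe connects first: if anything answers, it backs off; if the connection is refused, the socket file is assumed to be stale and is removed before binding. That stale-file handling and the function name are assumptions.

```python
import os
import socket


def claim_socket_path(path):
    """Bind a Unix socket at path unless another daemon is listening there.

    Returns a listening socket, or None if the path is already in use.
    """
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
        return None                # someone is listening: leave it alone
    except FileNotFoundError:
        pass                       # no socket file at all
    except ConnectionRefusedError:
        os.unlink(path)            # stale socket file, nobody listening
    finally:
        probe.close()
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen()
    return sock
```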
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 53e4eca74429f76adc81d98e3d11d1bd61194d71
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 16 19:57:18 2013 +1000
tests/eventscripts: Add tests for monitoring of missing interfaces
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 501f19b16fd6d67fbb754248868c38ee5bcf79ef
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 12 12:48:34 2013 +1000
eventscripts: A missing interface should cause monitoring to fail
A missing interface is at least as bad as an interface with a link
that is down so should have a similar effect.
This couldn't be done previously because orphaned interfaces used to
be listed for monitoring. This was worked around in 10.interface in
commit 49b2d1bd9554461ed8edbfc21e777c0eca9e1443 and fixed in ctdbd in
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
If $CTDB_PARTIALLY_ONLINE_INTERFACES="yes" then monitoring won't
actually fail but the interface is still marked as down.
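A Python sketch of the monitoring rule: a missing interface counts as down, and with partially online interfaces enabled monitoring does not fail even though the interface is still reported down. The real check is shell in 10.interface; names here are hypothetical.

```python
def monitor_interfaces(configured, link_up, partially_online=False):
    """Return (ok, down) for the configured interfaces.

    configured: interface names CTDB is using (e.g. from "ctdb ifaces").
    link_up: name -> bool for interfaces that exist; missing ones are absent.
    """
    # A missing interface is treated exactly like one whose link is down.
    down = [i for i in configured if not link_up.get(i, False)]
    ok = not down or partially_online
    return ok, down
```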
While we're touching this code, use "ip link" instead of "ip addr".
It is marginally cheaper but not enough for a separate patch. ;-)
This effectively reverts d67955b42f7627be9dae995230c8fcbb8a948ec2.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c6ab0f9405d5fa5b0b1693bc92e59da0d555a9d7
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 12 12:33:36 2013 +1000
eventscripts: Get list of configured interfaces using "ctdb ifaces"
This was previously changed because ctdbd didn't garbage collect
orphaned interfaces. This was fixed in commit
cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 57ef5d3827ea3417a32703e259a53ce6fd10ac45
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 24 15:49:48 2013 +1000
ctdbd: Allow extra recovery to repair persistent DBs during first recovery
Commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28 introduced a potential
regression because a node may not have completed the "recovered" event
(so might still be in CTDB_RUNSTATE_FIRST_RECOVERY) when another node
becomes healthy.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5740155cc5de1a223412e8529aa1a383a5412514
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 16 12:53:16 2013 +1000
packaging: Bundle debug_locks.sh script in RPM
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 67c227a5d30cb8487b20b19b20bdfa4613906609
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 16 12:52:00 2013 +1000
packaging: No need to check for existence of scripts, they always do
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 412bc0e20bef694d4e911dc9c984fd7716231f1f
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 11 14:26:38 2013 +1000
scripts: ctdbd_wrapper logs a message to syslog if syslog is not being used
It can be very disconcerting when logging to syslog is expected but
nothing is being logged there.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a4afe7af9c9391048d6f80135bbd5e15367770c7
Author: Mathieu Parent <math.parent at gmail.com>
Date: Fri Jun 7 19:01:06 2013 +0200
Update Nagios check to work with ctdb versions past 30 Aug 2011
Because of commit a779d83a6213e2ba
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 40f2825d6e818dc8c745b6385a545969dfb45fbc
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 11 13:01:13 2013 +1000
recoverd: Really fix bogus info in message about changed flags
Commit 9119a568c2b4601318f7751f537dca2f92a7230b attempted to fix this.
However, this was wrong because old_flags and new_flags were confused.
The latter has since been fixed in commit
7eb2f89979360b6cc98ca9b17c48310277fa89fc so this can now be fixed
properly.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 76703514040b804b880cab909f6ff52576f80f89
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 10 14:44:56 2013 +1000
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0930a3b806977555509c3228726e2250aef1f971
Author: Sumit Bose <sbose at redhat.com>
Date: Mon Nov 19 18:45:37 2012 +0100
Print deleted nodes as well
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a81edf7eb908659a379f0cb55fd5d04551dc2c37
Author: Sumit Bose <sbose at redhat.com>
Date: Thu Sep 1 15:18:46 2011 +0200
IPv6 neighbor solicit cleanup
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit da87395d29f5d11ecfedaf36b53fa060a9140bfd
Author: Sumit Bose <sbose at redhat.com>
Date: Mon Nov 19 11:13:03 2012 +0100
Fix memory leak in ctdb_send_message()
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 05bfdbbd0d4abdfbcf28e3930086723508b35952
Author: Sumit Bose <sbose at redhat.com>
Date: Wed Aug 10 17:53:56 2011 +0200
Fixes for various issues found by Coverity
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 5cdcc3d45d358ddbcd7e864898eed9cbd9935429
Author: Sumit Bose <sbose at redhat.com>
Date: Mon Nov 19 11:20:31 2012 +0100
Check return value of tdb_delete()
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ed9ba1d3dcfcb51aa69bf4d7a74b95063743d8d9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 11 13:46:18 2013 +1000
web: Update webpages
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9ffcd6a91287d86bae7b0c73aa129c81126e08e7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 11 11:34:46 2013 +1000
Tests: Correct the arguments to memset
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 14141b02b61d2783b750ee5b30f9520253e88f09
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jul 10 14:44:56 2013 +1000
doc: Update NEWS
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-programmed-with: Martin Schwenke <martin at meltin.net>
commit e43a4b7b69a21c4cec2453dcac436b64bf5d7f06
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 10 17:19:55 2013 +1000
packaging: Add systemd support
Based on an original patch by Sumit Bose <sbose at redhat.com>.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 30a0040fbb7c4d97d107f0e55c600295c2603a68
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 10 16:35:53 2013 +1000
build: Turn off all deprecation warnings
The "‘tevent_loop_allow_nesting’ is deprecated" warnings will be
around for a while and are annoying.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b6bbfb4c464c39e322830cbbebcc51c225508584
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 10 16:30:29 2013 +1000
build: Remove -DTEVENT_DEPRECATED_QUIET=1 from CFLAGS
This reverts the last part of 788cdbddbc902a5b076d23473450065b551d274d
- the rest of this has been implicitly reverted via tevent syncs.
This is just leftover noise.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e3abc7eebab5cceddc4ce7817890dd5db9be3450
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 9 15:22:07 2013 +1000
initscript: Simplify initscript and control CTDB via new ctdbd_wrapper
Currently the initscript is very complex. This makes it hard to read
and hard to add support for new init systems, such as systemd.
Create a wrapper called ctdbd_wrapper to be installed alongside ctdbd.
This is called by the initscript to start and stop ctdbd. It constructs
the ctdbd options and waits until ctdbd is properly initialised before
exiting.
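A Python sketch of "wait until properly initialised" as a polling loop with a deadline. The wrapper is a shell script. is_ready stands for a hypothetical readiness probe, such as querying the daemon's runstate, and the polling interval is an assumption.

```python
import time


def wait_until_ready(is_ready, timeout, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The clock and sleep parameters exist only so that the loop can be exercised without real waiting.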
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit c6fded59fa4da67f738a90fdacb51900e41801f9
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 8 12:45:31 2013 +1000
recoverd: Recovery daemon should use ctdb_get_pnn, which can't fail
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 846109169ee5e3d03135156e45c8dac93aa2e95b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jul 10 12:23:30 2013 +1000
ctdbd: Print tdb flags when logging attached to database message
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2493f57ce268d6fe7e4c40a87852c347fd60d29e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 9 12:32:53 2013 +1000
ctdbd: Set process names for child processes
This helps distinguish the processes when they are listed by top, perf, etc.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fc3689c977f48d7988eed0654fb8e5ce4b8bfc8b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 9 12:24:59 2013 +1000
common/system: Add ctdb_set_process_name() function
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
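A minimal sketch of what such a helper can look like on Linux, assuming prctl(PR_SET_NAME), which changes the name shown by top and perf and keeps at most 15 bytes. The name set_process_name_sketch() is hypothetical; this is not necessarily how ctdb_set_process_name() is implemented.

```c
#include <string.h>
#include <sys/prctl.h>

/* Hypothetical helper: set the calling thread's name as shown in
 * /proc/<pid>/comm, top and perf. Linux keeps at most 15 bytes plus
 * the terminating NUL, so truncate explicitly. */
static int set_process_name_sketch(const char *name)
{
	char buf[16];

	strncpy(buf, name, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	/* Returns 0 on success, -1 with errno set on failure. */
	return prctl(PR_SET_NAME, (unsigned long)buf, 0, 0, 0);
}
```

A child process would typically call this right after fork(), so each child shows up under its own name.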
commit dc834d5e78c3fb97ae15cddf1139b3c4a4051a7c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 6 16:29:04 2013 +1000
traverse: Remove unused start_time field
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1a74192aa7d51ed99553e7292860027f06b6ef37
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 6 16:26:25 2013 +1000
traverse: Send records directly from traverse child to srcnode
Currently the CTDB daemon reads records from a child process and then sends them to
srcnode via the TRAVERSE_DATA control. This ties up the main CTDB daemon and also
requires an extra copy of the record in the CTDB daemon. Instead, send records
directly from the traverse child process.
The control from the child process still goes via the local CTDB daemon, as there
is currently no infrastructure to open a TCP socket to the srcnode.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit faabce1b99fb3de9ff03bf54d303e7656538fee3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 6 16:12:07 2013 +1000
traverse: Pass reqid and srcnode information to local database traverse
This allows the traverse child process to send the TRAVERSE_DATA control
directly to the srcnode without first sending it to the local node.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 8225b3e77e140db34b52571a95d553d1e59e3f1e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 8 16:14:59 2013 +1000
packaging: When building with system libraries, add dependency for them
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2211cd94bea266547d3e6f167d3160a6b23bec88
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 8 15:49:58 2013 +1000
ctdbd: No need for DeadlockTimeout tunable
The code for detecting deadlocks and killing the smbd process causing the
deadlock has been removed and replaced with an external debug script.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a415a1986900135f889efc25ecaf2761b1dae81a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 8 15:57:22 2013 +1000
initscript: Export CTDB_DEBUG_LOCKS variable
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c711ff4702c5f95b75e4bf030665fc2afffc2f9e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 8 15:56:30 2013 +1000
scripts: Add an example debug_locks.sh script to debug locking issues
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2bfb8499366d530f16515b08928056bbda40f781
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 8 15:46:53 2013 +1000
locking: Use external script to debug locking issues
Use an external script to parse /proc/locks and log useful debugging
information about locks rather than doing that in C code.
To use this feature, add the following configuration variable to /etc/sysconfig/ctdb:
CTDB_DEBUG_LOCKS=/etc/ctdb/debug_locks.sh
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 6fc36a7036933237d09151a0baf4d8ccd2bc2c99
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jul 3 11:01:21 2013 +1000
locking: Update locking bucket intervals
0 < 1 ms
1 < 10 ms
2 < 100 ms
3 < 1 s
4 < 2 s
5 < 4 s
6 < 8 s
7 < 16 s
8 < 32 s
9 < 64 s
10 >= 64 s
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
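The bucket table above can be read as a lookup from a lock latency to a bucket index. A minimal sketch, assuming latencies measured in seconds as a double; latency_bucket() and bucket_limits are hypothetical names, not CTDB's actual code.

```c
#include <stddef.h>

/* Exclusive upper bounds (in seconds) of buckets 0-9 from the table
 * above; any latency >= 64 s falls into the last bucket, 10. */
static const double bucket_limits[] = {
	0.001, 0.01, 0.1, 1, 2, 4, 8, 16, 32, 64
};

#define NUM_BUCKETS (sizeof(bucket_limits) / sizeof(bucket_limits[0]) + 1)

/* Hypothetical helper: map a lock latency to its bucket index. */
static unsigned int latency_bucket(double seconds)
{
	size_t i;

	for (i = 0; i < NUM_BUCKETS - 1; i++) {
		if (seconds < bucket_limits[i]) {
			return (unsigned int)i;
		}
	}
	return (unsigned int)(NUM_BUCKETS - 1);
}
```

The first three buckets grow by a factor of 10 and the rest double, so short waits get fine resolution while very long waits still land in a bounded number of buckets.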
commit dcc42a75b4638b3aa40c44ed9e0aaae26483e2b0
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jul 3 11:46:53 2013 +1000
locking: Update locks latency in CTDB statistics only for RECORD or DB locks
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 594c421f90ce132c75fbd985872114e4967f92b5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jun 25 15:36:13 2013 +1000
tools/ctdb: Fix the format of DB statistics output
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 26a4653df594d351ca0dc1bd5f5b2f5b0eb0a9a5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jun 25 15:25:16 2013 +1000
ctdbd: Remove incomplete ctdb_db_statistics_wire structure
Send ctdb_db_statistics directly instead of first copying it to the
duplicate ctdb_db_statistics_wire structure. This simplifies the
implementation of the control to get database statistics.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 545a46437dfb2b755bb2fddb11dea8c4ccce3ed7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 09:04:49 2013 +1000
ctdbd: Update debug messages for setting readonly property on database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 41182623891d74a7e9e9c453183411a161201e67
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jul 5 14:04:20 2013 +1000
recoverd: Fix buffer overflow error in reloadips
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit e1cf1f728236d808bb41265e74bc65f54bf1c133
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 4 20:02:29 2013 +1000
tests/eventscripts: Add some rudimentary tests for 60.ganesha
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f606df4f2db754592e6d1a16c26e155cacb2beef
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 4 16:05:01 2013 +1000
eventscripts: New configuration variable $CTDB_SKIP_GANESHA_NFSD_CHECK
This allows 60.ganesha to be unit tested, except for the core Ganesha
monitoring code.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ceb5b2d37f7ab4894908ec26f3812b3bed991525
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 4 16:00:33 2013 +1000
eventscript: Move Ganesha nfsd monitoring to a function
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 520914e7ee1b879c1080e5857fda18ed5b973fd6
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 4 15:11:54 2013 +1000
eventscripts: Drop RPC service version from nfs_check_rpc_service() calls
Support for this was removed in commit
77302dbfd85754e02559eccb2dd6c090db0b6b9f and I overlooked its use in
60.ganesha.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 4d0f26b306fc465d551d340b0e7dce4412eae3fd
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 2 14:43:17 2013 +1000
ctdbd: Log something when releasing all IPs
At the moment this is silent and it can be confusing to see IPs just
disappear.
Also, this message:
Been in recovery mode for too long. Dropping all IPS
can cause anxiety when all IPs should already have been dropped.
Adding a comforting message saying that 0 IPs were dropped relieves
such anxiety. :-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0a292fa8939a1343e44cadaa8ed9f3c0f18ca82f
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 19:00:36 2013 +1000
recoverd: Minor style improvements for ctdb_reload_remote_public_ips()
* Add a variable to the loop to make the code more readable and have
it generally fit into 80 columns.
* Improve comments.
* Improve log messages.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f0942fa01cd422133fc9398f56b4855397d7bc86
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 18:45:46 2013 +1000
recoverd: Clean up log messages in remote IP verification
The log messages in verify_remote_ip_allocation() are confusing
because they don't include the PNN of the problem node, which is not
known in this function.
Add the PNN of the node being verified as a function argument and then
shuffle the log messages around to make them clearer.
Also fold 3 nested if statements into just one.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 298c4d2c3b4ea3d900c91f5a0a5aca2952a13d61
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:57:33 2013 +1000
recoverd: Fix an unclear log message - "Restart recovery process"
When the recovery master notices a node in recovery mode, it starts the
recovery process; it doesn't restart it.
Update documentation to match.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9f6cd8b0bea619991c9f3bf35188c5950dabf8f4
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:53:37 2013 +1000
recoverd: Fix an incorrect comment
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 035bf3eecf99337c84d4ad16cdbf297b1fa037db
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:48:01 2013 +1000
ctdbd: Use ctdb_die() on "setup" event failure
This is slightly easier to read because it all fits on 1 line.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3af2d833b63af9931792106db71797f3692669a8
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:43:52 2013 +1000
ctdbd: Avoid a core dump when "init" event fails
The "init" event only really fails in the scripts, which should log
something useful on failure. Therefore, a core dump isn't terribly
useful and sometimes attracts unwanted attention.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c0a9456692c88a7a5542cd893d8f326524d3f94e
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:42:11 2013 +1000
util: New function ctdb_die()
This is like ctdb_fatal() but exits cleanly without dumping core or
generating a backtrace.
Signed-off-by: Martin Schwenke <martin at meltin.net>
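The difference between the two can be observed from a parent process. A minimal sketch with hypothetical die_sketch() and fatal_sketch() stand-ins (not CTDB's real functions): exit(1) shows up as a normal exit status, while abort() terminates the process with SIGABRT, which may produce a core dump.

```c
#include <signal.h>   /* SIGABRT */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* ctdb_die()-style: log and exit cleanly, no core dump, no backtrace. */
static void die_sketch(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	exit(1);
}

/* ctdb_fatal()-style: abort() raises SIGABRT, which may dump core. */
static void fatal_sketch(const char *msg)
{
	fprintf(stderr, "%s\n", msg);
	abort();
}

/* Run fn(msg) in a child and return its raw wait status, so the caller
 * can see how the child terminated. Returns -1 if fork/wait fails. */
static int run_in_child(void (*fn)(const char *), const char *msg)
{
	int status = -1;
	pid_t pid;

	fflush(NULL); /* avoid duplicating buffered output in the child */
	pid = fork();
	if (pid == 0) {
		fn(msg);
		_exit(0); /* not reached */
	}
	if (pid < 0 || waitpid(pid, &status, 0) < 0) {
		return -1;
	}
	return status;
}
```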
commit ce04f1c107b4392ca955d9f29b93aaaae62439ce
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 24 19:03:26 2013 +1000
eventscripts: When replaying monitor status, don't log empty output
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c5797f2942e83da24df548ea07196fbbac0eab20
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 24 16:05:03 2013 +1000
ctdbd: Release IP callback should fail if the IP is still hosted
At the moment there are (at least) 2 bugs that cause rogue IPs:
* A race where release_ip_callback() runs after a "subsequent" take IP
has completed. The IP is back on an interface but we unset
vnn->iface in the callback.
* A "releaseip" eventscript times out. We ignore the timeout and call
it success, deleting the VNN even if the IP is still hosted.
We could decide not to ignore the timeout and ban the node, but
killing TCP connections can take a long time and that might result
in a lot of banning. We probably won't reinstate banning on
"releaseip" until killing TCP connections has been optimised.
In both cases, a rogue IP can be avoided by leaving vnn->iface set and
simply failing the control.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
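The fix described above can be sketched as follows, assuming a hypothetical struct vnn_sketch standing in for CTDB's per-IP state and a caller that has already determined whether the IP is still hosted (how that is checked is not shown here).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CTDB's per-public-IP state (vnn). */
struct vnn_sketch {
	const char *iface; /* interface currently hosting the IP, or NULL */
};

/* If the IP is still on an interface after "releaseip" (e.g. a later
 * takeover already re-added it, or the event timed out), keep iface
 * set and fail the control instead of reporting success. */
static int release_ip_callback_sketch(struct vnn_sketch *vnn,
				      bool ip_still_hosted)
{
	if (ip_still_hosted) {
		return -1; /* vnn->iface is left intact */
	}
	vnn->iface = NULL;
	return 0;
}
```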
commit f1f1b0c24b9b6cd24b83a4e4da16e179287ec6ac
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 24 15:49:48 2013 +1000
ctdbd: Log warnings in release IP when unexpected interface is encountered
Previous code changes work around potential problems but do not
provide useful information when a problem occurs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 16afe36de52561a62372c14b567683dc898369d5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 17:37:05 2013 +1000
ping_pong: Validate num_locks argument > 0
This fixes the floating point exception raised if num_locks = 0.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
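A minimal sketch of validating such an argument up front, assuming the value arrives as a command-line string; parse_num_locks() is a hypothetical helper and ping_pong's actual parsing may differ.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parser: accept only a positive integer, so later
 * division or modulo by num_locks can never be by zero. */
static int parse_num_locks(const char *arg, int *num_locks)
{
	char *end = NULL;
	long v;

	errno = 0;
	v = strtol(arg, &end, 10);
	if (errno != 0 || end == arg || *end != '\0' || v <= 0 || v > INT_MAX) {
		return -1; /* empty, non-numeric, zero, negative or too large */
	}
	*num_locks = (int)v;
	return 0;
}
```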
commit d48eecd748830598f4f080952f2bf05d6f92738c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 17:27:00 2013 +1000
tests: If connection to ctdb daemon fails, exit
This fixes the segmentation fault that occurs if any of the test code
fails to connect to the CTDB daemon.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 5408c5c4050539e5aa06a5e82ceb63a6cb5cef0c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 17:00:23 2013 +1000
build: Fix compiler warnings for uninitialized variables
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9aa13bcedd83d463c871e3cf1f3a65da3cd83992
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 15:36:29 2013 +1000
recoverd: Send the result from child process only once
The result has already been sent before the child starts waiting for
the parent ctdbd process, so it should not be sent again.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9b529189f8456fad7868fc154ae27a6fd87e93b3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 15:31:52 2013 +1000
packaging: Enable compiler optimizations
This reverts d09570c70551aa40390ce9ceffe7bc234e1afafe.
... hoping the segv has been found in the last 6 years. :-)
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit bb54f3924ff19cd089b0a166fe8368db162ad709
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 15:14:10 2013 +1000
packaging: Allow building RPMs with system tdb/talloc/tevent
To build CTDB RPMs with system installed libraries, use the following command:
./packaging/RPM/makerpms.sh \
--with system_talloc \
--with system_tdb \
--with system_tevent
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1b0faae9c939a2f8da3cacba715ca62a5830d190
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 14:29:09 2013 +1000
packaging: Do not mark /etc/ctdb/functions as configuration file
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 53d34eb2f9e5434dea4e7182b6af566a3a96a368
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 13:19:56 2013 +1000
packaging: Install README.notify.d using %doc directive
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit 6fe584d05543eebd24abd19bab502dc4da04e921
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 12:45:32 2013 +1000
packaging: Install docs using %doc directive
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit 7e53fbf92b6dd5211d918ea0e23126b7dfa50c42
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 4 11:33:38 2013 +1000
packaging: Remove ctdb_transaction from docdir
It's bundled in the ctdb-tests package.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 145b1966c1b34f1667a175235e1df2741294391c
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:23:08 2013 +1000
doc: Add a disclaimer for the EnableBans tunable
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b4c06e8ec8b227c1e6c01444038c3b15b5f9e606
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 30 17:22:06 2013 +1000
doc: Add banning bug fixes to NEWS
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ef1c4e99ca66e7a990bc557f34abb624c315e6ba
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 2 12:40:37 2013 +1000
ctdbd: Don't ban self if init or shutdown event fails
There is no point in banning the node if init or shutdown event times
out since it's going to quit anyway.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fcd5e1f04c5fe6c98399429b8f0918b8779acba6
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 27 17:46:43 2013 +1000
doc: The second half of monitoring is only for recovery master
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 932360992b08a5483d90c0590218ba0fd756119e
Author: Michael Adam <obnox at samba.org>
Date: Wed Jun 26 09:23:22 2013 +0200
recoverd: when the recmaster is banned, use that information when forcing an election
When we trigger an election because the recmaster considers itself inactive,
update our local nodemap with the recmaster's flags before calling
force_election(). This way, we don't send the inactive node freeze commands
(e.g.) that may fail and then lead to ourselves getting banned.
The theory is that this should help avoid banning loops.
Signed-off-by: Michael Adam <obnox at samba.org>
commit 741944f118e98f178b860194eecb215180949d18
Author: Michael Adam <obnox at samba.org>
Date: Wed Jun 26 07:11:51 2013 +0200
recoverd: fix a comment typo
Signed-off-by: Michael Adam <obnox at samba.org>
commit ac06c46e4a80c635f6094b5ac6f0bf3e3a02db95
Author: Michael Adam <obnox at samba.org>
Date: Fri Jun 21 17:57:37 2013 +0200
recoverd: fix a comment in main_loop
Signed-off-by: Michael Adam <obnox at samba.org>
commit df30c0a05ed908fc2a997c56ff5484736b23b70f
Author: Michael Adam <obnox at samba.org>
Date: Fri Jun 21 14:06:22 2013 +0200
recoverd: eliminate some trailing spaces from ctdb_election_win()
Signed-off-by: Michael Adam <obnox at samba.org>
commit 14399de1dd0bd8dabf1f48b1457e3ccb37589d8a
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 28 16:31:07 2013 +1000
recoverd: Don't continue if the current node gets banned
A banned node cannot continue with recovery or with monitoring the cluster.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit aea12dce83ef385e9fb3bc03ac7ace0874a0e3fe
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 28 14:31:02 2013 +1000
recoverd: Refactor code to ban misbehaving nodes
Since we have nodemap information, there is no need to hardcode the
limit of 20.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit ae1693905036ecdbc4594fde1f12500faae4a554
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 27 16:01:16 2013 +1000
recoverd: Move code to ban other nodes after we get local node flags
If a node gets banned first, then it should not ban other nodes.
This code was moved up in main_loop to avoid waiting for nodemap
from other nodes (commit 83b0261f2cb453195b86f547d360400103a8b795).
To prevent a banned node from banning other nodes, we need to first get
nodemap information from the local node, so that trying to ban other
nodes can fail if we are already banned.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 593a17678fbd3109e118154b034d43b852659518
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 27 15:44:27 2013 +1000
recoverd: Delay the initial election if node is started in stopped state
Since there is an early exit if a node is stopped or banned, we can wait until
the node becomes active before starting the initial election.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 93bcb6617e1024f810533e12390a572f51703ca0
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 27 15:33:49 2013 +1000
recoverd: Update capabilities only if the current node is active
Since we do an early return if a node is stopped or banned, move the code
that updates capabilities below the early return, just before we check the
capabilities of the current recovery master.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 815ddd3341b7e9db39e05a3a3fcd9a1420f053bc
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 27 15:46:04 2013 +1000
recoverd: No need to check if node is recovery master when inactive
If a node is stopped or banned, it will cause an early return from
main_loop, so this check is redundant. The election will be called by
an active node.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2396981c4bcf30530aeb7f4395093cc202105b50
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 27 15:39:15 2013 +1000
recoverd: Always do an early exit from main_loop if node is stopped or banned
A stopped or banned node cannot do anything useful. So do not participate
in any cluster activity and do not cause any unnecessary network traffic.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
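The early exit can be sketched as a simple flag test at the top of each loop iteration. The flag values and should_skip_main_loop() are hypothetical; CTDB defines its own NODE_FLAGS_* constants.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SKETCH_FLAG_BANNED  0x1u
#define SKETCH_FLAG_STOPPED 0x2u

/* True if this iteration should do nothing: a stopped or banned node
 * takes no part in elections, recovery or monitoring, and generates
 * no cluster traffic. */
static bool should_skip_main_loop(uint32_t local_flags)
{
	return (local_flags & (SKETCH_FLAG_BANNED | SKETCH_FLAG_STOPPED)) != 0;
}
```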
commit 38304f88e0c634e97d4687c25adef975f71537b8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 28 14:10:47 2013 +1000
recoverd: Do not set banning credits on a node if current node is inactive
If the current node is banned or stopped, then it should not assign banning
credits to other nodes since the current node will not have up-to-date flags
of other nodes.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a60f228f8380f222f838eb619d2ab55f96f11ac2
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 1 17:40:36 2013 +1000
banning: Do not come out of ban if databases are not frozen
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 297d93cecc3c0655e72ecac38508e113bdbeab9c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jun 24 14:33:32 2013 +1000
banning: No need to check if banned pnn is for local node
If the banned pnn is not the local node, the function returns early.
So there is no need for an additional check.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit bb178338658b4ae32382a1f62f7c21cee1d4878f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 28 14:04:18 2013 +1000
banning: Make ctdb_local_node_got_banned() a void function
When this function is called, we are already committed to banning
and there is no point in failing this function. If freezing of
databases fails, it will be fixed by the recovery daemon.
commit 6a9dbb8fb0f1f6e8c206189cdc2d33bb371ea2a8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 28 14:02:44 2013 +1000
recoverd: Also check if current node is in recovery when it is banned
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 8d622660a14c929e365d306147b378ea6ab92175
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 28 14:09:35 2013 +1000
recoverd: Set node_flags information as soon as we get nodemap
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 34af2cdf686d5d77854cbaa7bbcd8f878e9171c7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jun 26 16:02:23 2013 +1000
recoverd: Remove old comment as the code corresponding to it has gone away
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c6f8407648abb37f2ed781afa5171dad8c9f59e9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jun 24 14:31:50 2013 +1000
banning: Log ban state changes for other nodes at higher debug level
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 46efe7a886f8c4c56f19536adc98a73c22db906a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 1 16:28:04 2013 +1000
freeze: Make ctdb_start_freeze() a void function
If this function fails due to memory errors, there is no way to recover.
The best course of action is to abort.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 87716e8f504d659515d3dbcf93badbf106873bc8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 1 16:21:00 2013 +1000
freeze: If priority is invalid here, it's time to abort
ctdb_start_freeze() is called from ctdb_control_freeze(), which fixes the
priority if it's 0 and returns an error if it's invalid. Other callers of
ctdb_start_freeze() are internal to CTDB. So if the priority is invalid in
ctdb_start_freeze(), something is seriously wrong.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
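The split of responsibilities described above can be sketched as follows. The priority range 1..3, the mapping of priority 0 to 1, and all names here are assumptions for illustration, not taken from CTDB's source.

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_PRIORITIES_SKETCH 3 /* hypothetical: valid priorities are 1..3 */

static int frozen_priority; /* last priority frozen, for illustration */

/* Internal helper: every caller must pass a valid priority, so an
 * invalid one is a programming error and aborting is the only option. */
static void start_freeze_sketch(unsigned int priority)
{
	if (priority == 0 || priority > NUM_PRIORITIES_SKETCH) {
		fprintf(stderr, "Invalid freeze priority %u\n", priority);
		abort();
	}
	frozen_priority = (int)priority;
}

/* Control handler: sanitise client input before calling the helper.
 * Priority 0 is mapped to 1; out-of-range values are an error. */
static int control_freeze_sketch(unsigned int priority)
{
	if (priority == 0) {
		priority = 1;
	}
	if (priority > NUM_PRIORITIES_SKETCH) {
		return -1;
	}
	start_freeze_sketch(priority);
	return 0;
}
```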
commit 478e24bceda3fedfba54ccb48faa115df726b819
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 1 13:26:33 2013 +1000
freeze: Log message from ctdb_start_freeze() and ctdb_control_freeze()
This ensures that whenever databases are frozen either via sending
control or by calling ctdb_start_freeze(), the action is logged.
Since ctdb_control_freeze() calls ctdb_start_freeze(), move the logging of
the message into the early-return condition for when databases are already frozen.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4be8dff3a4451192f838497b4747273685959bed
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jun 24 14:18:58 2013 +1000
recoverd: Print banning message only after verifying pnn
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 7eb2f89979360b6cc98ca9b17c48310277fa89fc
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jun 26 15:22:46 2013 +1000
recoverd: When updating flags on nodes, send updated flags and not old flags
This was broken by commit a9a1156ea4e10483a4bf4265b8e9203f0af033aa.
Instead of a SRVID_SET_NODE_FLAGS message to the recovery daemon, a control
was sent to the local daemon, which in turn informed the recovery daemon.
While making this change, the old flags were sent via CONTROL_MODIFY_FLAGS
instead of the updated flags.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
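A minimal sketch of computing the flags to send, assuming a hypothetical flag-change record carrying both old and new flags, with the change expressed as bits to set and bits to clear.

```c
#include <stdint.h>

/* Hypothetical record describing a flag change for one node. */
struct flag_change_sketch {
	uint32_t pnn;
	uint32_t old_flags;
	uint32_t new_flags;
};

/* Build the change from the current flags plus the bits to set and
 * clear. Receivers act on new_flags, so putting the old flags there
 * silently discards the update. */
static struct flag_change_sketch make_flag_change(uint32_t pnn,
						  uint32_t current,
						  uint32_t set,
						  uint32_t clear)
{
	struct flag_change_sketch c;

	c.pnn = pnn;
	c.old_flags = current;
	c.new_flags = (current | set) & ~clear;
	return c;
}
```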
commit 4f87925a287f612a6ab3b5da1a387a31c7bea28f
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jun 26 14:34:47 2013 +1000
tools/ctdb: Add "force" option to "recover" command
At the moment there is no easy way to force a recovery when attempting
to reproduce certain classes of bugs. This option is added without
documentation because it is dangerous until the bugs are fixed! :-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 733fc909425860f6a02c205c2d8f34a731853922
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jun 24 17:37:15 2013 +1000
client: Exit with non-zero status when unix socket is closed
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit abeb65ef02d018a7c14d4f8cea71e15c6cf9e357
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 21 14:49:20 2013 +1000
doc: Fix ctdb ping entry in manpage
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5d0215be5aefe492258a92c7bff2d41960379580
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 21 14:47:20 2013 +1000
doc: Fix documentation for NoIPTakeover in ctdbd manpage
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4ba7c73eeab98296c9168e0b0fed1f6bb9f32733
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 21 14:33:12 2013 +1000
doc: Update notification script section in ctdbd manpage
The example notification script is now much more useful.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4369c8e6ead9062ef7855ada375df74262acf925
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 21 14:32:50 2013 +1000
doc: Add nodestatus command to the ctdb manpage
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cd6227aa38d3bb4e5043faeffe436004e27b6d06
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 21 10:52:05 2013 +1000
doc: Update NEWS
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b7aaa28b3a6a2de923417f3d143f8d516447711e
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 20 16:43:10 2013 +1000
tests: Integration tests use "ctdb nodestatus" for healthy cluster check
Also check that we're not in recovery mode.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b953524185632d7f96a76d8f3bbed7ac1d143d40
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 20 16:42:30 2013 +1000
tests: Integration test infrastructure should do only a single recovery
No need for 2 recoveries after a restart.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f1b7ca8dc3f34a59c7b3e55748f974ac9ed8f458
Author: Martin Schwenke <martin at meltin.net>
Date: Sat Jun 22 15:44:28 2013 +1000
ctdbd: Fix panic on overlapping shutdowns
The runstate can't be set to SHUTDOWN twice, so the current naive code
causes a panic on the 2nd shutdown. This regression was introduced in
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28.
Signed-off-by: Martin Schwenke <martin at meltin.net>
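One simple way to make overlapping shutdown requests harmless is to let only the first one change the run state. A minimal sketch with hypothetical names; it does not reproduce CTDB's actual runstate handling.

```c
#include <stdbool.h>

/* Hypothetical subset of daemon run states. */
enum runstate_sketch { RUNSTATE_RUNNING, RUNSTATE_SHUTDOWN };

static enum runstate_sketch runstate = RUNSTATE_RUNNING;
static int shutdown_count; /* how many times the shutdown work ran */

/* A second, overlapping request returns early instead of setting
 * SHUTDOWN again, which is what caused the panic described above. */
static bool shutdown_sequence_sketch(void)
{
	if (runstate == RUNSTATE_SHUTDOWN) {
		return false; /* already shutting down */
	}
	runstate = RUNSTATE_SHUTDOWN;
	shutdown_count++;
	/* ... stop services, run the "shutdown" event, exit ... */
	return true;
}
```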
commit b32fd04bfbf33062d45365b37a7247e272a76ceb
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jun 19 10:58:14 2013 +1000
ctdbd: Refactor shutdown sequence
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9ea57af557028b1d2e5c560e7bcf4d014b9a8b1e
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 16 21:01:43 2013 +1000
eventscripts: "setup" event doesn't need to wait for SETUP runstate
The "setup" event isn't called until ctdbd is in CTDB_RUNSTATE_SETUP
anyway...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit aabf0bf41cb8ec344f06b69492fb6c2a27f9e900
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 18 15:07:26 2013 +1000
tests/eventscripts: New tests for 00.ctdb "init" event
These test dropping of IPs and TDB checking.
New stubs for date, tdbdump, tdbtool.
Enhance ip stub to handle "ip addr show to ..."
Tweak some infrastructure.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 3b11b27f3e22e99947bc2d6c49c4427bd7a0e332
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 18 15:02:05 2013 +1000
eventscripts: 13.per_ip_routing should not try hard to find public_addresses
This essentially reverts d4621277240721e6d130a930b0100506b64467ea.
This was added for testing but the test code was actually broken.
CTDB itself will only process public IPs if $CTDB_PUBLIC_ADDRESSES is
set, so no code should try to be more flexible than that!
The test code has been fixed instead.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c3e7a6e10d486ba0dbafdf110db540675b2317bc
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 18 15:05:39 2013 +1000
tests/eventscripts: setup_ctdb() should always set $CTDB_PUBLIC_ADDRESSES
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit f3dd2eec200d6eeada2ea19cd7e76f1edfad6167
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 17 15:14:53 2013 +1000
logging: Notify parent when logging daemon is up
Messages are lost until it is really up because syslogd_is_started is
set too early. Adding a pipe to do the notification allows the parent
to wait and only set syslogd_is_started when the logging daemon is
actually ready.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
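The notification pipe can be sketched like this. start_logger_sketch() and logger_is_started (a stand-in for syslogd_is_started) are hypothetical, and the child here exits immediately instead of running a real logging loop.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

static bool logger_is_started; /* stand-in for syslogd_is_started */

/* Fork a logging child and block until it reports that it is ready,
 * so the parent only marks the logger as started once it really is. */
static pid_t start_logger_sketch(void)
{
	int fd[2];
	char ready = 0;
	pid_t pid;

	if (pipe(fd) != 0) {
		return -1;
	}
	pid = fork();
	if (pid < 0) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: finish setting up, then report readiness. */
		close(fd[0]);
		/* ... open the log socket here ... */
		if (write(fd[1], "1", 1) != 1) {
			_exit(1);
		}
		close(fd[1]);
		_exit(0); /* a real logger would run its loop here instead */
	}
	/* Parent: read() blocks until the child writes; EOF (0) means the
	 * child died before becoming ready, so the flag stays false. */
	close(fd[1]);
	if (read(fd[0], &ready, 1) == 1 && ready == '1') {
		logger_is_started = true;
	}
	close(fd[0]);
	return pid;
}
```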
commit 3bc93f312b8464fbfa2b2c44fffedc591fe5a3e0
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jun 17 10:14:24 2013 +1000
scripts: Move TDB checking from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0b77cceb49a30a181063adc7868d42d2851318e8
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 16 20:29:33 2013 +1000
scripts: Move dropping of all IPs from initscript to "init" event
It makes sense to do this in the "init" event and make the initscript
less complicated.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5ffce65a1ad659b198ddf647622b899bdde45c72
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 18 14:53:17 2013 +1000
scripts: drop_ip() should use delete_ip_from_iface()
Otherwise secondary addresses that aren't owned by CTDB could be
dropped.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0b67397ef5419c781a35916575151da7b7e7cc27
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 16 20:24:10 2013 +1000
scripts: drop_all_public_ips() now prints messages to stdout, not log
Change all callers to maintain current behaviour.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0a0c8543f167e11b75a622513367b083e42cbd3f
Author: Martin Schwenke <martin at meltin.net>
Date: Sun Jun 16 19:49:02 2013 +1000
ctdbd: "init" event should run earlier in daemon initialisation
It should run before:
* the transport is started;
* databases are attached; and
* processing configuration files (e.g. nodes, public_addresses).
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit c48583fd238496a81ddc46a21892f0b49559036a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jun 18 14:27:34 2013 +1000
tools/ctdb: Do not exit prematurely on control timeout if retrying in a loop
This avoids premature exits from "ctdb stop" and "ctdb continue" due to
intermittent control (e.g. getpnn, getnodemap) timeouts.
This needs a proper fix to distinguish between timeout and failure
conditions and take appropriate action.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
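The commit notes that a proper fix should distinguish timeouts from failures. A minimal sketch of such a retry loop, assuming the control layer can report the two separately; the ctrl_result values, wait_for_state() and the scripted stub are all hypothetical.

```c
/* Hypothetical outcome of one attempt to query the daemon. */
enum ctrl_result {
	CTRL_DONE,    /* desired state reached */
	CTRL_NOT_YET, /* control worked, state not reached yet */
	CTRL_TIMEOUT, /* control timed out: likely transient */
	CTRL_FAILED   /* real failure: give up */
};

/* Retry until the desired state is reached. Timeouts are retried like
 * "not yet"; only a real failure ends the loop early. */
static int wait_for_state(enum ctrl_result (*attempt)(void *), void *ctx,
			  int max_attempts)
{
	int i;

	for (i = 0; i < max_attempts; i++) {
		switch (attempt(ctx)) {
		case CTRL_DONE:
			return 0;
		case CTRL_NOT_YET:
		case CTRL_TIMEOUT:
			break; /* retry; real code would sleep first */
		case CTRL_FAILED:
			return -1;
		}
	}
	return -1; /* gave up after max_attempts */
}

/* Scripted stand-in for the daemon: replays a fixed sequence of
 * results, then keeps repeating the last one. */
struct scripted {
	const enum ctrl_result *seq;
	int n;
	int pos;
};

static enum ctrl_result scripted_attempt(void *ctx)
{
	struct scripted *s = ctx;

	if (s->pos < s->n) {
		return s->seq[s->pos++];
	}
	return s->seq[s->n - 1];
}
```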
commit 5f8547b1531bba4950b3d873a997585c3a16d31e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 13 12:55:29 2013 +1000
packaging: Update the minimum required library versions
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 02c63c591cc273122b3a547bb301b92f0e4bd217
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 7 11:24:17 2013 +1000
build: Enable VERBOSE option to display build command line
make V=1 or make VERBOSE=1 will display build commands.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f8bf99de3a5f56be67aaa67ed836458b1cf73e86
Author: Mathieu Parent <math.parent at gmail.com>
Date: Thu Jun 6 21:58:02 2013 +0200
build: Fix tdb.h path to enable building with system TDB library
commit 14a79c0f3967c88f8ffc8200d122f6c5ffdb63a8
Author: Mathieu Parent <math.parent at gmail.com>
Date: Thu Jun 6 21:43:08 2013 +0200
libctdb: Include config.h in libctdb/ctdb.c
Bug-Debian: http://bugs.debian.org/703551
commit edb2a3556d03e248b42f63dd2c62382b723bc98f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 6 16:42:02 2013 +1000
ctdbd: Make sure we don't kill init process by mistake
If getpgrp() fails, it returns -1, and that results in a KILL signal being
sent to the init process (PID 1). This does not happen on RHEL, but does on AIX.
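A minimal sketch of the kind of guard this implies. It assumes the daemon signals its own process group with kill(-pgrp, SIGKILL), so a pgrp of -1 turns into kill(1, SIGKILL). The helper name pgrp_kill_target() is hypothetical and not taken from the CTDB source. It only computes the kill() target, so it can be checked without sending any signal.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: compute the pid argument for kill() that targets
 * the whole process group `pgrp`. A failed getpgrp() returns -1, and
 * negating that would give 1 (init), so refuse any pgrp <= 1.
 * Returns 0 and sets *target on success, -1 if the group is unsafe.
 */
static int pgrp_kill_target(pid_t pgrp, pid_t *target)
{
	assert(target != NULL);
	if (pgrp <= 1) {
		return -1;
	}
	*target = -pgrp;
	return 0;
}
```

A caller would then only signal when the check passes, e.g. `if (pgrp_kill_target(getpgrp(), &t) == 0) kill(t, SIGKILL);`.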
Reported-by: Chris Cowan <cc at us.ibm.com>
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit cd4358b01c6c3d413b431f5760029d2b163b9c03
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 13 16:32:06 2013 +1000
tests/eventscripts: Unit tests for $CTDB_NFS_DUMP_STUCK_THREADS
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0e2b5a8f89440a53f996482ac0c98b31a4f2cad3
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 13 16:30:45 2013 +1000
tests/eventscripts: Fix -X tracing in iterate_test()
... and delete a bogus comment.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ce2ef2be8aa22c0baf868daac8d4cf27246baa14
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 13 15:50:44 2013 +1000
tests/eventscripts: Add unit tests for $CTDB_MONITOR_NFS_THREAD_COUNT
Includes minor test infrastructure updates.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2503245db10d567af708a04edd3a3b488c24f401
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 13 11:56:25 2013 +1000
eventscripts: New configuration variable $CTDB_NFS_DUMP_STUCK_THREADS
If some nfsd threads are still alive after NFS is shut down during a
restart, this variable sets the maximum number of those threads for
which a stack trace should be dumped. This can be useful for trying to
determine why nfsd is stuck.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 99b0d8b8ecc36dfc493775b9ebced54539c182d2
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 13 10:17:20 2013 +1000
eventscripts: Add new option $CTDB_MONITOR_NFS_THREAD_COUNT
Consider the following example:
1. There are 256 nfsd threads configured.
2. 200 threads are "stuck" in system calls, perhaps waiting for the
underlying filesystem when an attempt is made to restart NFS.
3. 56 threads exit when NFS is stopped.
4. 56 new threads are started when NFS is started.
5. 200 "stuck" threads exit leaving only 56 threads running.
Setting this option to "yes" makes the 60.nfs monitor event look for
this situation and try to correct it.
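The arithmetic in the example above can be modelled as follows. This is a sketch, not CTDB code (60.nfs is a shell script): the rule that the monitor event compares the configured and running thread counts and treats any mismatch as needing correction is an assumption, since the description only says it "looks for this situation and tries to correct it". Both function names are hypothetical.

```c
#include <assert.h>

/*
 * Model of the scenario: `configured` nfsd threads, `stuck` of which
 * survive an NFS stop. On start, only enough new threads are spawned to
 * reach `configured` again, so once the stuck threads finally exit,
 * configured - stuck threads remain.
 */
static int nfsd_threads_after_restart(int configured, int stuck)
{
	assert(configured >= 0 && stuck >= 0 && stuck <= configured);
	return configured - stuck;
}

/*
 * Monitor-time check (assumed rule): a mismatch between configured and
 * running thread counts means the count should be reset.
 */
static int nfsd_thread_count_needs_fix(int configured, int running)
{
	return running != configured;
}
```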
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c429394afbabaee09f9216dc743419adddf523ea
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 31 14:55:07 2013 +1000
recoverd: Log node that causes takeover run to fail
Extend takeover_fail_callback() to just log (and not do any ban
processing) when the callback data is NULL. Always call
ctdb_takeover_run() with the callback so that useful errors are always
logged.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit ac0892d3a57adb0587a37de0f94fa686bed8970f
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 24 15:38:54 2013 +1000
doc: Add release notes for 2.2
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 78cff9d54f241fb6a2943e50346f9c2ad9decc78
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 15:14:42 2013 +1000
build: Fix extra whitespaces
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 82d61f77c01df0fbb42743593937b175ce22a445
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 14:12:14 2013 +1000
tevent: Sync to tevent 0.9.18 from upstream
commit 506b27c944b4031e8a325816bd12abddd442a0bb
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 14:44:03 2013 +1000
replace: Sync to latest replace from upstream
The latest commits affecting lib/replace remove the autoconf build from
the Samba tree, so the following commit is used as the sync point:
commit 9ddfd7d8784e6f546628f48990b69ee2850be52d
Author: Andrew Bartlett <abartlet at samba.org>
Date: Wed May 22 17:23:30 2013 +1000
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit bb3a32ec055432afc7225c9fd7504fb187694bda
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 14:05:50 2013 +1000
tdb: Sync to tdb 1.2.11 from upstream
commit 3bffca8c17e441364525df115ee2ac16b5969e24
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 13:53:38 2013 +1000
talloc: Sync to talloc 2.0.8 from upstream
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit db31dc48bd3135e9242af08bb79b67a17a2b1668
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 12:11:49 2013 +1000
ctdbd: Log node state transitions at higher debug level
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ca7ba26362eabfbcc329c66919d9c4da79c3b799
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 14:17:59 2013 +1000
git: Ignore generated ctdb.spec file
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 641f539ffc7dd9542e669a3ec20c004f8bbcbf1e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 29 14:17:00 2013 +1000
git: Ignore ctdb_version.h file
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fa757b49374e44c2380d4457e9b0eb3582981fac
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 24 15:25:52 2013 +1000
build: Use REPLACE_OBJ and CTDB_EXTERNAL_OBJ to simplify build rules
This fixes the build on AIX where libreplace is required to build
ctdb_lock_helper, ctdb_fetch_lock_once, ctdb_fetch_readonly_once.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2cf95741fdab2ee5f724950a0b1ef257d6aeade7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 24 15:14:20 2013 +1000
build: Support for building on AIX xlc compiler
xlc does not support -fPIC or -Wno-format-zero-length.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1734562a7b3512853b9e0232880c42d50c1c2e4c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 23 23:44:45 2013 -0500
tests: Do not use err() to support AIX
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 0320bb4f8ca8171812ec7f41556aed847c74bfb4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 24 14:52:09 2013 +1000
tests: Include system/time.h to support building on AIX
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2c19fa78ce0b25c3615b23664df32233bdbdea42
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 24 14:51:46 2013 +1000
libctdb: Do not include sys/time.h to support build on AIX
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit b091f09ea01482823bd850d1d4e2329e0a19c959
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 23 23:42:23 2013 -0500
util: Do not stop build if backtracing is not supported
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1b5968f6be084590667f4f15ff3bef13ed9a2973
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 28 12:01:57 2013 +1000
eventscripts: Fix statd-callout update handling
60.nfs and 60.ganesha touch $statd_update_trigger every time they're
run. This stops the statd-callout updates from ever being called.
Make this logic self-contained and move it to new function
nfs_statd_update() in the functions file. Call this in 60.nfs and
60.ganesha with the appropriate update period as the only argument.
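The fix can be illustrated with a small C model of the intended logic (the real nfs_statd_update() is a shell function). Here `*last_trigger` stands in for the modification time of $statd_update_trigger, and the name statd_update_due() is hypothetical. The point is that the trigger is refreshed only when an update actually runs; touching it on every event, as before, means the period never elapses.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Return true (and refresh the trigger) only if at least `period`
 * seconds have passed since the trigger was last refreshed.
 */
static bool statd_update_due(time_t now, time_t *last_trigger, time_t period)
{
	if (now - *last_trigger >= period) {
		*last_trigger = now;   /* "touch" the trigger only when updating */
		return true;
	}
	return false;
}
```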
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reported-by: Poornima Gupte <poornima.gupte at in.ibm.com>
commit 25a6fd784cde96f3d20a79f70b5589b5c4aca675
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 28 11:26:17 2013 +1000
tests/integration: Improve debug output for unhealthy cluster after restart
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 80b3cf2c652c6098390cdd0dbb3edc648f7df487
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 27 15:16:28 2013 +1000
tests/scripts: Delete unused $rows and $ww variables from run_tests
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 85e11b9b13b3add88c1b8957be51793cc1db4f2d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 28 14:19:32 2013 +1000
packaging: Create separate package for pcp pmda
To build ctdb-pcp-pmda package, run packaging/RPM/makerpms.sh script with
"--with pmda" option.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 194f7a0dec26d693a5f3e6734b1c82f61f8e4d19
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 28 14:16:02 2013 +1000
build: Separate autoconf macros for pmda
The pmda stuff is no longer built by default even if the headers are
available. To build, run "configure --enable-pmda".
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 11af486754bb04899e3dc544157bf70530e66cd1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 28 14:16:25 2013 +1000
build: Fix install paths for pcp pmda
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit f2ef3510407fbad29908195c58e4160d5a81e8a4
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 27 14:43:03 2013 +1000
packaging: makerpms.sh can take multiple arguments for rpmbuild
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0ca7a98ffef50cbd06849cfbf65fb4a3d668b7bd
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 27 12:56:41 2013 +1000
eventscripts: Stop NAT gateway's delete_all() from polluting the log
Every time a node that isn't the NAT gateway master gets reconfigured,
something like this appears in the log:
ctdbd: 11.natgw: Failed to del 10.0.1.139 on dev eth1
Since this usually fails it is better to mute the error than to have
it pollute the log.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b2654853ce9b7c18c5874b080bc94d3118078a5d
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 27 11:29:42 2013 +1000
recoverd: Backward compatibility for nodes without IPREALLOCATED control
Consider the case of upgrading a cluster node by node, where some
nodes are still running older versions of CTDB without the
IPREALLOCATED control. If a "new" node takes over as recovery master
and a failover occurs, then it will attempt to send IPREALLOCATED
controls to all nodes. The "old" nodes will fail in a fairly
nondescript way (result == -1).
To try to handle this situation, fall back to the EVENTSCRIPT control
to handle "ipreallocated". Only do this on the failed nodes.
However, do not do this on nodes that timed out (they've probably
implemented the control and we should call the regular fail_callback
to get those nodes banned) or for stopped nodes (since they can't
actually run the "ipreallocated" event via the EVENTSCRIPT control).
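The per-node decision described above can be sketched as a small function. The enum and function names are hypothetical. The commit does not say what ultimately happens to a stopped node whose control failed, so the sketch only reports that no fallback is attempted for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node outcome of sending the IPREALLOCATED control. */
enum ipreallocated_action {
	ACTION_NONE,                 /* control succeeded */
	ACTION_FAIL_CALLBACK,        /* timed out: let normal ban handling run */
	ACTION_EVENTSCRIPT_FALLBACK, /* probably an old node: use EVENTSCRIPT */
	ACTION_NO_FALLBACK,          /* stopped node: EVENTSCRIPT cannot help */
};

static enum ipreallocated_action
choose_ipreallocated_action(int result, bool timed_out, bool stopped)
{
	if (result == 0) {
		return ACTION_NONE;
	}
	if (timed_out) {
		/* Node probably implements the control, so treat as a failure */
		return ACTION_FAIL_CALLBACK;
	}
	if (stopped) {
		return ACTION_NO_FALLBACK;
	}
	return ACTION_EVENTSCRIPT_FALLBACK;
}
```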
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b2b572e9049c7138bd223226475bef8fe3e01f10
Author: Martin Schwenke <martin at meltin.net>
Date: Sat May 25 19:57:24 2013 +1000
scripts: Provide mktemp function for platforms without mktemp command
This is needed for AIX and possibly others.
Also provide a cheaper mktemp function is needed in the run_tests
script.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c9e36f596c63c9af7f80d7cb8d7a5c6dcca4860a
Author: Martin Schwenke <martin at meltin.net>
Date: Sat May 25 19:08:49 2013 +1000
tests: Fix integration tests to use real private IPs
192.0.2.x was a typo.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e5a5ab53173d9aa4190ddf68c4ae316d4473eb56
Author: David Disseldorp <ddiss at samba.org>
Date: Fri May 24 16:11:12 2013 +0200
pmda: handle new ctdb_statistics format
The ctdb_statistics structure was recently changed. Update the PMDA to
dereference the new structure member names.
Signed-off-by: David Disseldorp <ddiss at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 75a620c516e384f042b5d675183b3a1b48fd6115
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Apr 5 20:47:47 2013 +1100
tests/takeover: New test with 900 IPs
commit cfd1371d3a1f78a0ed86485d83bd4d311727c3d4
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Apr 5 20:45:08 2013 +1100
tests/takeover: Takeover tests can use up to 1024 and checks limits
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ef35c8889d90220929e48e66eb62da9ea2025ede
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 8 14:37:44 2013 +1000
tests/takeover: LCP2 tests for weird, unbalanced corner-cases
Two tests show a bad result and a third test covers the fix.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 954ae6f84cb06a8dcbc12456d4752280072be5bf
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 8 14:37:08 2013 +1000
tests/takeover: Allow takeover runs with differing IP allocations per node
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 32723c9efdad1c6ca4aa53f308ccd9bef1aadfff
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 24 18:07:39 2013 +1000
vacuum: Reduce the priority of non-critical error
Since the complete database is not locked when the receive_records
control is received, we may not be able to obtain a lock on a chain.
We will try again to store this record.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit b697625b184227dad1be31a41b7a3fd9bd312e29
Author: Michael Adam <obnox at samba.org>
Date: Fri May 17 11:05:44 2013 +0200
ctdbd: fix comment explaining redirection of CTDB_REQ_CALL.
Signed-off-by: Michael Adam <obnox at samba.org>
commit d9e24782a90d9ce29c0e6584b75d2b186142174d
Author: Michael Adam <obnox at samba.org>
Date: Fri May 17 11:01:31 2013 +0200
ctdbd: remove a nonempty blank line
Signed-off-by: Michael Adam <obnox at samba.org>
commit 9a21d417c51fb9cad8f2e87e00ca54d379aef860
Author: Michael Adam <obnox at samba.org>
Date: Fri May 17 11:00:32 2013 +0200
ctdbd: update comment describing ctdb_call_send_redirect()
Signed-off-by: Michael Adam <obnox at samba.org>
commit c57430998a3bdedc8a904eb3a9cdfde1421aff50
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 6 20:31:08 2013 +1000
tests/takeover: New tests to check runstate handling
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f15dd562fd8c08cafd957ce9509102db7eb49668
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 6 15:36:29 2013 +1000
recoverd: Nodes can only takeover IPs if they are in runstate RUNNING
Currently the order of the first IP allocation, including the first
"ipreallocated" event, and the "startup" event is undefined. Both of
these events can (re)start services.
This stops IPs being hosted before the "startup" event has completed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit c0c27762ea728ed86405b29c642ba9e43200f4ae
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 23 19:03:11 2013 +1000
recoverd: Handle errors carefully when fetching tunables
If a tunable is not implemented on a remote node then this should not
be fatal. In this case the takeover run can continue using benign
defaults for the tunables.
However, timeouts and any unexpected errors should be fatal. These
should abort the takeover run because they can lead to unexpected IP
movements.
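A minimal sketch of this policy, assuming each node's fetch result is 0 on success, -EINVAL when the node does not know the tunable, -ETIME on a timeout, and anything else on an unexpected failure. The helper tunable_or_default() is hypothetical; it also takes an explicit default value rather than relying on an implicit 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 with *out set, or -1 to signal that the takeover run
 * should be aborted.
 */
static int tunable_or_default(int res, uint32_t fetched,
			      uint32_t default_value, uint32_t *out)
{
	switch (res) {
	case 0:
		*out = fetched;
		return 0;
	case -EINVAL:
		/* Tunable not implemented on that node: use a benign default */
		*out = default_value;
		return 0;
	default:
		/* Timeouts (-ETIME) and anything unexpected are fatal */
		return -1;
	}
}
```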
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1190bb0d9c14dc5889c2df56f6c8986db23d81a1
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 23 19:01:01 2013 +1000
recoverd: Set explicit default value when getting tunable from nodes
Both of the current defaults are implicitly 0. It is better to make
the defaults obvious.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 40e34773b8063196457746ffe7a048eb87d96d61
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 23 16:09:38 2013 +1000
client: async_callback() sets result to -ETIME if a control times out
Otherwise there is no way of treating a timeout differently to a
general failure.
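A sketch of the idea, with a hypothetical enum standing in for however the client tracks how a control finished: a timeout maps to -ETIME instead of a generic -1, so callers can tell the two apart. None of these names are taken from the CTDB client code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical completion states for a control; not CTDB's enum. */
enum control_outcome { OUTCOME_DONE, OUTCOME_TIMEOUT, OUTCOME_ERROR };

/* Map a control's completion onto the result handed to the caller. */
static int control_result(enum control_outcome outcome, int status)
{
	switch (outcome) {
	case OUTCOME_DONE:
		return status;          /* the remote node's own status */
	case OUTCOME_TIMEOUT:
		return -ETIME;          /* distinguishable from a failure */
	case OUTCOME_ERROR:
	default:
		return -1;
	}
}
```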
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 03fd90d41f9cd9b8c42dc6b8b8d46ae19101a544
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 21 15:41:56 2013 +1000
ctdbd: Update the get_tunable code to return -EINVAL for unknown tunable
Otherwise callers can't tell the difference between some other failure
(e.g. memory allocation failure) and an unknown tunable.
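A sketch of a table-driven lookup that returns -EINVAL for an unknown name. The table contents are placeholders, not real CTDB tunables, and lookup_tunable() is a hypothetical name rather than the function changed by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative table; the names and values are placeholders. */
struct tunable {
	const char *name;
	uint32_t value;
};

static const struct tunable tunables[] = {
	{ "ExampleTunableA", 0 },
	{ "ExampleTunableB", 5 },
};

/*
 * Return 0 and set *value if `name` is known, or -EINVAL if it is not,
 * so callers can tell "unknown tunable" apart from other failures.
 */
static int lookup_tunable(const char *name, uint32_t *value)
{
	size_t i;

	for (i = 0; i < sizeof(tunables) / sizeof(tunables[0]); i++) {
		if (strcmp(tunables[i].name, name) == 0) {
			*value = tunables[i].value;
			return 0;
		}
	}
	return -EINVAL;
}
```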
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 473cfcb019f0cb4a094bf10397f7414f7923ee57
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 22 17:19:34 2013 +1000
recoverd: Whitespace improvements
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f6792f478197774d2f3b2258c969b67c83e017ab
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 22 20:56:03 2013 +1000
recoverd: Use talloc_array_length() for simpler code
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c50eca6fbf49a6c7bf50905334704f8d2d3237d7
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 11 18:02:51 2013 +1100
ctdbd: When the "setup" event fails log an error and exit, don't abort
The "setup" event can fail when one of the eventscripts fails to run
its "setup" event. If this occurs then the eventscript should log an
error. The stack trace and core file generated when we abort provide
no useful information.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 39a43feae7c7de07ddaf2d6cb962f923d47d0c19
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 11 16:02:31 2013 +1100
eventscripts: 11.natgw should not call ctdb tool in "init" event
The current code calls "ctdb setnatgwstate ..." on every event.
However, calling the ctdb tool in the "init" event is not permitted.
Instead, update the capability when it is needed and at regular
intervals via the "monitor" event.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit ed6814ff0a59ddbb1c1b3128b505380f60d7aeb7
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Apr 18 20:30:14 2013 +1000
ctdbd: Add new runstate CTDB_RUNSTATE_FIRST_RECOVERY
This adds more serialisation to the startup, ensuring that the
"startup" event runs after everything to do with the first recovery
(including the "recovered" event).
Given that it now takes longer to get to the "startup" state, the
initscript needs to wait until ctdbd gets to "first_recovery".
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 4a2effcc455be67ff4a779a59ca81ba584312cd6
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 11 14:09:14 2013 +1100
tools/ctdb: "ctdb runstate" now accepts optional expected run state arguments
If one or more run states are specified then "ctdb runstate" succeeds
only if ctdbd is in one of those run states.
At the moment, if the "setup" event fails then the initscript succeeds
but ctdbd exits almost immediately. This behaviour isn't very
friendly.
The initscript now waits until ctdbd is in "startup" or "running" run
state via the use of "ctdb runstate startup running", meaning that ctdbd
has successfully passed the "setup" event.
The "setup" event code in 00.ctdb now waits until ctdbd is in the
"setup" run state before proceeding via the use of "ctdb runstate setup".
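The success condition for "ctdb runstate <state>..." can be sketched as a string match against the expected states. The run state names come from the text above. Comparing by name, and accepting any state when none is given, are assumptions; runstate_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Succeed only if `current` matches one of the `n` expected run state
 * names, as in "ctdb runstate startup running". With no expected
 * states, any run state is accepted.
 */
static bool runstate_matches(const char *current,
			     const char *const expected[], size_t n)
{
	size_t i;

	if (n == 0) {
		return true;
	}
	for (i = 0; i < n; i++) {
		if (strcmp(current, expected[i]) == 0) {
			return true;
		}
	}
	return false;
}
```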
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit bf20c3ab090f75f59097b36186347cedb1c445d4
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 11 14:07:12 2013 +1100
tools/ctdb: New command runstate to print current runstate
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit dc4220e6f618cc688b3ca8e52bcb3eec6cb55bb1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 21 16:18:28 2013 +1000
ctdbd: New control CTDB_CONTROL_GET_RUNSTATE
Also new client function ctdb_ctrl_get_runstate().
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f43fe3a560d5915c1a9893256f4e7bfe3d7e290a
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 10 16:48:39 2013 +1100
ctdbd: Start logging process earlier
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit c31feb27dcdb748b5333321c85fe54852dfa1bcf
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 10 16:33:36 2013 +1100
ctdbd: Only start recovery daemon and timed events after setup event
This deconstructs ctdb_start_transport(), which did much more than
starting the transport.
This removes a very unlikely race and adds some clarity. The setup
event is supposed to set the tunables before the first recovery.
However, there was nothing stopping the first recovery from starting
before the setup event had completed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 8076773a9924dcf8aff16f7d96b2b9ac383ecc28
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 10 16:06:25 2013 +1100
ctdbd: Replace ctdb->done_startup with ctdb->runstate
This allows states, including startup and shutdown states, to be
clearly tracked. This doesn't include regular runtime "states", which
are handled by node flags.
Introduce new functions ctdb_set_runstate(), runstate_to_string() and
runstate_from_string().
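This sketch shows what runstate_to_string() and runstate_from_string() might look like. The exact set of states and their lowercase spellings are assumptions ("setup", "startup" and "running" appear in this log). A single table drives both directions, and unrecognised input maps to an "unknown" state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed run states, in startup order. */
enum ctdb_runstate {
	CTDB_RUNSTATE_UNKNOWN,
	CTDB_RUNSTATE_INIT,
	CTDB_RUNSTATE_SETUP,
	CTDB_RUNSTATE_STARTUP,
	CTDB_RUNSTATE_RUNNING,
	CTDB_RUNSTATE_SHUTDOWN,
};

/* One table drives both conversions so they cannot drift apart. */
static const struct {
	enum ctdb_runstate runstate;
	const char *label;
} runstate_map[] = {
	{ CTDB_RUNSTATE_UNKNOWN,  "unknown" },
	{ CTDB_RUNSTATE_INIT,     "init" },
	{ CTDB_RUNSTATE_SETUP,    "setup" },
	{ CTDB_RUNSTATE_STARTUP,  "startup" },
	{ CTDB_RUNSTATE_RUNNING,  "running" },
	{ CTDB_RUNSTATE_SHUTDOWN, "shutdown" },
};

#define RUNSTATE_MAP_LEN (sizeof(runstate_map) / sizeof(runstate_map[0]))

static const char *runstate_to_string(enum ctdb_runstate runstate)
{
	size_t i;

	for (i = 0; i < RUNSTATE_MAP_LEN; i++) {
		if (runstate_map[i].runstate == runstate) {
			return runstate_map[i].label;
		}
	}
	return "unknown";
}

static enum ctdb_runstate runstate_from_string(const char *label)
{
	size_t i;

	for (i = 0; i < RUNSTATE_MAP_LEN; i++) {
		if (strcmp(runstate_map[i].label, label) == 0) {
			return runstate_map[i].runstate;
		}
	}
	return CTDB_RUNSTATE_UNKNOWN;
}
```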
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 9e7b7cd04adc5e66e2ffa4edf463a682aaea379b
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 23 16:06:47 2013 +1000
tools/ctdb: Remove duplicate command definition for "sync"
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit dbb7c550133c92292a7212bdcaaa79f399b0919b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 8 23:29:55 2013 +1000
logging: Make sure ringbuffer messages are terminated with a newline
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 29911fa44a480c17c701528ef46919b2a962a366
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 8 16:25:30 2013 +1000
tests: Fix output of run_tests usage
commit 80fbe9364350d42658f7f8af250ac87eb1afbc21
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 8 13:45:55 2013 +1000
locking: Set lock helper path once
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c660f33c3eaa1b4a2c4e951c1982979e57374ed4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 8 10:42:08 2013 +1000
locking: Remove functions that are not used anymore
These functions were used in the locking child process to do the locking.
With the locking helper, they are no longer required.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 6ea3212a7b177c6c06b1484cf9e8b2f4036653d9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 15:13:44 2013 +1000
locking: Remove functions that are not used anymore
These functions were used in the locking child process to do the locking.
With the locking helper, they are no longer required.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 7cde53a6cbe74b1e46f7e1bca298df82c08de866
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 15:07:49 2013 +1000
locking: Use separate locking helper binary for locking
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f665e3d540c90579952e590caa5828acb581ae61
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:32:46 2013 +1000
locking: Create commandline arguments for locking helper
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a08b6ac19506160f3fb5925ea025027dce07781d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Apr 22 15:36:27 2013 +1000
locking: Add a standalone helper to lock record/db
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 7630ca4116b476636c27407748088ea335f1a06c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:14:16 2013 +1000
locking: Use database iterator for unmarking databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit adc113055de98fae276f9b501aff5c03cd25ddc8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:16:07 2013 +1000
locking: Add handler function for unmarking a database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit e8ea65b2713417db4a618a9f4633991cfaa93fe6
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:12:40 2013 +1000
locking: Use database iterator for marking databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f120e40533780e02ff1cdc41cc6d3af1c4c83258
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:07:11 2013 +1000
locking: Add handler function for marking a database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 187ed83f9701c7fa8d3cc476d47c5d2a87d5c308
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:10:06 2013 +1000
locking: Use database iterator for unlocking databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 725239535f40ca2cca445bb5bf2e181351b330e9
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:06:46 2013 +1000
locking: Add handler function for unlocking a database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit d2634d72d9ca0ceeb72cbb1adc95017a234480fd
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:08:51 2013 +1000
locking: Use database iterator for locking databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2a1c933ef7c78ee071e2a640ea10941f1c12e32a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 14:06:27 2013 +1000
locking: Add handler function for locking a database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a3275854812aca86032704134fdf6a129069c86a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 30 13:23:59 2013 +1000
locking: Refactor code to iterate over databases based on priority
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit d98a861716d5f8c1f4387d21666396d3164551b3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 1 12:55:22 2013 +1000
locking: Add newline to debug logs
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 0577ce3c68e4febf49a1ef5093e918db9d5ec636
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 23 13:04:06 2013 +1000
tools/ctdb: Fix racy ipreallocate code
This code tried to find the recovery master and send an ipreallocate
request to that node. When a node was stopped, this code asked the
stopped node for the recovery master. A stopped node does not have
up-to-date information on the current recovery master, so ipreallocate
requests were sent to the wrong node, which ignored them because it is
not the recovery master.
Send ipreallocate request to all active nodes. That way we guarantee
that the current recovery master will see it and respond to it.
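A sketch of choosing the targets. The flag bits are placeholders rather than CTDB's real NODE_FLAGS_* values, and the array index is assumed to be the node's PNN. Every node without an inactive flag gets the request, so the recovery master, which must itself be active, is always included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bits; CTDB's real NODE_FLAGS_* values may differ. */
#define SKETCH_FLAG_DISCONNECTED 0x1u
#define SKETCH_FLAG_BANNED       0x2u
#define SKETCH_FLAG_STOPPED      0x4u
#define SKETCH_FLAGS_INACTIVE \
	(SKETCH_FLAG_DISCONNECTED | SKETCH_FLAG_BANNED | SKETCH_FLAG_STOPPED)

/*
 * Write the PNNs of all active nodes to pnns[] (which must hold at
 * least num_nodes entries) and return how many were written.
 */
static size_t active_nodes(const uint32_t *flags, size_t num_nodes,
			   uint32_t *pnns)
{
	size_t i, n = 0;

	for (i = 0; i < num_nodes; i++) {
		if ((flags[i] & SKETCH_FLAGS_INACTIVE) == 0) {
			pnns[n++] = (uint32_t)i;
		}
	}
	return n;
}
```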
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit 9d4524d13cbba21bfaf61bd35667984359b379b3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 22 15:37:46 2013 +1000
ctdbd: Print version string in the daemon startup
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit d18fcfff674e876abde8d51afec92d9c4a090d2f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 22 14:23:17 2013 +1000
build: Rename version.h to ctdb_version.h
This avoids a clash with version.h from the Samba tree.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 939d12b96a0cbebbe6269fa2b14f584058dd6174
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 9 15:43:10 2013 +1000
logging: Fix a bug in ringbuffer
When the ringbuffer is full, it does not return any entries. Simplify
the ringbuffer logic by keeping track of the number of log entries
rather than the last entry.
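A minimal sketch of a ring buffer that stores the index of the oldest entry plus an entry count, with a tiny fixed capacity and int entries for illustration. With a count, "full" (count == capacity) is unambiguous. Tracking only start and end positions is a well-known way to make a full buffer look empty, which would match the symptom above. Whether that was the exact cause in CTDB's logging code is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define RB_SIZE 3   /* tiny capacity for illustration */

struct ringbuffer {
	int entries[RB_SIZE];
	size_t first;   /* index of the oldest entry */
	size_t count;   /* number of valid entries */
};

static void rb_add(struct ringbuffer *rb, int value)
{
	if (rb->count < RB_SIZE) {
		rb->entries[(rb->first + rb->count) % RB_SIZE] = value;
		rb->count++;
	} else {
		/* Full: overwrite the oldest entry and advance the start */
		rb->entries[rb->first] = value;
		rb->first = (rb->first + 1) % RB_SIZE;
	}
}

/* Copy entries oldest-first into out[]; returns how many were copied. */
static size_t rb_get(const struct ringbuffer *rb, int *out)
{
	size_t i;

	for (i = 0; i < rb->count; i++) {
		out[i] = rb->entries[(rb->first + i) % RB_SIZE];
	}
	return rb->count;
}
```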
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit 14bd0b6961ef1294e9cba74ce875386b7dfbf446
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 13 15:27:04 2013 +1000
recoverd: takeover_run_core() should not use modified node flags
Modifying the node flags with IP-allocation-only flags is not
necessary. It causes breakage if the flags are not cleared after use.
ctdb_takeover_run() no longer needs the general node flags - it only
needs the IP flags.
Instead of modifying the node flags in nodemap, construct a custom IP
flags list and have takeover_run_core() use that instead of node
flags. As well as being safer, this makes the IP allocation code more
self-contained and a little bit clearer.
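A sketch of building a separate IP flags list instead of modifying node flags in place. The flag bits, the struct layout and the mapping rules (inactive or disabled nodes may not host IPs; a per-node "no IP takeover" setting is copied through) are assumptions for illustration. The node flags are only read, so nothing has to be cleared afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder node flag bits, not CTDB's actual values. */
#define SKETCH_NODE_INACTIVE 0x1u
#define SKETCH_NODE_DISABLED 0x2u

/* Per-node flags used only by the IP allocation code. */
struct ipflags {
	bool noiptakeover; /* node may not take over IPs */
	bool noiphost;     /* node may not host IPs at all */
};

/* Build the IP flags array from read-only inputs; caller frees it. */
static struct ipflags *build_ipflags(const uint32_t *node_flags,
				     const bool *no_ip_takeover, size_t n)
{
	struct ipflags *ipf = calloc(n, sizeof(*ipf));
	size_t i;

	if (ipf == NULL) {
		return NULL;
	}
	for (i = 0; i < n; i++) {
		ipf[i].noiptakeover = no_ip_takeover[i];
		ipf[i].noiphost = (node_flags[i] &
			(SKETCH_NODE_INACTIVE | SKETCH_NODE_DISABLED)) != 0;
	}
	return ipf;
}
```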
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a8605f7e06076e7edf84e0cc160fd3d9ab5c4b64
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 20 10:47:07 2013 +1000
ctdbd: Update confusing log message
Inactive can also mean stopped. To add information, just print the
flags instead.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3105f9e291d0792199ac9e689f6d0e0a47ee4b0d
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 17 16:46:41 2013 +1000
Packaging: maketarball.sh should be a bash script due to pushd use
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d29e9a420b133088bf23a847c8d1dbce56c25eb0
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 17 16:42:25 2013 +1000
scripts: Rework notify.sh to use notify.d/ directory
This makes it easier to add notification handlers.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1f96ea08f9a39dfe537c9b957ac512c84dc76f91
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 14 16:20:32 2013 +1000
ctdbd: Log a message when recovery master changes
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-Programmed-With: Amitay Isaacs <amitay at gmail.com>
commit 3c3df1d6afec7e3e721f9bcd4e8b8e008fd6e50b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 14 15:38:08 2013 +1000
ctdbd: Log add and delete of IPs
At the moment, when someone deletes all the IPs on a node, all we see
are the release IP messages and we have to guess why.
Some would argue that add/delete are more significant than
take/release, so they should be logged.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4a8d90d0812a3242f58a2a0e2aa0f528f60f7013
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 14 15:30:53 2013 +1000
ctdbd: Remove bogus comment in ctdb_find_iface()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f1619a36c1beba11533052dc5728fa3adaa08870
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 14 14:56:26 2013 +1000
eventscripts: Fix regression in _loadconfig()
fff88940f71058e4eefd65f50a6701389c005c17 introduced a regression.
Without $service_name set by default, the CTDB configuration is no
longer loaded when loadconfig() is called without any arguments.
That's bad.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e6b6b793f61556c21e8daf34abf89ee7b388ecfb
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 9 20:44:11 2013 +1000
initscript: If CTDB doesn't become ready, print a message before killing
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0c0752515b66661ffae24be5f138bd2fab4dec5c
Author: Christian Ambach <ambi at samba.org>
Date: Wed May 8 08:45:09 2013 +0200
build: Create sudoers.d dir during make install
Otherwise, "make install" into a non-standard prefix will fail.
Signed-off-by: Christian Ambach <ambi at samba.org>
commit b0cae7d5a00ef3764bae187affc8e9a252f4b329
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue May 14 23:18:32 2013 +1000
eventscripts: Do not use bashism for string comparison
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit e143abd16ccde2e0edfe103673d31a5fb06b6aef
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 9 12:53:48 2013 +1000
recoverd: Move IP flags into ctdb_takeover.c
These should never be seen outside the IP allocation code.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 45c776958017ea7001f061842c9e0f60e4a25f23
Author: Martin Schwenke <martin at meltin.net>
Date: Thu May 9 12:51:57 2013 +1000
recoverd: Clear IP flags after IP allocation algorithm has run
If these flags are left set they will confuse other recovery daemon
code.
Factor the clearing code into new function clear_ipflags().
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit d0a3822573db296e73cc897835f783c8abc084b3
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 20:46:15 2013 +1000
recoverd: Remove unused mask argument and initial mask calculation
This has been replaced by set_ipflags() and associated functionality.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 65e0ea6c2c0629e19349ba4b9affa221fde2b070
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 20:41:32 2013 +1000
recoverd: When calculating rebalance candidates don't consider flags
This is really a check to see if a node is already hosting IPs. If
so, we assume it was previously healthy so it isn't considered as a
rebalance candidate. There's no need to limit this to healthy nodes,
since this is checked elsewhere.
Due to this, the variable newly_healthy is renamed everywhere to
rebalance_candidates.
The mask argument is now completely unused.
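For illustration only, a minimal C sketch of the candidate test described above, assuming a simple per-node count of hosted public IPs. The function and parameter names are hypothetical and are not taken from ctdb_takeover.c. Node health is deliberately not consulted, since that is checked elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: ip_count[i] is the number of public IPs node i
 * currently hosts. A node already hosting IPs is assumed to have been
 * healthy before, so only nodes hosting no IPs are marked as rebalance
 * candidates. Node flags/health are intentionally ignored here. */
static size_t mark_rebalance_candidates(const unsigned int *ip_count,
					size_t num_nodes,
					bool *rebalance_candidates)
{
	size_t num_candidates = 0;

	assert(ip_count != NULL && rebalance_candidates != NULL);

	for (size_t i = 0; i < num_nodes; i++) {
		rebalance_candidates[i] = (ip_count[i] == 0);
		if (rebalance_candidates[i]) {
			num_candidates++;
		}
	}
	return num_candidates;
}
```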
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 107e656bbe24f9d21fbaf886a3e9417da4effe5a
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 20:13:40 2013 +1000
recoverd: Remove unused mask argument from IP allocation functions
This is a no-op and is in a separate commit to make the previous
commit less cumbersome.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7cf63722873a6a7baafd77aa3d8a1989b221dee9
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 15:57:21 2013 +1000
tests/takeover: Add takeover tests, mostly for NoIPHostOnAllDisabled
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 1308a51f73f2e29ba4dbebb6111d9309a89732cc
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 16:59:20 2013 +1000
recoverd: Fix tunable NoIPTakeoverOnDisabled, rename to NoIPHostOnAllDisabled
This really needs to be per-node. The rename is because nodes with
this tunable switched on should drop IPs if they become unhealthy (or
disabled in some other way).
* Add new flag NODE_FLAGS_NOIPHOST, only used in recovery daemon.
* Enhance set_ipflags_internal() and set_ipflags() to set up
NODE_FLAGS_NOIPHOST depending on setting of NoIPHostOnAllDisabled
and/or whether nodes are disabled/inactive.
* Replace can_node_serve_ip() with functions can_node_host_ip() and
can_node_takeover_ip(). These functions are the only ones that need
to look at NODE_FLAGS_NOIPTAKEOVER and NODE_FLAGS_NOIPHOST. They
can make the decision without looking at any other flags due to
previous setup.
* Remove explicit flag checking in IP allocation functions (including
unassign_unsuitable_ips()) and just call can_node_host_ip() and
can_node_takeover_ip() as appropriate.
* Update test code to handle CTDB_SET_NoIPHostOnAllDisabled.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 12aef10e9889760d98f58c8d916f19d069fa381a
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 16:56:24 2013 +1000
recoverd: Factor out new function all_nodes_are_disabled()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a1addd89fd9c0390912604097acd028cc24d3483
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 15:55:01 2013 +1000
tests/takeover: Allow per-node tunable settings
Implemented for CTDB_SET_NoIPTakeover.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 1fb5352d2b6918fcc6f630db49275d25a3eebe8d
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 16:21:16 2013 +1000
recoverd: Refactor code to get NoIPTakeover tunable from all nodes
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 9721aae001b3023e9c8b4af2b143c0db3442d623
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 15:53:13 2013 +1000
tests: Unit test diff output should use filtered output
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 91405282ba4abad4ad8e8c5f7ee4c83c75f38280
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 3 15:41:26 2013 +1000
recoverd: Add debug message when dropping IPs in IP allocation
Update tests accordingly.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0eb351ff4c7ee096de7c5e0a59561067091fa32e
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 12:30:33 2013 +1000
eventscripts: NFS RPC checks no longer support "knfsd"
It is no longer used, so support for it has been removed from the test infrastructure.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7e792d6768d9ca420ce3713cb122e63afd594b15
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 12:17:31 2013 +1000
eventscripts: 60.nfs uses nfs_check_rpc_services() to check NFS RPC services
* New directory nfs-rpc-checks.d/ replaces hardcoded rules in 60.nfs
* Installation and packaging additions to handle nfs-rpc-checks.d/
* Unit test updates, including deleting 1 test that sanity checked
test infrastructure
* Test infrastructure changes to use nfs-rpc-checks.d/
Note that this removes support for $CTDB_NFS_SKIP_KNFSD_ALIVE_CHECK in
60.nfs. To get the equivalent behaviour, edit 20.nfsd.check and
remove or comment out all lines.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d9775fcbd6e30eef8382bea68e2f9bad2309f2c1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 11:14:48 2013 +1000
eventscripts: NFS RPC checks allows "nfsd" in addition to "knfsd"
Want nfs_check_rpc_services() to support filenames without the 'k'.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9bc8fbee6550ed2814fb35c70d57fab21ef1b8fd
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 06:42:54 2013 +1000
eventscripts: New function nfs_check_rpc_services()
This is intended to replace nfs_check_rpc_service(), which builds
configuration into eventscripts.
nfs_check_rpc_services() uses a directory of configuration checks that
can be edited by an administrator. The files have one limit check and
a set of actions per line. The program name is extracted from the
file name.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5a717fd495ba5a2bfd481d69f38b68fa4576716f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 06:28:27 2013 +1000
eventscripts: nfs_check_rpc_action() should be _nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cc3bb42e48bbdabd19187c231846b98589b4f4f3
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 06:27:02 2013 +1000
eventscripts: Factor out common code from nfs_check_rpc_service()
This creates new function _nfs_check_rpc_common().
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 887733dd7be53158bfe07b30ef31b611d0f8122f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 06:17:15 2013 +1000
eventscripts: Remove ganesha support from nfs_check_rpc_service()
This is unused so doesn't need to be maintained. An attempt to use it
now will explicitly fail rather than implicitly fail via bitrot.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 77302dbfd85754e02559eccb2dd6c090db0b6b9f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 06:14:43 2013 +1000
Revert "Eventscript functions: add optional version to nfs_check_rpc_service()"
This reverts commit 92f74fd589467b46c758e116e97417edfe8773d7.
This change is unused and is just complicating the function.
Conflicts:
config/functions
commit 15b0f78cbf8d6ba481b7eba9e4fe3f4270214c72
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 05:54:12 2013 +1000
eventscripts: Move rpc.statd existence check into nfs_check_rpc_service ()
The code in 60.nfs is going to be genericised, so make all the checks
look the same.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4b4e7d8f0e8dcbab987e374d06ffaa21c06da0d3
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 22 15:45:13 2013 +1000
eventscripts: Factor NFS RPC check action code into nfs_check_rpc_action()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a8ef00608e48a551a334aded206146807aeb4c5a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 15:33:12 2013 +1000
eventscripts: Remove unused function ctdb_check_counter_limit()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit bb2cdff77e8ec79e7d319159b9c9848ecfaaa0f1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 15:23:20 2013 +1000
eventscripts: Use ctdb_check_counter() instead of ctdb_check_counter_limit()
ctdb_check_counter_limit() can soon be removed...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ef2cf75e95ff382c65524a4d77eb00ab8411d2fc
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 15:19:52 2013 +1000
eventscripts: Might as well try to stat the reclock file first
It is in the background but it still might cause the counter to be
reset before it is checked.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 136abd4604dc68f7c696704bac708bae53cf1940
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 15:16:44 2013 +1000
eventscripts: Make the early exit in 01.reclock earlier
That way we don't even check the counter...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 25ef4f655f1efc833deb5e244f9fff461e92f439
Author: Martin Schwenke <martin at meltin.net>
Date: Mon May 6 16:23:25 2013 +1000
eventscripts: Minor cleanups for killtcp/tickle functions
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 60a08eb96e1d97aab31e9bd4af01683c650541c2
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 11:39:46 2013 +1000
eventscripts: Tweak the timeout check in kill_tcp_connections()
This has 2 advantages:
1. It uses get_tcp_connections_for_ip() to check for leftover
connections, instead of custom code.
2. It checks for the timeout condition before sleeping. The current
code sleeps and then checks, so it wastes a second.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 319c1b68d5aa78f82a68febcad233a7c78afc887
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 06:31:30 2013 +1000
eventscripts: In killtcp/tickle functions, $_failed should be boolean
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8514ca56830b30e7f0eb5018632640daaf8ff65d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 06:27:58 2013 +1000
eventscripts: Remove unused $_killcount from tickle_tcp_connections()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a621622903c7ef17764b15293d6ea8df5a53c7e1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 06:25:26 2013 +1000
eventscripts: Refactor connection listing in killtcp and tickle functions
Uses new function get_tcp_connections_for_ip(). This avoids using a
temporary file and running netstat twice.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 10e4db8f796d1e3259733180494db3b4bbad291a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 06:19:18 2013 +1000
eventscripts: Reimplement kill_tcp_connections_local_only()
... using kill_tcp_connections()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 23c0f5f48e3e5a0c1a3254c582299f7893cf0d33
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 06:14:01 2013 +1000
eventscripts: Change handling of one-way kills in kill_tcp_connections()
This change is a no-op. However, in a subsequent commit we'll merge
kill_tcp_connections_local_only() with this function.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3eae161472e6352f7f656851c73dc056f95113eb
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 06:05:52 2013 +1000
eventscripts: Remove unnecessary variables from killtcp/tickle functions
Setting these variables spawns lots of unnecessary processes, which
would surely slow down these functions on a busy system.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9e25fb261447a196de05937052779b36e75e7215
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 03:54:17 2013 +1000
eventscripts: Clean up ctdb_check_command()
* Command is now multiple arguments, preserving quoting
* $service_name no longer printed, no longer an argument
* Debug output from failed command
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d9e6cb945c5edac9ca6405c9228bf647fab814f5
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 03:48:51 2013 +1000
eventscripts: Clean up ctdb_check_directories()
The documentation comments are wrong, so correct them, and remove the
optional $service_name argument.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3d0a7d83ddc824961d876fc9afba829c90aef3e7
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 03:45:21 2013 +1000
eventscripts: Assert that $service_name is set in a few key places
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit fff88940f71058e4eefd65f50a6701389c005c17
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 15:31:27 2013 +1000
eventscripts: counters default to $script_name if $service_name not set
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 27aab8783898a50da8c4bc887b512d8f0c0d842c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 03:32:29 2013 +1000
eventscripts: Simplify handling of $service_name in "managed" functions
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
$service_name is no longer automatically set in the functions file.
This means it needs to be explicitly set in 13.per_ip_routing because
this script uses ctdb_service_check_reconfigure().
Eventscript unit test infrastructure needs to set $service_name during
fake service setup, and policy routing tests need to be updated
accordingly.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b5802c4735e1c719a5cf9ce69489d5947bd5e8c5
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 03:18:01 2013 +1000
eventscripts: Simplify handling of $service_name in start/stop functions
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e24baac0d2952e86d5ff31235901f06e2f2b2449
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 03:13:36 2013 +1000
eventscripts: Simplify handling of $service_name in service_management
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c2ea72ff565222f9edab408638bd45dbba6e8ff7
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 02:59:41 2013 +1000
eventscripts: Simplify handling of $service_name in reconfigure functions
Complicated argument handling was introduced to deal with multiple
services per eventscript. This was a failure and we split 50.samba.
This simplifies several functions to use global $service_name
unconditionally instead of having an optional argument.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit fd536a26b310b5bf9628da62cca0b425f4a54030
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Apr 24 17:14:32 2013 +1000
eventscripts: Remove unused function ctdb_check_counter_equal()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9dee4c84273633b9ad82e94dabbf0e6f86edbcef
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 13:56:15 2013 +1000
scripts: Fix script_log() regression
5940a2494e9e43a83f2bca098bd04dfc1a8f2e93 makes script_log() always
pass a message to logger, so script_log() can no longer log stdin.
Put all the tag fu in the actual tag so the message argument is empty
if no message was passed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c74cc0442eb90d859eae270b59456d28605817c4
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 23 13:49:28 2013 +1000
initscript: Look for tdbtool/tdbdump using which, not in fixed locations
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cd87ba85fc6c375758c7d3dfa8dbd4d8a02074b0
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 22 14:55:33 2013 +1000
ctdbd: Log CTDB startup before creating the PID file
Otherwise the messages are in a stupid order... :-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reported-by: Amitay Isaacs <amitay at gmail.com>
commit c2bb8596a8af6406ef50e53953884df9d6246a96
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Feb 21 14:28:13 2013 +1100
ctdbd: Remove the "stopped" event
It isn't used and has been superseded by "ipreallocated".
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 978d4a0d6d8c9877b23f72e3a7b78c1245d16908
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Feb 21 14:17:09 2013 +1100
eventscripts: Remove use of "stopped" event
Use "ipreallocated" instead. The "stopped" event pre-dates the
"ipreallocated" event. The only way of stopping a node is via the
ctdb tool, which explicitly causes a takeover run to occur after the
node is stopped. The takeover run will generate an "ipreallocated"
event.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 83b61f7414b1f7a3424497ac987ca0724fba9eaa
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Feb 21 13:13:09 2013 +1100
recoverd: ctdb_takeover_run() uses CTDB_CONTROL_IPREALLOCATED
This means "ipreallocated" is now run on stopped nodes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 27a44685f0d7a88804b61a1542bb42adc8f88cb1
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Apr 19 13:05:02 2013 +1000
ctdbd: New control CTDB_CONTROL_IPREALLOCATED
This is an alternative to using ctdb_run_eventscripts() that can be
used when in recovery.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 05f785b51cfd8b22b3ae35bf034127fbc07005be
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 30 17:22:23 2013 +1000
ctdbd: Avoid freeing non-monitor event callback when monitoring is disabled
When running a non-monitor event, a check is made for any active monitor
events. If there is an active monitor event, then it is cancelled. This
is done by freeing state->callback, which is allocated from
monitor_context.
When CTDB is stopped or shut down, monitoring is disabled by freeing
monitor_context, which frees the callback, and then the "stopped" or
"shutdown" event is run. This creates a new callback structure, which can
be allocated at the exact same memory location as the monitor callback
that was just freed. So the check for active monitor events frees the new
callback for the non-monitor event. Since it is the callback function that
flags successful completion of that event, the event is never marked
complete and CTDB is stuck in a loop waiting for completion.
Move the monitor cancellation to the top of the function so that this
can't happen.
The following log snippet highlights the problem.
2013/04/30 16:54:10.673807 [21505]: Received SHUTDOWN command. Stopping CTDB daemon.
2013/04/30 16:54:10.673814 [21505]: Shutting down recovery daemon
2013/04/30 16:54:10.673852 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
2013/04/30 16:54:10.673858 [21505]: Monitoring has been stopped
2013/04/30 16:54:10.673899 [21505]: server/eventscript.c:594 Sending SIGTERM to child pid:23847
2013/04/30 16:54:10.673913 [21505]: server/eventscript.c:629 searching for callback 0x1c6d5c0
2013/04/30 16:54:10.673932 [21505]: server/eventscript.c:641 running callback
2013/04/30 16:54:10.673939 [21505]: server/eventscript.c:866 in event_script_callback
2013/04/30 16:54:10.673946 [21505]: server/eventscript.c:696 in remove_callback 0x1c6d5c0
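A heavily simplified C sketch of the reordered logic, assuming plain malloc()/free() instead of talloc and a single current_monitor pointer; struct sk_event, struct sk_daemon and the sk_* functions are hypothetical. Because any active monitor event is cancelled before the new event is allocated, it no longer matters if the allocator hands back the address that was just freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct sk_event {
	bool is_monitor;
	bool completed;
};

struct sk_daemon {
	struct sk_event *current_monitor; /* active monitor event, or NULL */
};

/* Cancel (and free) the in-flight monitor event. */
static void sk_cancel_monitor(struct sk_daemon *d)
{
	free(d->current_monitor);
	d->current_monitor = NULL;
}

static struct sk_event *sk_event_start(struct sk_daemon *d, bool is_monitor)
{
	struct sk_event *ev;

	assert(d != NULL);

	/* Step 1: cancel any active monitor event first, before anything
	 * is allocated for the new event. (In this sketch a new monitor
	 * event also replaces an old one.) */
	if (d->current_monitor != NULL) {
		sk_cancel_monitor(d);
	}

	/* Step 2: only now allocate the new event. calloc() may return
	 * the address freed in step 1; that is harmless because the
	 * cancellation has already been done. */
	ev = calloc(1, sizeof(*ev));
	if (ev == NULL) {
		return NULL;
	}
	ev->is_monitor = is_monitor;
	if (is_monitor) {
		d->current_monitor = ev;
	}
	return ev;
}

/* Called when the event's scripts finish; flags successful completion. */
static void sk_event_done(struct sk_event *ev)
{
	ev->completed = true;
}
```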
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 0b7257642f62ebd83c05b6e2922f0dc2737f175c
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Feb 21 10:43:35 2013 +1100
recoverd: Interface reference count changes should not cause takeover runs
At the moment, a naive comparison of all the interface data is done.
So, if any IPs move then the reference counts for the relevant
interfaces change, interfaces appear to have changed and another
takeover run is initiated by each node that took/released IPs.
This change stops the spurious takeover runs by changing the interface
comparison to ignore the reference counts.
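A minimal C sketch of a comparison that ignores reference counts, assuming both interface lists are in the same order. struct sk_iface is a hypothetical layout loosely modelled on interface info (name, link state, references), not CTDB's actual structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface record, for illustration only. */
struct sk_iface {
	char name[16];
	uint16_t link_state; /* 1 = up, 0 = down */
	uint32_t references; /* number of public IPs using this interface */
};

/* Returns true only if names or link states differ. Reference counts
 * change whenever IPs move, so they are deliberately not compared. */
static bool sk_ifaces_changed(const struct sk_iface *a, size_t na,
			      const struct sk_iface *b, size_t nb)
{
	assert((a != NULL || na == 0) && (b != NULL || nb == 0));

	if (na != nb) {
		return true;
	}
	for (size_t i = 0; i < na; i++) {
		if (strcmp(a[i].name, b[i].name) != 0 ||
		    a[i].link_state != b[i].link_state) {
			return true;
		}
	}
	return false;
}
```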
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b5a8791268e938d7e017056e0e2bd2cbec1fa690
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 19 16:24:32 2013 +0200
recover: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit c7eab97c7a939710b73aae2d75b404b235a998f5
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 19 16:23:16 2013 +0200
ctdb_daemon: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit f99eb2f56d8ca27110a45ae0e1c4bff40ac7a60e
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 19 16:22:49 2013 +0200
ctdb_call: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit a62775334aa20d1d850d2df705eb70303b04ac5c
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 19 16:09:34 2013 +0200
vacuum: use CTDB_REC_RO_FLAGS in the vacuuming code
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 61f17e53576197def46bc61fdf0cdb5282333a3e
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 19 15:55:38 2013 +0200
ltdb_server: use CTDB_REC_RO_FLAGS where appropriate
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit c7924ce6404bb18641b00d5fbd2fe9da9aaf7959
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 19 16:01:45 2013 +0200
include: define CTDB_REC_RO_FLAGS - all read-only related record flags
This is used for some checks
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 61264debba58355b9716ac1637fdedef5ed249c8
Author: Michael Adam <obnox at samba.org>
Date: Fri Feb 22 16:12:17 2013 +0100
vacuum: Update (C)
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 06de786c786f1cab4c6721adf47c2cb1e8a72adb
Author: Michael Adam <obnox at samba.org>
Date: Sat Dec 29 17:23:27 2012 +0100
vacuum: extend the header comment for ctdb_process_delete_list()
Describe the (new) process more precisely, and mention that it is
the last step of the vacuuming process that is performed on the lmaster.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit eee23d44b6427be8ab49bbfcee3abb62f37dfcc7
Author: Michael Adam <obnox at samba.org>
Date: Sat Jan 5 01:20:18 2013 +0100
vacuum: turn the vacuuming on lmaster into a three-phase process.
More precisely, before locally deleting an empty record that has been
migrated with data and for which we are dmaster and lmaster, we now
perform the deletion on the other nodes in two steps instead of a single step.
- First send out the list of records to be deleted to all
other nodes with the new RECEIVE_RECORDS control to store
the lmaster's current empty copy.
- Then send those records that could be stored on all nodes
to all nodes again with the TRY_DELETE_RECORDS control
as before for deletion.
- Finally delete those records locally that were successfully
deleted remotely in the previous step.
This fixes an old race where a recovery that hits the vacuum process
square between the eyes can create gaps in the record's history and
hence let the records resurrect. In the case of the locking.tdb,
that could mean that a file that was already closed was recorded as
being open and locked again, so Samba clients were locked out of that
file until Samba was restarted.
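The three phases can be modelled as successive filters over the candidate records. In this hypothetical C sketch the per-node results of the RECEIVE_RECORDS and TRY_DELETE_RECORDS controls are pre-filled arrays; no real controls are sent, and struct sk_record and sk_vacuum_on_lmaster() are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_MAX_NODES 8

/* Hypothetical per-record outcome of the remote phases, indexed by
 * remote node. */
struct sk_record {
	bool stored[SK_MAX_NODES];  /* phase 1: node stored lmaster's empty copy */
	bool deleted[SK_MAX_NODES]; /* phase 2: node deleted the record */
	bool deleted_locally;       /* phase 3 result */
};

static bool sk_all_true(const bool *v, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!v[i]) {
			return false;
		}
	}
	return true;
}

/* Returns the number of records finally deleted on the lmaster. */
static size_t sk_vacuum_on_lmaster(struct sk_record *recs, size_t nrecs,
				   size_t nremote)
{
	size_t count = 0;

	assert(nremote <= SK_MAX_NODES);

	for (size_t i = 0; i < nrecs; i++) {
		struct sk_record *r = &recs[i];

		r->deleted_locally = false;
		/* Phase 1: only records stored on every remote node go on. */
		if (!sk_all_true(r->stored, nremote)) {
			continue;
		}
		/* Phase 2: only records deleted on every remote node go on. */
		if (!sk_all_true(r->deleted, nremote)) {
			continue;
		}
		/* Phase 3: delete locally. */
		r->deleted_locally = true;
		count++;
	}
	return count;
}
```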
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit e397702e271af38204fd99733bbeba7c1db3a999
Author: Michael Adam <obnox at samba.org>
Date: Fri Dec 21 00:24:47 2012 +0100
vacuum: introduce the RECEIVE_RECORDS control
This is in preparation for turning the vacuuming on the lmaster
into a two-phase process:
- First the node sends the list of records to be vacuumed
to all other nodes with this new RECEIVE_RECORDS control.
The remote nodes should store the lmaster's empty current copy.
- Only those records that could be stored on all other nodes
are processed further. They are sent to all other nodes with
the TRY_DELETE_RECORDS control as before for deletion.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit e3740899c1af6962f93c85ad7d1cb71bddce45c6
Author: Michael Adam <obnox at samba.org>
Date: Sat Dec 29 18:32:39 2012 +0100
vacuum: reorder some of ctdb_process_delete_list() more intuitively
Now that the nodemap and its talloc children don't hang off of the
delete_records_list talloc context, we can build the nodemap
earlier, and move the construction of the delete_records_list
to where it is more obvious what it is used for.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit b7c3b8cdf92c597e621e3dae28b110d321de5ea8
Author: Michael Adam <obnox at samba.org>
Date: Sat Dec 29 17:16:33 2012 +0100
vacuum: add explicit temporary memory context to ctdb_process_delete_list()
This removes the implicit artificial talloc hierarchy and makes the
code easier to understand.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 59a887e12469266e514ad7d4e34810e7ea888ba3
Author: Michael Adam <obnox at samba.org>
Date: Sat Jan 5 01:19:06 2013 +0100
vacuum: fix indentation in ctdb_process_delete_list()
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 11d728465a9c635e1829abaae17e2f7720433b69
Author: Michael Adam <obnox at samba.org>
Date: Mon Dec 17 17:31:55 2012 +0100
vacuum: free temporary allocated memory correctly in ctdb_process_delete_list().
Add a common exit point for cleanup.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 3710dd0f313f551f1b302b4961e0203243e3d661
Author: Michael Adam <obnox at samba.org>
Date: Mon Dec 17 17:26:22 2012 +0100
vacuum: move variable into scope of use in ctdb_process_delete_list()
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 4640979b526b6dac69a6a0555bfce75fe0206dac
Author: Michael Adam <obnox at samba.org>
Date: Mon Dec 17 13:07:21 2012 +0100
vacuum: move variable into scope of use in ctdb_process_delete_list()
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit f3e6e7f8ef22bd70dd2f101d818e2e5ab5ed3cd8
Author: Michael Adam <obnox at samba.org>
Date: Mon Dec 17 13:03:42 2012 +0100
vacuum: simplify ctdb_process_delete_list(): reduce indentation
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 817c77a3d0a3546bf46389cec5f6b54778dd1693
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 3 14:12:27 2013 +0200
vacuum: add DEBUG to skip conditions in delete_record_traverse()
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 3f7e35ff0db740cdcb6d27c43a59bb6ca6066efb
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 5 17:14:43 2013 +0200
vacuum: break line for RO-flags check in delete_record_traverse() for readability
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit e72a5e11845fe445baaee4730bb0bea8588ee9e3
Author: Michael Adam <obnox at samba.org>
Date: Mon Apr 22 10:21:02 2013 -0400
client: fix ctdb_control() to be able to cope with CTDB_CTRL_FLAG_NOREPLY
This was apparently not used before in this context, so the bug went
undetected. It becomes necessary when ctdb_local_schedule_for_deletion()
is called from a client ctdbd (the vacuuming child), which then needs to
send the SCHEDULE_FOR_DELETION control to its parent.
Pair-Programmed-With: Stefan Metzmacher <metze at samba.org>
Signed-off-by: Stefan Metzmacher <metze at samba.org>
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit dc4ca816630ed44b419108da53421331243fb8c7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Apr 19 13:29:04 2013 +1000
ctdbd: Set num_clients statistic from ctdb->num_clients
This fixes the problem of "ctdb statisticsreset" clearing the number of
clients even when there are active clients.
Values returned in statistics for frozen, recovering and memory_used are based on
the current state of CTDB and are not maintained as statistics. This should
include num_clients as well.
Currently ctdb->num_clients is unused. So use that to track the number of
clients and fill in statistics field only when requested.
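A minimal C sketch of the approach, assuming a simplified statistics structure; struct sk_ctdb, struct sk_statistics and the sk_* functions are hypothetical. The live client count is kept outside the statistics and copied in only when statistics are requested, so a reset cannot clear it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct sk_statistics {
	uint32_t num_clients; /* filled in on request from live state */
	uint32_t num_calls;   /* an ordinary accumulated counter */
};

struct sk_ctdb {
	uint32_t num_clients; /* live count, updated on connect/disconnect */
	struct sk_statistics statistics;
};

static void sk_client_connected(struct sk_ctdb *c)
{
	c->num_clients++;
}

static void sk_client_disconnected(struct sk_ctdb *c)
{
	assert(c->num_clients > 0);
	c->num_clients--;
}

/* Like "ctdb statisticsreset": clears accumulated counters only. */
static void sk_statistics_reset(struct sk_ctdb *c)
{
	memset(&c->statistics, 0, sizeof(c->statistics));
}

static struct sk_statistics sk_get_statistics(const struct sk_ctdb *c)
{
	struct sk_statistics s = c->statistics;

	s.num_clients = c->num_clients; /* current state, not a statistic */
	return s;
}
```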
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit bfed6a8d1771db3401d12b819204736c33acb312
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 22 13:52:04 2013 +1000
ctdbd: Log PID file creation and removal at NOTICE level
Unexpected removal of this file can have serious consequences, so it
is best if this is logged at the default level.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5940a2494e9e43a83f2bca098bd04dfc1a8f2e93
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 22 13:48:06 2013 +1000
scripts: Ensure even external scripts get tagged in logs as "ctdbd"
Our practice is to search logs for "ctdbd:". We want to make sure we
find everything.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0076cfc4666e5a96eb2c8affb59585b090840e00
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 22 06:52:49 2013 +1000
eventscripts: Ensure directories are created
Previous commits stopped the top level of the script from creating
certain directories but some functions assume that required
directories exist.
Create those directories instead.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 700cf95a1f29b4b88460a00a55d57a9e397011e0
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Apr 17 13:26:04 2013 +1000
scripts: Clean up update_tickles() and handling of associated directory
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 85efa446c7f5c5af1c3a960001aa777775ae562f
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Apr 17 13:12:32 2013 +1000
scripts: Use $CTDB_SCRIPT_DEBUGLEVEL instead of something more complex
The current logic is horrible and creates an unnecessary file. Let's
make the script debug level independent of ctdbd's debug level.
* Have debug() use $CTDB_SCRIPT_DEBUGLEVEL directly
* Remove ctdb_set_current_debuglevel()
* Remove the "getdebug" command from ctdb stub in eventscript unit
tests
* Update relevant eventscript unit tests to use
$CTDB_SCRIPT_DEBUGLEVEL
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d254d03f69cbdc3e473202b759af6e1392cbb59c
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Apr 19 13:10:27 2013 +1000
scripts: Ensure service command is in $PATH in ctdb-crash-cleanup.sh
Move the use of the service command below inclusion of functions file,
which sets $PATH.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e7a4b7e35a1e4b826846e2494a3803abb57065ee
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 15 19:15:22 2013 +1000
initscript: Remove duplicate setting of $ctdbd
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 1e989894764e4cd1d551c44784d91cb295cd790d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 16 11:40:55 2013 +1000
util: Removed unused declaration of ctdbd_start()
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Michael Adam <obnox at samba.org>
commit abb64f62efaa70df4b87c030b96300eafd98e6a3
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 15 13:31:42 2013 +1000
include: Move ctdb_start_daemon() from ctdb_client.h to ctdb_private.h
It really is internal.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 90cb337e5ccf397b69a64298559a428ff508f196
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 15 15:42:55 2013 +1000
scripts: ctdb-crash-cleanup.sh uses initscript to see if ctdbd is running
"ctdb ping" can time out. How many times should we try?
Instead, depend on the initscript to implement something sane.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 687e2eace4f48400cf5029914f62b6ddabb85378
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 15 15:18:12 2013 +1000
initscript: Use a PID file to implement the "status" option
Using "ctdb ping" and "ctdb status" is fraught with danger. These
commands can time out when ctdbd is running, leading callers to believe
that ctdbd is not running. Timeouts could be increased but we would
still have to handle potential timeouts.
Everything else in the world implements the "status" option by
checking if the relevant process is running. This change makes CTDB
do the same thing and uses standard distro functions.
This change is backward compatible in the sense that a missing
/var/run/ctdb/ directory means that we don't do a PID file check but
just depend on the distro's checking method. Therefore, if CTDB was
started with an older version of this script then "service ctdb
status" will still work.
This script does not support changing the value of CTDB_VALGRIND
between calls. If you start with CTDB_VALGRIND=yes then you need to
check status with the same setting. CTDB_VALGRIND is a debug
variable, so this is acceptable.
This also adds sourcing of /lib/lsb/init-functions to make the Debian
function status_of_proc() available.
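The initscript itself is shell and relies on distro helpers such as
status_of_proc(). Purely to illustrate the underlying check, here is a
minimal C sketch, assuming the PID file holds a single decimal PID. The
name pidfile_process_running() is hypothetical and not part of CTDB:

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Return true if the PID stored in pidfile belongs to a live process.
 * A missing or malformed file simply yields false in this sketch. */
static bool pidfile_process_running(const char *pidfile)
{
    FILE *f;
    long pid;
    int n;

    f = fopen(pidfile, "r");
    if (f == NULL) {
        return false;
    }
    n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0) {
        return false;
    }

    /* Signal 0 sends nothing; it only checks that the process exists. */
    if (kill((pid_t)pid, 0) == 0) {
        return true;
    }
    /* EPERM: the process exists but we are not allowed to signal it. */
    return errno == EPERM;
}
```

Like any PID file check, this can be fooled if the PID has been reused
by an unrelated process.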
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 996e74d3db0c50f91b320af8ab7c43ea6b1136af
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 15 13:32:57 2013 +1000
ctdbd: Add --pidfile option
Default is not to create a pid file.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit ba8866d40125bab06391a17d48ff06a4a9f9da89
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Apr 15 16:14:40 2013 +1000
util: ctdb_fork() should call ctdb_set_child_info()
For now we pass NULL as the child name. Later we'll give ctdb_fork()
and friends an extra argument and pass that through.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 59b019a97aad9a731f9080ea5be14d0dbdfe03d6
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Apr 16 11:11:11 2013 +1000
util: New functions ctdb_set_child_info() and ctdb_is_child_process()
Must be called by all child processes.
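A minimal sketch of how these two helpers and ctdb_fork() might fit
together, assuming a simple process-wide flag. The sketch_* names are
hypothetical stand-ins, not CTDB's actual functions or signatures:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical process-wide state; the real CTDB fields may differ. */
static bool sketch_child = false;
static const char *sketch_child_name = NULL;

/* Mark the current process as a child.  The name may be NULL, as in
 * the first step of wiring this into the fork wrapper. */
static void sketch_set_child_info(const char *name)
{
    sketch_child = true;
    sketch_child_name = name;
}

static bool sketch_is_child_process(void)
{
    return sketch_child;
}

static const char *sketch_child_process_name(void)
{
    return sketch_child_name;
}

/* fork() wrapper: every child records its status before anything else. */
static pid_t sketch_fork(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        sketch_set_child_info(NULL);
    }
    return pid;
}
```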
Signed-off-by: Martin Schwenke <martin at meltin.net>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 06ac62f890299021220214327f1b611c3cf00145
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 17 13:08:49 2013 +0200
tests: add a comment to recovery db corruption test
The comment explains that we use "ctdb stop" and "ctdb continue"
but we should use "ctdb setrecmasterrole off".
Signed-off-by: Michael Adam <obnox at samba.org>
commit b1577a11d548479ff1a05702d106af9465921ad4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 16:59:36 2013 +1000
tests: Add a test for subsequent recoveries corrupting databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 2438f3a4944f7adbcae4cc1b9d5452714244afe7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 16:58:34 2013 +1000
tests: Support waiting for "recovered" state in tests
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit cad3107b12e8392f786f9a758ee38cf3a3d58538
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 3 12:02:59 2013 +0200
ctdb_call: don't bump the rsn in ctdb_become_dmaster() any more
This is now done in ctdb_ltdb_store_server(), so this
extra bump can be spared.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit feb1d40b21a160737aead22e398f3c34ff3be8de
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 3 11:40:25 2013 +0200
Fix a severe recovery bug that can lead to data corruption for SMB clients.
Problem:
Recovery can under certain circumstances lead to old record copies
resurrecting: Recovery selects the newest record copy purely by RSN. At
the end of the recovery, the recovery master is the dmaster for all
records in all (non-persistent) databases. And the other nodes locally
hold the complete copy of the databases. The bug is that the recovery
process does not increment the RSN on the recovery master at the end of
the recovery. Clients acting directly on the recovery master then
change a record's content on the recmaster without migration, and
hence without an RSN bump. So a subsequent recovery cannot tell that
the recmaster's copy is newer than the copies on the other nodes, since
their RSN is the same. Hence, if the recmaster is not node 0 (or more
precisely not the active node with the lowest node number), the recovery
will choose copies from nodes with lower number and stick to these.
Here is how to reproduce:
- assume we have a cluster with at least 2 nodes
- ensure that the recmaster is not node 0
(maybe ensure with "onnode 0 ctdb setrecmasterrole off")
say recmaster is node 1
- choose a new database name, say "test1.tdb"
(make sure it is not yet attached as persistent)
- choose a key name, say "key1"
- all cluster nodes should be OK and no recovery should be running
- now do the following on node 1:
1. dbwrap_tool test1.tdb store key1 uint32 1
2. dbwrap_tool test1.tdb fetch key1 uint32
==> 1
3. ctdb recover
4. dbwrap_tool test1.tdb store key1 uint32 2
5. dbwrap_tool test1.tdb fetch key1 uint32
==> 2
6. ctdb recover
7. dbwrap_tool test1.tdb fetch key1 uint32
==> 1
==> BUG
This is a very severe bug, since when applied to Samba's locking.tdb
database, it means that for SMB clients on clustered Samba there is
the potential for locking out oneself from previously opened files
or even worse, data corruption:
Case 1: locking out
- client on recmaster opens file
- recovery propagates open file handle (entry in locking.tdb) to
other nodes
- client closes file
- client opens the same file
- recovery resurrects old copy of open file record in locking.tdb
from lower node
- client closes file but fails to delete entry in locking.tdb
- client tries to open same file again but fails, since
the old record locks it out (since the client is still connected)
Case 2: data corruption
- client1 on recmaster opens file
- recovery propagates open file info to other nodes
- client1 closes the file and disconnects
- client2 opens the same file
- recovery resurrects old copy of locking.tdb record,
where client2 has no entry, but client1 has.
- but client2 believes it still has a handle
- client3 opens the file and succeeds without
conflicting with client2
(the detached entry for client1 is discarded because
the server does not exist any more).
=> both client2 and client3 believe they have exclusive
access to the file and writing creates data corruption
Fix:
When storing a record on the dmaster, bump its RSN.
The ctdb_ltdb_store_server() is the central function for storing
a record to a local tdb from the ctdbd server context.
So this is also the place where the RSN of the record to be stored
should be incremented, when storing on the dmaster.
For the case of the record migration, this is currently done in
ctdb_become_dmaster() in ctdb_call.c, but there are other places
such as in recovery, where we should bump the RSN, but currently
don't do it.
So moving the RSN increment into ctdb_ltdb_store_server() fixes
the recovery-record-resurrection bug.
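To illustrate the fix, here is a minimal sketch with a cut-down record
header. The struct layout and the store_on_server() name are
hypothetical; the real ctdb_ltdb_store_server() works on CTDB's own
record header and a local tdb:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Cut-down, hypothetical record header: only the fields needed here. */
struct rec_header {
    uint64_t rsn;      /* record sequence number */
    uint32_t dmaster;  /* node holding the authoritative copy */
};

struct rec {
    struct rec_header hdr;
    char data[64];
};

/* Store a record from the server side.  When this node is the
 * dmaster, bump the RSN on every store, so a later recovery (which
 * keeps the copy with the highest RSN) treats this copy as newest. */
static void store_on_server(struct rec *r, uint32_t this_node,
                            const struct rec_header *hdr,
                            const char *data)
{
    r->hdr = *hdr;
    if (hdr->dmaster == this_node) {
        r->hdr.rsn++;
    }
    snprintf(r->data, sizeof(r->data), "%s", data);
}
```

With every store on the dmaster raising the RSN, the recmaster's copy in
the reproduction above ends up with the highest RSN, so the next
recovery keeps it instead of an older copy from a lower-numbered node.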
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit 4c0cbfbe8b19f2e6fe17093b52c734bec63dd8b7
Author: Michael Adam <obnox at samba.org>
Date: Mon Apr 15 12:50:42 2013 +0200
logging: fix comment typo
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 2e92deef5221ee651028ef87138b3113f1fece91
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 3 14:03:32 2013 +0200
ctdbd: unimplement the unused SET_DMASTER control
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 9f01b8db72780acf2f88f1392bc0a796dd4c6176
Author: Michael Adam <obnox at samba.org>
Date: Fri Mar 22 17:48:00 2013 +0100
recoverd: remove bogus comment "qqq" from "add prototype new banning code"
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit e96acf19b4d1e0f951ab92b88869a01ff06398be
Author: Michael Adam <obnox at samba.org>
Date: Fri Apr 5 16:55:18 2013 +0200
build: silence building of porting_test
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-by: Amitay Isaacs <amitay at gmail.com>
commit 5808f0778b39b79ab7a5c7f53ad27947131386ec
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 13:20:09 2013 +1000
traverse: Ensure backward compatibility for CTDB_CONTROL_TRAVERSE_ALL
This makes sure that CTDB_CONTROL_TRAVERSE_ALL is compatible with older versions
of CTDB (i.e. 1.2.39 and 1.2.40 branches).
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit e691df43d20871468142c8fb83f7c7303c4ec307
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 13:18:36 2013 +1000
traverse: Add CTDB_CONTROL_TRAVERSE_ALL_EXT to support withemptyrecords
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 043e18a8324ccb2c8ddd7b323ebedb5b0de1298d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 16:58:59 2013 +1000
tests: Fix typo in variable name
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 35264e42ade4676468cf7713fa339c784e932953
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Mar 27 12:32:43 2013 +1100
tools/ltdbtool: Fix handling of -e option
Also, include description of -e option in usage.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1c7adbccc69ac276d2b957ad16c3802fdb8868ca
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Apr 5 13:34:06 2013 +1100
recoverd/takeover: Use IP->node mapping info from nodes hosting that IP
When collating IP information for the IP layout, only trust the node that is
hosting an IP to have correct information about that IP. Ignore what all the
other nodes think.
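A minimal sketch of the collation rule, assuming each node reports, for
each public IP, which node it believes hosts it. The struct, the
NO_NODE marker and the function name are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define NO_NODE UINT32_MAX  /* hypothetical "no known host" marker */

/* One node's view of one public IP. */
struct ip_report {
    uint32_t reporter;  /* node that sent this report */
    uint32_t ip_index;  /* index into the public IP table */
    uint32_t pnn;       /* node the reporter believes hosts the IP */
};

/* Fill owner[] using only reports in which the reporter claims to host
 * the IP itself; what other nodes think is ignored. */
static void collate_ip_owners(const struct ip_report *reports, size_t n,
                              uint32_t *owner, size_t num_ips)
{
    size_t i;

    for (i = 0; i < num_ips; i++) {
        owner[i] = NO_NODE;
    }
    for (i = 0; i < n; i++) {
        const struct ip_report *r = &reports[i];

        if (r->ip_index < num_ips && r->pnn == r->reporter) {
            owner[r->ip_index] = r->pnn;
        }
    }
}
```

If two nodes both claim the same IP, the later report wins here; how the
real takeover code treats such conflicts is not covered by this sketch.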
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fe8c4880b371492a38554868d4ca10918c54e412
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Apr 3 14:44:08 2013 +1100
statd-callout: Make sure statd callout script always runs as root
In RHEL 6+, rpc.statd runs as "rpcuser" instead of root as on RHEL 5. This
prevents CTDB tool commands from talking to the daemon, since "rpcuser" cannot
access the CTDB socket.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Pair-Programmed-With: Martin Schwenke <martin at meltin.net>
commit 524ec206e6a5e8b11723f4d8d1251ed5d84063b0
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Mar 18 13:45:08 2013 +1100
client: Set the socket non-blocking only after connect succeeds
If the socket is set non-blocking before connect, then we have to catch
EAGAIN errors and retry. Instead of adding a random number of retries, it is
better to wait for connect to succeed and then set the socket to
non-blocking.
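A minimal sketch of the approach for a Unix domain socket (the kind of
socket the CTDB client uses to reach the daemon). The function name is
hypothetical and error reporting is reduced to returning -1:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect with a blocking connect(), then switch the descriptor to
 * non-blocking mode.  Returns the connected fd, or -1 on failure. */
static int connect_then_set_nonblocking(const char *path)
{
    struct sockaddr_un addr;
    int fd, flags;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);  /* length checked above */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }

    /* Blocking connect: no EAGAIN handling or retry loop is needed. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }

    /* Only now make the socket non-blocking for the event loop. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```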
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 74acc2c568300ef42740cf11299a1b2507047f60
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Apr 5 13:19:34 2013 +1100
Revert "client: handle transient connection errors"
This reverts commit dc0c58547cd4b20a8e2cd21f3c8363f34fd03e75.
There is a simpler solution than retrying a random number of times: do not set
the socket non-blocking until connect succeeds.
commit f7f8bde2376f8180a0dca6d7b8d7d2a4a12f4bd8
Author: Volker Lendecke <vl at samba.org>
Date: Wed Apr 3 14:59:21 2013 +0200
common/messaging: Use the jenkins hash in ctdb_message
This gives a better hash distribution.
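The commit message does not spell out which Jenkins variant is used. As
an illustration only, here is Bob Jenkins' one-at-a-time hash, a simpler
member of the same family, showing the kind of byte-by-byte mixing that
spreads keys evenly across buckets:

```c
#include <stddef.h>
#include <stdint.h>

/* Bob Jenkins' one-at-a-time hash: each input byte is mixed into the
 * whole state, followed by a final avalanche step. */
static uint32_t jenkins_one_at_a_time(const uint8_t *key, size_t len)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        h += key[i];
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

A bucket index would then be derived as, for example,
jenkins_one_at_a_time(key, len) % nbuckets.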
commit c137531fae8f7f6392746ce1b9ac6f219775fc29
Author: Volker Lendecke <vl at samba.org>
Date: Fri Apr 5 13:11:31 2013 +1100
common/messaging: use tdb_parse_record in message_list_db_fetch
This avoids malloc/free in a hot code path.
commit bf7296ce9b98563bcb8426cd035dbeab6d884f59
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Apr 3 15:08:14 2013 +1100
common/messaging: Abstract db related operations inside db functions
This simplifies the use of message indexdb API and abstracts tdb related code
inside the API.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 20be1f991dd75c2333c9ec9db226432a819f57ba
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 2 16:57:51 2013 +1100
common/messaging: Don't forget to free the result returned by tdb_fetch()
This fixes a memory leak in the messaging code.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4e1ec7412866f2d31c41de1bec0fbf788c03051b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Apr 2 12:08:39 2013 +1100
common/messaging: Free message list header if all message handlers are freed
This makes sure that even if the srvids are not deregistered, the header
structure is freed when the last message handler has been freed as a result of
the client going away.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 85b777196289646ca37e06ebbf1f7a684d0aabc5
Author: Sumit Bose <sbose at redhat.com>
Date: Mon Mar 25 12:28:31 2013 +0100
build: Fix for tevent autoconf check
The list of include files is the 4th argument of AC_CHECK_DECLS.
commit 307416afda707b687f5e89e8438e45c154a4c806
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Mar 13 22:57:44 2013 +1100
util: Add hex_decode_talloc() to decode hex string into a binary blob
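A minimal sketch of such a decoder. The real hex_decode_talloc()
allocates with talloc and its exact signature is not shown here; this
version uses malloc() and a hypothetical hex_decode() name, and rejects
odd-length input and non-hex characters:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode a hex string into a newly malloc'd buffer; *len receives the
 * byte count.  Returns NULL on odd length, bad digit or no memory. */
static uint8_t *hex_decode(const char *hex, size_t *len)
{
    size_t n = strlen(hex), i;
    uint8_t *buf;

    if (n % 2 != 0) {
        return NULL;
    }
    buf = malloc(n / 2 + 1);  /* +1 so empty input still allocates */
    if (buf == NULL) {
        return NULL;
    }
    for (i = 0; i < n / 2; i++) {
        int hi = hex_digit(hex[2 * i]);
        int lo = hex_digit(hex[2 * i + 1]);

        if (hi < 0 || lo < 0) {
            free(buf);
            return NULL;
        }
        buf[i] = (uint8_t)((hi << 4) | lo);
    }
    *len = n / 2;
    return buf;
}
```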
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 08c53ee609b80f87450a7a1d7dd24fbcdf5ab7bc
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Mar 13 11:46:18 2013 +1100
logging: Do not ignore stdout/stderr from the exec'd children
To log debugging information from child processes that are started
with vfork and exec, do not set close_on_exec on STDOUT and STDERR for
that process.
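The commit avoids setting close_on_exec on those descriptors in the
first place. An equivalent way to state the requirement, as a minimal
sketch, is a helper that ensures FD_CLOEXEC is clear on stdout and
stderr before exec(); the helper names are hypothetical:

```c
#include <fcntl.h>
#include <unistd.h>

/* Clear FD_CLOEXEC on fd so it stays open across exec(). */
static int clear_close_on_exec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Stdout and stderr must not carry FD_CLOEXEC when exec() runs, so
 * the exec'd program's output still reaches the logging pipe. */
static int keep_stdout_stderr_across_exec(void)
{
    if (clear_close_on_exec(STDOUT_FILENO) == -1) {
        return -1;
    }
    return clear_close_on_exec(STDERR_FILENO);
}
```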
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 87c89b7c2a14e2ee79a3efc7e8125842bc04bf23
Author: Michael Adam <obnox at samba.org>
Date: Fri Feb 22 12:42:10 2013 +0100
server:persistent: fix a debug message (copy'n'paste error)
Signed-off-by: Michael Adam <obnox at samba.org>
commit 98abd344342a011a8599411deae79f94abc09541
Author: Volker Lendecke <vl at samba.org>
Date: Tue Mar 12 13:53:58 2013 +0100
fix a typo
Reviewed-by: Michael Adam <obnox at samba.org>
commit 11734be353a1e246163eda631d35dfe55d1d6fb1
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Feb 22 12:59:39 2013 +1100
common/io: For scheduling immediate events use tevent_schedule_immediate
tevent_schedule_immediate() is much more efficient at handling events that need
to be processed immediately rather than creating timed events with
timeval_zero().
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 3e09f25d419635f6dd679b48fa65370f7860be7d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Feb 21 13:16:15 2013 +1100
ctdbd: Add an index db for message list for faster searches
When CTDB is busy with lots of smbd processes, CTDB was spending too much time in
daemon_check_srvids(), which looks up a list of srvids among the registered
message handlers. Using a hash-based index is significantly faster than
searching the linked list.
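The commit keeps its index in a tdb (a later commit changes
message_list_db_fetch() to use tdb_parse_record()). As a self-contained
illustration of the same idea, here is a small chained hash table keyed
by srvid; all names and the bucket count are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 256  /* hypothetical bucket count */

struct srvid_entry {
    uint64_t srvid;
    unsigned count;            /* handlers registered for this srvid */
    struct srvid_entry *next;  /* chain within one bucket */
};

struct srvid_index {
    struct srvid_entry *bucket[NBUCKETS];
};

static size_t srvid_bucket(uint64_t srvid)
{
    /* Fold the 64-bit id; adequate for a sketch, not tuned. */
    return (size_t)((srvid ^ (srvid >> 32)) % NBUCKETS);
}

static bool srvid_index_add(struct srvid_index *idx, uint64_t srvid)
{
    size_t b = srvid_bucket(srvid);
    struct srvid_entry *e;

    for (e = idx->bucket[b]; e != NULL; e = e->next) {
        if (e->srvid == srvid) {
            e->count++;
            return true;
        }
    }
    e = malloc(sizeof(*e));
    if (e == NULL) {
        return false;
    }
    e->srvid = srvid;
    e->count = 1;
    e->next = idx->bucket[b];
    idx->bucket[b] = e;
    return true;
}

/* Only the short chain in one bucket is scanned, not every handler. */
static bool srvid_index_exists(const struct srvid_index *idx, uint64_t srvid)
{
    const struct srvid_entry *e;

    for (e = idx->bucket[srvid_bucket(srvid)]; e != NULL; e = e->next) {
        if (e->srvid == srvid) {
            return true;
        }
    }
    return false;
}

static void srvid_index_free(struct srvid_index *idx)
{
    size_t b;

    for (b = 0; b < NBUCKETS; b++) {
        struct srvid_entry *e = idx->bucket[b];

        while (e != NULL) {
            struct srvid_entry *next = e->next;
            free(e);
            e = next;
        }
        idx->bucket[b] = NULL;
    }
}
```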
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 5402f85dde045576cbaf64e01c68e28ed52204e8
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Feb 27 16:01:55 2013 +1100
tools/ctdb: delip no longer fails if IP can not be moved
Moving the IP is an optimisation so should not cause failure.
Refactor and simplify the retry-move-IP into new function
try_moveip().
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 6455ce5e4980a63d56ed30f7059869c8356c12ea
Author: Michael Adam <obnox at samba.org>
Date: Fri Feb 22 11:36:00 2013 +0100
server:persistent: fix a comment typo.
Signed-off-by: Michael Adam <obnox at samba.org>
commit 4f71dca8df19a63f198e2d6d59e605b49ec5e803
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Feb 18 16:39:00 2013 +1100
recoverd: update_capabilities() should use connected nodes
... as the comment says... not just active nodes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit f505020a5720faa4ecc6414e0bfaa6b3c0e47291
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 19 14:30:50 2013 +1100
client: Refactor node listing functions to use list_of_nodes()
This reduces repetition.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit a73bb56991b8c07ed0e9517ffcf0dc264be30487
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 19 14:29:06 2013 +1100
client: New generic node listing function list_of_nodes()
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit d788bc8f7212b7dc1587ae592242dc8c876f4053
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jan 18 10:42:14 2013 +1100
common/io: Rewrite socket handling code to read all available data
This improves the processing of packets considerably. It has been
observed that there can be as many as 10 packets in the socket buffer, and
the current approach of reading a single packet from the socket at a time is
far from optimal. This change reads all the bytes from the socket buffer and
then parses them to extract multiple packets. If there are multiple packets,
a timed event is set up to process the next packet.
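A minimal sketch of the parsing step, assuming each packet begins with a
uint32_t total length (in host byte order) that counts the length field
itself. The function names are hypothetical, and unlike the commit,
which defers further packets to a timed event, this sketch handles them
in a plain loop:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*packet_fn)(const uint8_t *pkt, uint32_t len, void *priv);

/* Hand every complete packet in buf[0..used) to fn and move any
 * trailing partial packet to the front of buf.  Returns the number of
 * bytes kept, or (size_t)-1 if a length field is impossibly small. */
static size_t process_buffer(uint8_t *buf, size_t used,
                             packet_fn fn, void *priv)
{
    size_t off = 0;

    while (used - off >= sizeof(uint32_t)) {
        uint32_t len;

        memcpy(&len, buf + off, sizeof(len));  /* no unaligned access */
        if (len < sizeof(uint32_t)) {
            return (size_t)-1;
        }
        if (len > used - off) {
            break;  /* the rest of this packet has not arrived yet */
        }
        fn(buf + off, len, priv);
        off += len;
    }
    memmove(buf, buf + off, used - off);
    return used - off;
}

/* Example callback: count the packets seen. */
static void count_packet(const uint8_t *pkt, uint32_t len, void *priv)
{
    (void)pkt;
    (void)len;
    (*(size_t *)priv)++;
}
```

Deferring each extra packet to the event loop, as the commit does, lets
other events run between packets; the loop above only shows the framing
logic.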
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 855ab348901edb3ec1327499a43f509d279b8182
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Feb 15 11:18:45 2013 +1100
doc: Fix typo in ctdbd manpage
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e204fac03412520e877ab04363b3ece02667c55b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Feb 11 13:23:47 2013 +1100
ctdbd: Fix the PullDBPreallocation size to 10MB as intended
In 1f262deaad0818f159f9c68330f7fec121679023, Ronnie changed the recovery code
to allocate chunks of 10MB in traverse_pulldb() and traverse_recdb(). However,
the PullDBPreallocation tunable was set to 100MB.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 053b89c6dbce47001505524606889334559d2ec4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Feb 11 11:25:49 2013 +1100
eventscripts: Remove calls to "smbstatus -np" for samba cleanup
This is an artifact from older versions of Samba. In newer versions of
Samba, the "smbstatus -np" command does nothing useful, but it causes an
expensive traverse in CTDB that makes CPU utilization shoot up.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 00db5fa00474f8a83f1aa3b603fd756cc9b49ff4
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Feb 6 14:15:11 2013 +1100
Logging: Fix breakage when freeing the log ringbuffer
Commit a82d3ec12f0fda16d6bfa8442a07595de897c10e broke fetching from
the log ringbuffer. The solution there is still generally good: there
is no need to keep the ringbuffer in children created by
ctdb_fork()... except for those special children that are created to
fetch data from the ringbuffer!
Introduce a new function ctdb_fork_no_free_ringbuffer() that does
everything ctdb_fork() needs to do except free the ringbuffer (i.e. it
is the old ctdb_fork() function). The new ctdb_fork() function just
calls that function and then frees the ringbuffer in the child.
This means all callers of ctdb_fork() have the convenience of having
the ringbuffer freed. There are 3 special cases:
* Forking the recovery daemon. We want to be able to fetch from the
ringbuffer there.
* The ringbuffer fetching code. Change the 2 calls in this code (main
daemon, recovery daemon) to call ctdb_fork_no_free_ringbuffer()
instead.
While we're here, clear the log ringbuffer when the recovery daemon is
forked, since it will contain a copy of the messages from the main
daemon.
Note to self: always test... even the most obvious patches... ;-)
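A minimal sketch of the two-level wrapper described above. The
ringbuffer is modelled as a plain heap buffer, and all names are
hypothetical stand-ins for ctdb_fork(), ctdb_fork_no_free_ringbuffer()
and ctdb_log_ringbuffer_free():

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>  /* waitpid(), used by callers of the wrappers */
#include <unistd.h>

/* Hypothetical stand-in for the in-memory log ringbuffer. */
static char *log_ringbuffer = NULL;

static void log_ringbuffer_free(void)
{
    free(log_ringbuffer);
    log_ringbuffer = NULL;
}

/* Common fork path that keeps the ringbuffer: only for the children
 * that need to read it (recovery daemon, ringbuffer fetchers). */
static pid_t fork_no_free_ringbuffer(void)
{
    /* Other per-child setup would go here. */
    return fork();
}

/* Default fork for every other caller: the child drops its
 * copy-on-write copy of the ringbuffer. */
static pid_t fork_free_ringbuffer(void)
{
    pid_t pid = fork_no_free_ringbuffer();

    if (pid == 0) {
        log_ringbuffer_free();
    }
    return pid;
}
```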
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b940e3a24daa73ca9b2896b7a449240136442b53
Author: Volker Lendecke <vl at samba.org>
Date: Wed Feb 6 10:28:37 2013 +0100
Fix a comment typo
Signed-off-by: Volker Lendecke <vl at samba.org>
Reviewed-by: Michael Adam <obnox at samba.org>
commit a0ef73e197dc9147f7718e0813fe803ff0b3d54d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 13:16:46 2013 +1100
initscript: export CTDB_EXTERNAL_TRACE
This means it can be set like any other configuration option in the
configuration file, without needing to export it there.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9b0d56b16775aa16f33bdfdf831256e085fa3339
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 14:36:29 2013 +1100
ctdbd: Don't use a fixed length buffer for the hung script command
The amount of data to write into the buffer wasn't constrained
anywhere...
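Inside CTDB the natural tool for this is talloc_asprintf(). As a
portable illustration of the same fix, here is a minimal sketch that
measures the formatted length first and then allocates exactly enough.
The format_alloc() name is hypothetical, and so is any particular
command layout passed to it:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* printf-style formatting into a heap buffer sized to fit the result,
 * so there is no fixed limit to overflow.  The caller frees it. */
static char *format_alloc(const char *fmt, ...)
{
    va_list ap;
    char *s;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(NULL, 0, fmt, ap);  /* measure only */
    va_end(ap);
    if (n < 0) {
        return NULL;
    }

    s = malloc((size_t)n + 1);
    if (s == NULL) {
        return NULL;
    }

    va_start(ap, fmt);
    vsnprintf(s, (size_t)n + 1, fmt, ap);
    va_end(ap);
    return s;
}
```

For example, format_alloc("%s %d %s", script, pid, event) builds a
command line of whatever length the arguments require.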
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3400b2ed34b6eb9496eb55f1aab6f89d2952060d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 14:25:01 2013 +1100
ctdbd: Complain loudly if CTDB_DEBUG_HUNG_SCRIPT script isn't executable
This is quite easy to misconfigure by failing to set the execute bit
on the script. Better to complain loudly.
This is a debugging facility rather than core CTDB functionality, so it
doesn't need a subtle mechanism to disable it at run-time. To disable
the designated script at run-time either edit it to put an "exit 0" at
the top or move it aside and symlink to /bin/true.
This is implemented by actually removing the code that checks that the
file exists and is executable. The output from the shell when the
system() function fails is just as useful.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0581f9a84e58764d194f4e04064c2c5b393c348b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 15:49:52 2013 +1100
ctdbd: Remove command-line option --debug-hung-script
Use an environment variable instead. This just means that the
initscript exports CTDB_DEBUG_HUNG_SCRIPT and the code checks for the
environment variable.
The justification for this simplification is that more debug options
will be arriving soon and we want to handle them consistently without
needing to add a command-line option for each. So, the convention
will be to use an environment variable for each debug option.
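A minimal sketch of the convention, assuming an unset or empty variable
means the option is off. The debug_option() helper is hypothetical:

```c
#include <stdlib.h>

/* Return the value of a debug-option environment variable, or NULL
 * when it is unset or empty (option disabled). */
static const char *debug_option(const char *name)
{
    const char *value = getenv(name);

    if (value == NULL || value[0] == '\0') {
        return NULL;
    }
    return value;
}
```

Each new debug option then needs only a call such as
debug_option("CTDB_DEBUG_HUNG_SCRIPT") plus an export in the initscript,
rather than a new command-line option.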
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 501461cc3e132d4adee9e91b5d4513a26bae2846
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 13:08:55 2013 +1100
ctdbd: Remove debug_hung_script_ctx
The only allocation against this context is by
ctdb_fork_with_logging(). This memory is freed by ctdb_log_handler()
anyway. There should be no memory leak.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f1ffe1112b7e342d7f1228ca816a8e5918f893cf
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 10 14:39:09 2013 +1100
ctdbd: Message logged at exit should be different for different processes
Some subprocesses print "CTDB daemon shutting down" when they exit and
this can be confusing.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 35da9a7c2a0f5e54e61588c3c3455f06ebc66822
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jan 22 13:27:20 2013 +1100
daemon: Make sure all the traverse children are terminated if traverse times out
When a traverse times out, the callback function is called with key and data
set to tdb_null. This is also how the end of a traverse is signalled. So if the
traverse times out, the callback function treats it as the end of the traverse
and frees the state without calling the destructor.
Keep track of whether the traverse timed out, so the callback function can take
the appropriate action for both traverse timeout and traverse end.
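A minimal sketch of the idea, assuming the timeout handler and the
normal end of a traverse both reach the same callback with an empty key.
The state fields and function names are hypothetical, and terminating
the child processes is reduced to setting a flag:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical traverse state. */
struct traverse_state {
    bool timed_out;        /* set only by the timeout handler */
    bool finished;         /* an empty record has been seen */
    bool children_killed;  /* stand-in for terminating traverse children */
    size_t records;
};

/* Per-record callback.  An empty key signals both a normal end and a
 * timeout; timed_out tells the two cases apart. */
static void traverse_callback(struct traverse_state *st,
                              const void *key, size_t keylen)
{
    if (key == NULL || keylen == 0) {
        if (st->timed_out) {
            /* Timeout: make sure child processes are cleaned up. */
            st->children_killed = true;
        }
        st->finished = true;
        return;
    }
    st->records++;
}

/* Timeout handler: note the reason, then signal "end" the usual way. */
static void traverse_timeout(struct traverse_state *st)
{
    st->timed_out = true;
    traverse_callback(st, NULL, 0);
}
```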
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a82d3ec12f0fda16d6bfa8442a07595de897c10e
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 12:09:36 2013 +1100
Logging: Free the ringbuffer in child processes created with ctdb_fork()
At the moment the log ringbuffer is duplicated in every child process.
Although it is copy-on-write, we want to see if it is contributing to
out-of-memory situations when there are a lot of children.
The ringbuffer isn't accessible from any of the children anyway...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a4f622e85168f59417c11705f1734e0352e1d44a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 12:08:11 2013 +1100
Logging: New function ctdb_log_ringbuffer_free()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 25a20409fb39a94b64c13990c0eba4f75d482ecd
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Feb 5 12:13:57 2013 +1100
build: Fix a Makefile.in typo
Objects are named *.o ;-)
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d1ec06d30148e6fd344625a2fbf1c22391bd908a
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 11 12:39:37 2013 +1100
tools/ctdb: Fix a compiler warning
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 124e2a471aeda9c900fd898178a30522d7d74221
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jan 23 14:35:47 2013 +1100
recoverd: Fix printing of node flags from local information
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit b054193d1d19a8eef998fa690899501f79badb8a
Author: Mathieu Parent <math.parent at gmail.com>
Date: Mon Jan 14 17:48:01 2013 +0100
common: Don't lie on unimplemented gratuitous arp
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit 109f428aa34f8f4cc0329880d2f4a5593a6cc6f3
Author: Mathieu Parent <math.parent at gmail.com>
Date: Mon Jan 14 17:21:01 2013 +0100
tests: Test portability
Curiously test_ctdb_sys_check_iface_exists fails on Linux
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit 258092aaf6b7a9bdc14f0fb35e8bd7f7dc742b3f
Author: Mathieu Parent <math.parent at gmail.com>
Date: Mon Jan 14 12:13:24 2013 +0100
common: FreeBSD+kFreeBSD: Implement get_process_name (same as in Linux)
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit d202b2fdd4fd70172e5e44583627b57a1b7ad2ed
Author: Mathieu Parent <math.parent at gmail.com>
Date: Mon Jan 14 11:23:46 2013 +0100
common: Detailed platform-specific FIXME
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit 3c6a9b73364c9543366fa033c778145dc7a152a9
Author: Mathieu Parent <math.parent at gmail.com>
Date: Sun Jan 13 14:15:20 2013 +0100
build: Update config.guess to 2012-12-30 and config.sub to 2013-01-11
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit 95fc493a7d4145f976cb3fe928d9e92faec4dd71
Author: Mathieu Parent <math.parent at gmail.com>
Date: Sat Jan 12 16:43:03 2013 +0100
doc: allows to -> allows one to
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit 506ecd186759675a1cf50a0a05a285fee03fc51e
Author: Mathieu Parent <math.parent at gmail.com>
Date: Sat Jan 12 15:14:48 2013 +0100
build: Add missing LDFLAGS
Original Author: Simon Ruderich <simon at ruderich.org>
Signed-off-by: Mathieu Parent <math.parent at gmail.com>
commit 0e651e9da0f1f3c836b4474612ab13d0ccd272d9
Author: Srikrishan Malik <srimalik at in.ibm.com>
Date: Wed Jan 9 16:11:39 2013 +0530
Changes for unobtrusive recovery and new method for health check.
Unobtrusive recovery: Ganesha will not be restarted on failovers.
Ganesha health: Use the counters in /var/lib/nfs/ganesha_local to track progress
instead of the null call, which can time out if the server is too busy.
Signed-off-by: Srikrishan Malik <srimalik at in.ibm.com>
Signed-off-by: Lance Russell <lancerus at us.ibm.com>
commit 7393e2b290f9879ff72d5c5a9ce933034129f0e8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jan 9 16:22:39 2013 +1100
recoverd: Create recoverd monitoring timed events off recoverd context
This ensures that when shutting down CTDB, all the timed events
associated with monitoring recoverd are destroyed and recoverd
is not restarted.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 7d8546ee4353851f0543d0ca2c4c67cb0cc75aea
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 29 14:56:10 2012 +1100
daemon: Protect against double free of callback state while shutting down
When CTDB is shut down and monitoring has been stopped, monitor_context
gets freed along with all the callback states hanging off it. This includes
the callback state for current_monitor, if the current monitor event has
not yet finished. As a result, when the shutdown event is called,
current_monitor->callback state is not NULL, but it has already been freed,
so it is a dangling reference.
So, before executing the callback function and freeing the callback state,
check that ctdb->monitor->monitor_context is not NULL.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 746168df2e691058e601016110fae818c6a265c3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Dec 4 15:05:44 2012 +1100
daemon: On shutdown, destroy timed events that check if recoverd is active
When CTDB is shutting down, the recovery daemon is stopped, but the
event that checks whether the recovery daemon is still alive is not
destroyed. So the recovery daemon is restarted during shutdown if the
CTDB daemon takes longer to shut down.
There are two processes that check if recovery daemon is working.
1. ctdb_check_recd() - which checks every 30 seconds if the recovery
daemon process exists.
2. ctdb_recd_ping_timeout() - which is triggered when recovery daemon
fails to ping CTDB daemon.
Both the events are periodic and need to be destroyed when shutting down.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 45d439a1ab093b420c27b1502ef109021833c7af
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Dec 18 12:52:39 2012 +1100
tests: Add a test for recovery of persistent databases
Ensure that RSN based recovery and __db_sequence_number__ based recovery
methods for persistent databases work correctly. They should not cause
corruption of the database.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit efaac27a9ed52ed0f436c7e194013fd06e8b02b3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Dec 19 15:14:42 2012 +1100
tools/ctdb: Add setdbseqnum command to set __db_sequence_number__
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit ca6e7eccc90f2869c220231666bf284798342bce
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Dec 19 14:43:26 2012 +1100
tools/ctdb: Re-factor code to check if db exists given name or id
Most of the commands related to database operations can now use the
common code (db_exists()) to refer to a database by either name or id.
In addition to returning the db_id for a db_name, the function returns
all the flags set for the database.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit d23adec89b69e7c6f96c8e1417ef4ca4c9edc57e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Dec 17 14:46:14 2012 +1100
tools/ctdb: Add pdelete command to delete a record from persistent database
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9a70a4d23d00f6cb996c061ba3dfb7c47b4f6a4f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Dec 4 14:58:30 2012 +1100
daemon: Update the comment and remove redundant check in ctdb_start_transport()
ctdb_start_transport() is called just before the "setup" event, when CTDB
is ready to process requests. The "startup" event happens much later,
after a successful recovery.
The transport method ctdb->methods is successfully initialized before
ctdb_start_transport() is called, so there is no need to check it again.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 735ec99b99c7bb579851ce8293011aaf1dcc552a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jan 8 16:49:56 2013 +1100
eventscripts: Fail the setup event if CTDB does not become ready
Currently it silently continues without attempting to set tunables.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 50abf597cefe6f8ea2a2ff7694bf84641344a9b1
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 4 13:52:01 2013 +1100
scripts: Make script_log() use supplied message, stop logger from hanging
When using syslog, any provided message arguments are ignored and not
passed to logger. This means that logger blocks waiting on stdin.
That's bad.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e2aaa64925cca359c71520e01a18fc9461b0da4d
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 4 11:41:03 2013 +1100
scripts: Rework ctdb-crash-cleanup.sh so that it uses existing functions
This improves maintainability.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 03356fd5ae7a3ac35fde0289cbea7c71ecf07367
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jan 4 11:23:29 2013 +1100
scripts: Make drop_all_public_ips() more robust
Incorporate some of the logic from ctdb-crash-cleanup.sh that ensures
IPs are deleted even if they have the wrong netmask or are on the
wrong interface.
Factoring out some of the code will allow it to be used elsewhere.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 13e5e609b262847b607e7af7e0685f44e7cb8e36
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 3 16:02:52 2013 +1100
ctdbd: Default value for debug_hung_script should use ETCDIR
That is, it should use whatever was specified in ./configure and
should not hardcode /etc.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8507303b525d20c74e8ec4e7c4f5f275945cd3b6
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 3 15:33:57 2013 +1100
scripts: debug-hung-script.sh doesn't need functions/loadconfig
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 376015ba5ad6b7703ae9949a1d40a0c72dfaba0c
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 3 15:33:10 2013 +1100
scripts: statd-callout should calculate CTDB_BASE if it is not set
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 740ea8ea5084149c8b552a01ee1c98c558b12384
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 3 15:26:12 2013 +1100
eventscripts: Each script should set CTDB_BASE if it is not set
This makes it easier to run the scripts externally.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b23c30253cc9eb274b895cac0f8c65245ba0a200
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jan 3 15:07:07 2013 +1100
scripts: Move drop_all_public_ips() to the functions file
... so it can be improved and used elsewhere.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 13a5944f8a27d43006acfffba76958693cae7702
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Oct 12 16:12:38 2012 +1100
tests/simple: Add test to check recovery daemon IP verification
Also update ips_are_on_nodeglob() to handle negation.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3cc596d2b459d834f9785b3a98027e46431ff2b9
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jan 8 10:21:49 2013 +1100
tests/eventscripts: Ratchet down debug level for ctdb_takeover_tests
The default IP allocation algorithm used by ctdb_takeover_tests
changed from "non-deterministic IPs" to "LCP2". The latter generates
a lot more debug output. ctdb_takeover_tests is used by the ctdb tool
stub to calculate IP address changes for failovers. This resulted in
unexpected debug output that caused tests to fail. Since eventscript
tests don't care how IP allocations are arrived at, the best solution
is to turn down the debug level.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 6a1d88a17321f7e1dc84b4823d5e7588516a6904
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 14 17:12:01 2012 +1100
recoverd: Separate each IP allocation algorithm into its own function
This makes the code much more readable and maintainable.
As a side effect, fix a memory leak in LCP2.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8adb255e62dbe60d1e983047acd7b9c941231d11
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 13 13:23:32 2012 +1100
recoverd: New function unassign_unsuitable_ips()
Move the code into a new function so it can be called from a number of
places.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f6ce18d011dd9043b04256690d826deb2640cd89
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 13 12:15:32 2012 +1100
recoverd: Move failback retry loop into basic_failback() and lcp2_failback()
The retry loop is currently in ctdb_takeover_run_core(). Pushing it
into each function will make it possible to put each algorithm into a
separate top-level function. This will make the code much clearer and
more maintainable.
Also keep associated test code compatible.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c09aeaecad7d3232b1c07bab826b96818756f5e0
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 11 15:49:17 2012 +1100
recoverd: Trying to failback more IPs no longer allocates unassigned IPs
Neither basic_failback() nor lcp2_failback() unassigns IPs anymore, so
there's no point looping back that far.
Also fix a unit test that now fails because looping back to handle
unassigned IPs is no longer logged.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4dc08e37dec464c8785a2ddae15c7c69d3c81ac3
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 11 15:43:36 2012 +1100
recoverd: basic_failback() can call find_takeover_node() directly
Instead of unassigning, looping back and depending on
basic_allocate_unassigned.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 4c87e7cb3fa2cf2e034fa8454364e0a7fe0c8f81
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 11 15:01:12 2012 +1100
recoverd: Don't do failback at all when deterministic IPs are in use
This seems to be the right thing to do instead of calling into the
failback code and continually skipping the release of an IP.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e06476e07197b7327b8bdac9c0b2e7281798ffec
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 14 17:10:41 2012 +1100
recoverd: Move the test for both 'DeterministicIPs' and 'NoIPFailback' set
If this is done earlier then some other logic can be improved. Also,
this should be a warning since no error condition is set.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit bcd5f587aff3ba536cb0b5ef00d2d802352bae25
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Dec 14 17:10:05 2012 +1100
recoverd: Fix a memory leak in IP allocation
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit edda58a45915494027785608126b5da7c98fee85
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 20 16:27:27 2012 +1100
tests/takeover: Add some LCP2 tests for the case when no nodes are healthy
3 tests should assign IPs to all nodes.
3 tests set NoIPTakeoverOnDisabled=1 and should drop all IPs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5c820b2398a42af0e94bc524854a1ad144a63f7b
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 20 16:26:42 2012 +1100
tests/takeover: Initial tests for deterministic IPs
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 98bd58a98d34ecca89c9042417d7527a18a5ecf9
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 20 16:25:53 2012 +1100
tests/takeover: Do output filtering for deterministic IPs algorithm too
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d357d52dbd533444a4af6151d04ba119a1533068
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 20 16:24:58 2012 +1100
tests/takeover: Support testing of NoIPTakeoverOnDisabled
Via $CTDB_SET_NoIPTakeoverOnDisabled.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 20631f5f29859920844dd8f410e24917aabd3dfd
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 20 14:52:05 2012 +1100
tests/takeover: IP allocation now selected via $CTDB_IP_ALGORITHM
Default to LCP2, like ctdbd. Also support "det" for deterministic
IPs.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 06ad6b8a19f830472b0ed65cb52e7c3ea74ed1dc
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Dec 13 20:29:22 2012 +1100
tests/takeover: Support valgrinding the takeover code
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 1a5410e8349cdb96fdc51aa5ecd4f5734f6798a5
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 30 16:38:08 2012 +1100
tests: new simple integration test for delip interface garbage collection
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8164d9b29bf9080ccc76b1305fb6c07f1ed61d55
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 30 16:37:28 2012 +1100
tests: new function ip2ipmask() for integration testing
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cc1a3ae911d3fee8b87fda5de5ab6d9499d7510a
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 23 20:09:07 2012 +1100
ctdbd: Clean up orphaned interfaces when an IP is deleted
Add a new function ctdb_remove_orphaned_ifaces() and call it in
ctdb_control_del_public_address().
ctdb_remove_orphaned_ifaces() uses a naive implementation that does
things in a very obvious way. There are many ways to improve the
performance - some are mentioned in a comment in the code. However, I
doubt that this will be a bottleneck even with a large number of
public IPs. Running the eventscript is likely to outweigh the cost of
this cleanup.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit b849fb4923d6a34141fe19006a974de81508ceda
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jan 7 12:00:34 2013 +1100
tests/complex: Add NFS test when CTDB is killed on one of the nodes
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c75b5e5b4d000f5c7dab403df8238ceed390c1c0
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 4 15:00:44 2012 +1100
Eventscripts: Change the default reconfigure action to do nothing
A default action of restarting the service doesn't obey the principle
of least surprise. It caused the NFS service restart to be implicitly
reintroduced.
This allows no-op functions to be removed from some eventscripts and
service restart functions to be added to others.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2629de72e1f37b5e46772c2ef8d8d0012fc4ed37
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 4 14:52:25 2012 +1100
Eventscripts: Do not restart NFS on reconfigure
It looks like this restart was accidentally reintroduced in commit
fc0678d351187cfa4c71123f97c0f493aacd5d16 when $service_reconfigure
became unset, so the default action of restarting the service would
occur. Later cleanups then explicitly reintroduced it and carried it
through the code.
Also update the unit tests affected by this change.
The restart was originally removed in commit
bc481c3f1a44c50648488c4f8a7f15ec395d446f.
The default reconfigure action of restarting a service is clearly
suboptimal and will be addressed in a separate patch.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2bbee8ac23ad5b7adf7122d8c91d5f0d54582507
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Dec 4 14:28:06 2012 +1100
ctdbd: Initialise the node flags in just one place
Currently flags are initialised in 2 places. One of them is in
ctdb_tcp_listen_automatic(), which just seems wrong. This makes the
code easier to follow by just doing it in ctdb_start_daemon().
This means that the flags are now initialised later than previously.
However, it is still done before the transport is started and before
clients can connect.
In future it might make sense to do a similar thing with setting the
PNN. However, the current optimisation is reasonably obvious...
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 496387a585b2c5778c808cf02b8e1435abde4c3e
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Dec 3 15:44:12 2012 +1100
ctdbd: Remove debug option --node-ip, use --listen instead
This effectively reverts d96cb02c2c24f9eabbc53d3d38e90dea49cff3e0
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 3221fce9ee2f6fdd3bb17a5e1629ad52a32f90d6
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Dec 3 15:32:49 2012 +1100
tests: Local daemons should use --listen instead of --node-ip
Signed-off-by: Martin Schwenke <martin at meltin.net>
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
commit 776590bf84d221092298346a28d7fc0552a67c9d
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 30 12:59:35 2012 +1100
Initscript: when checking status, print output of "ctdb ping" if it fails
At the moment the caller has no idea why it thinks CTDB isn't running
and we can't debug failures...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5067392d2e06795559f25828b65c129608b65c0b
Author: Michael Adam <obnox at samba.org>
Date: Tue Nov 20 11:20:34 2012 +0100
ctdb:recover: fix a comment typo
Signed-off-by: Michael Adam <obnox at samba.org>
commit 81788cfabe960497b050c5ee4e4e487ee061012a
Author: Michael Adam <obnox at samba.org>
Date: Fri Dec 21 11:52:57 2012 -0500
events/50.samba: fix testparm background update
Creating the smb.conf cache with "-v" results in a cache file
that later fails to load with "testparm -s ..." because
"copy = " cannot be processed (copying the empty service name fails).
Signed-off-by: Michael Adam <obnox at samba.org>
commit 4a9e96ad3d8fc46da1cd44cd82309c1b54301eb7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jan 4 14:32:55 2013 +1100
daemon: Add a tunable to enable automatic database priority setting
Samba versions 3.6.x and older do not set the database priority.
This can cause deadlock between Samba and CTDB since the locking order
of databases will differ. A hack was added for automatic promotion
of the priority of specific databases to avoid deadlock. This code should
not be invoked with Samba 4.x, which correctly specifies the
priority for each database.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit f81e9add466b1d9b2796c09c6ba63b77296ea149
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Nov 30 12:21:30 2012 +1100
daemon: Check if log_latency_ms is set before using it
This fixes a bug where the wrong variable was checked.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 905cd1293aa97dc7839a59b4f68eca02981f0891
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 23 12:51:47 2012 +1100
Git should ignore generated include/version.h file
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9a02f61547ddf74629aca21639d8fb61c1df7cbb
Author: Volker Lendecke <vl at samba.org>
Date: Thu Nov 22 15:27:51 2012 +0100
vacuum: Avoid some tallocs in ctdb recovery
In a heavily loaded and volatile database a lot of SCHEDULE_FOR_DELETION
requests can come in between fast vacuuming runs. This can lead to
significant CTDB CPU load due to the cost of doing talloc_free. This
change reduces the number of objects a bit by coalescing the two objects
of delete_record_data into one. It also avoids having to allocate
another talloc header for a SCHEDULE_FOR_DELETION key. This is not the
full fix for this problem, but it might contribute a bit.
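The coalescing idea can be sketched with a C99 flexible array member, so that a record and a copy of its key live in a single allocation and are released with a single free. This sketch uses malloc() rather than talloc, and the struct layout and names are hypothetical, not the actual delete_record_data definition.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record layout: metadata and key bytes share one block. */
struct delete_record_sketch {
	unsigned int hash;
	size_t keylen;
	unsigned char key[]; /* flexible array member holding the key copy */
};

/* One allocation (and later one free) instead of separate objects for
 * the record data and its key. */
static struct delete_record_sketch *
new_delete_record(unsigned int hash, const void *key, size_t keylen)
{
	struct delete_record_sketch *r = malloc(sizeof(*r) + keylen);

	if (r == NULL) {
		return NULL;
	}
	r->hash = hash;
	r->keylen = keylen;
	memcpy(r->key, key, keylen);
	return r;
}
```

With talloc, the same effect is that each record carries one talloc header instead of two, which is where the savings on talloc_free come from.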
commit d05faf294e58e22ae3fbc76162258f1ae8178129
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Nov 21 17:03:37 2012 +1100
doc: Update ping_pong documentation to add -c option
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4f42d17b74ce891691eee1cead498959cc8e4837
Author: Michael Adam <obnox at samba.org>
Date: Tue Nov 6 01:26:05 2012 +0100
utils:ping_pong: add a -c switch to check the lock before reading/writing
This is to verify that the fcntl F_GETLK call reports F_UNLCK if called
from a process already holding a lock. This is used, for example, by Samba's
strict locking code in combination with "posix locking = true".
Signed-off-by: Michael Adam <obnox at samba.org>
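The POSIX behaviour that the -c switch checks can be demonstrated with a small standalone program: a process that holds a write lock and then queries the same range with F_GETLK should get F_UNLCK back, because its own locks never block it. This is not the ping_pong source. The helper names and the /tmp template path are assumptions for illustration, and on a local filesystem the expected answer is F_UNLCK; the -c switch is meant to verify the same on the filesystem under test.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Fill in a struct flock describing a write lock on byte 0. */
static void byte0_wrlck(struct flock *fl)
{
	fl->l_type = F_WRLCK;
	fl->l_whence = SEEK_SET;
	fl->l_start = 0;
	fl->l_len = 1;
}

/* Lock byte 0 of a temporary file, then query the same byte with
 * F_GETLK from the lock holder. Returns the reported l_type, or -1
 * on error. */
static int getlk_type_while_holding_lock(void)
{
	char path[] = "/tmp/getlk_sketchXXXXXX";
	struct flock fl = { 0 };
	int fd = mkstemp(path);
	int type = -1;

	if (fd == -1) {
		return -1;
	}
	unlink(path); /* the file disappears once fd is closed */

	byte0_wrlck(&fl);
	if (fcntl(fd, F_SETLK, &fl) == 0) {
		byte0_wrlck(&fl); /* reset the query to the same range */
		if (fcntl(fd, F_GETLK, &fl) == 0) {
			type = fl.l_type;
		}
	}
	close(fd);
	return type;
}
```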
commit 6860c79aea416f56cfd7a6af790bbdf495dbc54e
Author: Michael Adam <obnox at samba.org>
Date: Mon Nov 19 17:28:03 2012 +0100
recovery: data corruption of persistent DBs after recoveries: don't delete empty records
The record-by-record mode of recovery deletes empty records.
For persistent databases, this can lead to data corruption
by deleting records that should be there:
- Assume the cluster has been running for a while.
- A record R in a persistent database D has been created and
deleted a couple of times, the last operation being deletion,
leaving an empty record with a high RSN, say 10.
- Now a node N is turned off.
- This leaves the local database copy of D on N with the empty
copy of R and RSN 10. On all other nodes, the recovery has deleted
the copy of record R.
- Now the record is created again while node N is turned off.
This creates R with RSN = 1 on all nodes except for N.
- Now node N is turned on again. The following recovery will choose
the older empty copy of R due to RSN 10 > RSN 1.
==> Hence the record is gone after the recovery.
On databases like Samba's registry, this can damage the higher-level
data structures built from the various tdb-level records.
This patch fixes that problem by not deleting empty records in recoveries
for persistent databases.
Signed-off-by: Michael Adam <obnox at samba.org>
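The scenario above can be replayed with a toy model in C. As a simplification, it assumes that recovery keeps the copy with the highest RSN and that every write increments the RSN of the local copy, with a brand-new copy starting at 1. The types and functions are hypothetical and are not CTDB's recovery code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one node's copy of record R. */
struct rec_copy {
	bool present;     /* does the node hold any copy of R? */
	unsigned int rsn; /* record sequence number */
	bool empty;       /* deleted record kept with an empty value */
};

/* Recovery keeps the copy with the highest RSN across all nodes. */
static struct rec_copy pick_highest_rsn(const struct rec_copy *c, size_t n)
{
	struct rec_copy best = { false, 0, false };

	for (size_t i = 0; i < n; i++) {
		if (c[i].present && (!best.present || c[i].rsn > best.rsn)) {
			best = c[i];
		}
	}
	return best;
}

/* Writing R bumps the RSN of the local copy; a new copy starts at 1. */
static struct rec_copy store_record(struct rec_copy r)
{
	struct rec_copy out = { true, r.present ? r.rsn + 1 : 1, false };
	return out;
}

/* Replays the scenario. Returns true if R survives the recovery that
 * runs when node N comes back. */
static bool record_survives(bool recovery_deletes_empty)
{
	struct rec_copy empty10 = { true, 10, true };
	struct rec_copy on_n = empty10;      /* N is switched off holding this */
	struct rec_copy on_others = empty10;
	struct rec_copy copies[2];

	if (recovery_deletes_empty) {
		on_others.present = false;   /* old behaviour: empty copy deleted */
	}
	on_others = store_record(on_others); /* R re-created while N is off */

	copies[0] = on_n;
	copies[1] = on_others;
	struct rec_copy winner = pick_highest_rsn(copies, 2);
	return winner.present && !winner.empty;
}
```

With deletion the re-created copy restarts at RSN 1 and loses to N's stale empty copy at RSN 10; when the empty copy is kept, the re-creation continues from it (RSN 11 in this model) and wins.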
commit 909269a4a3690e1245117ca1af935401455785e6
Author: Michael Adam <obnox at samba.org>
Date: Mon Nov 19 17:20:11 2012 +0100
recoverd: fix a comment typo
Signed-off-by: Michael Adam <obnox at samba.org>
commit bab744e3c49efef2e05dc09e8ea9bd3e3fa58716
Author: Michael Adam <obnox at samba.org>
Date: Fri Nov 16 14:33:41 2012 +0100
vacuum: fix a comment typo
Pair-Programmed-With: Volker Lendecke <vl at samba.org>
Signed-off-by: Michael Adam <obnox at samba.org>
commit d8f010355b715e49709836e057a5d0f110919897
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 16 20:21:15 2012 +1100
Eventscripts: 10.interface should list configured interfaces
The current code lists available interfaces. If IPs are configured in
some way other than the public addresses file (e.g. ctdb addip) and
their interfaces default to being marked down, then these interfaces
can never be marked up, since down interfaces are not available.
The configured interfaces should be listed instead.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9275a69a414482f1053ae14528d5972575b9214e
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Nov 16 19:43:14 2012 +1100
ctdbd: Make the link status of new interfaces more flexible
Neither up nor down is a good default value for the link status of a
new interface. Up means that IPs can be assigned to interfaces before
the true state is known and they can move away quickly if the interface
is actually down. Down means that IPs can't be assigned to an interface
for a variable amount of time - until a monitor cycle occurs - and this
can result in imbalanced IPs.
This is a neat compromise. Before the startup event completes, IPs
can't be assigned to interfaces because all interfaces begin in a down
state. As soon as the startup event completes, IPs can be allocated
to any interface that has been marked up by the eventscript. Later,
during normal operation, newly added IPs can be assigned to new
interfaces immediately. The IPs will still move away if an interface
is noticed to be down in the next monitor cycle, but that is the
exception rather than the rule.
Signed-off-by: Martin Schwenke <martin at meltin.net>
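The compromise above reduces to choosing the initial link state of a new interface from whether the startup event has completed, and letting a later monitor cycle correct it. A minimal sketch with hypothetical names; ctdbd's real interface structure and eventscript handling are not shown here.

```c
#include <stdbool.h>

/* Hypothetical interface record; ctdbd's real structure differs. */
struct iface_sketch {
	const char *name;
	bool link_up;
};

/* Initial link state for a newly added interface: down until the
 * startup event has completed, up afterwards. */
static struct iface_sketch new_iface(const char *name, bool startup_done)
{
	struct iface_sketch i = { name, startup_done };
	return i;
}

/* A later monitor cycle records the true state reported by the
 * eventscript, so an optimistic "up" is corrected if it was wrong. */
static void monitor_cycle(struct iface_sketch *i, bool really_up)
{
	i->link_up = really_up;
}
```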
commit 54e24a151d2163954e5a2a1c0f41a2b5c19ae44b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Nov 14 15:51:59 2012 +1100
locking: Do not use RECLOCK for tracking DB locks and latencies
RECLOCK is for the recovery lock in CTDB. Do not overload its meaning
for tracking locks on databases. Database lock latency has nothing to do
with recovery lock latency.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 718233c445cd6627ab3962b6565c2655f1f8efd0
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Nov 6 17:06:54 2012 +1100
tools/ctdb: Do not use function return value as pnn
This fixes code where the same variable 'ret' was used to track both the
pnn and the return value of a function call.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
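The pattern behind this fix is to keep a call's status code and its output value in separate variables. A minimal sketch, assuming a hypothetical control call that returns 0 on success and stores the PNN through a pointer; it does not reproduce the actual tools/ctdb code.

```c
/* Hypothetical control call: returns 0 on success and stores the
 * node's PNN in *pnn, or returns -1 on failure. */
static int get_pnn_sketch(int fail, unsigned int *pnn)
{
	if (fail) {
		return -1;
	}
	*pnn = 3;
	return 0;
}

/* Keep the status code and the PNN in separate variables so that a
 * successful call (ret == 0) is never mistaken for PNN 0. */
static int lookup_pnn(int fail, unsigned int *out)
{
	unsigned int pnn;
	int ret = get_pnn_sketch(fail, &pnn);

	if (ret != 0) {
		return ret;
	}
	*out = pnn;
	return 0;
}
```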
commit a5c6bb1fffb8dc3960af113957a1fd080cc7c245
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 23 16:23:12 2012 +1100
recoverd: Track the nodes that fail takeover run and set culprit count
If any of the nodes fails the takeover run started from the main loop
(either because of an error or because it does not complete within the
takeover_timeout interval), the recovery master gives up on the takeover
run with the following message:
"Unable to setup public takeover addresses. Try again later"
As a side effect, monitoring is left disabled on all the nodes. Before
ctdb_takeover_run() is called from the main loop, monitoring is disabled
via the startrecovery event. Since ctdb_takeover_run() fails, the
recovered event is never run and monitoring is not re-enabled.
In main_loop, ctdb_takeover_run() is called with a takeover_fail_callback.
This callback will get called if any of the nodes fail in handling
takeip/releaseip/ipreallocated events in ctdb_takeover_run().
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
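The culprit tracking can be sketched as a per-node counter that the failure callback increments once for every node that failed a takeip/releaseip/ipreallocated event. The cluster size, struct and function names below are hypothetical, and the real callback signature in ctdb_takeover_run() may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NUM_NODES 4 /* hypothetical cluster size */

/* Hypothetical per-node culprit counters kept by the recovery master. */
struct culprit_sketch {
	unsigned int count[SKETCH_NUM_NODES];
};

/* Invoked once for each node that failed during the takeover run. */
static void takeover_fail_callback_sketch(struct culprit_sketch *c,
					  uint32_t pnn)
{
	if (pnn < SKETCH_NUM_NODES) {
		c->count[pnn]++;
	}
}

/* Simulate one takeover run: ok[i] says whether node i succeeded.
 * Returns true if the run succeeded on every node. */
static bool takeover_run_sketch(struct culprit_sketch *c,
				const bool ok[SKETCH_NUM_NODES])
{
	bool all_ok = true;

	for (uint32_t pnn = 0; pnn < SKETCH_NUM_NODES; pnn++) {
		if (!ok[pnn]) {
			takeover_fail_callback_sketch(c, pnn);
			all_ok = false;
		}
	}
	return all_ok;
}
```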
commit f243a916ee71013f7402b9c396c2ead88eb3aab0
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Nov 14 10:37:15 2012 +1100
Eventscripts: 10.interface startup event should only process interfaces once
Provided that monitor_interfaces() sets the state of each interface,
there's no need to mark all interfaces as up before running
monitor_interfaces() in the startup event. monitor_interfaces() will
set the true status of each interface anyway. The duplication is
unnecessary and may cause extra action in the recovery daemon because
the state of some interfaces is changed an extra time.
Instead, add a comment at the top of the loop in monitor_interfaces()
to warn against early loop exits.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5f58c811127a89f162b6a41ddcd6e944801740a5
Author: Volker Lendecke <vl at samba.org>
Date: Tue Nov 6 16:17:22 2012 +0100
build: Fix the build with old system-installed tevent
We depend on the tracing callback mechanism in ctdb.
commit cd64035d71ddff6aebe6c15a49e09527283425d2
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 31 12:33:25 2012 +1100
ctdbd: Fix compilation warning in locking code
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ceac026713a7ee30ea865ed4a9422900ed76fdf6
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 31 12:17:27 2012 +1100
web: Update instructions for building from tarball
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit aad1584da8a8425bc6f5163c95810e9d2390dc91
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 31 12:10:22 2012 +1100
tests: Do not check release suffix in ctdb version test
The release suffix added by RPM is for tracking packaging changes. The core
CTDB version does not include the release suffix.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 16a91c2a4d03b46743611e2fe844bb2cef95e46a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 30 11:54:52 2012 +1100
packaging: Use maketarball.sh script to create tarball for RPM
This removes the duplicate code for building the tarball and reuses the
existing script.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 3d4838db51dd8199b9c29aebb6e7bfbd2a27b8bb
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 30 11:52:19 2012 +1100
packaging: Use optional argument as targetdir when creating tarball
In addition, do not modify CTDB version string with extra suffix.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f8af7d8de76e68e5c4bde15f832a31ce9107e8c7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 30 11:49:28 2012 +1100
tool/ctdb: Always support ctdb version command, don't make it optional
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 8df7ea6b20417833792932487a082b3c71bb6837
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 30 11:48:23 2012 +1100
build: Add rules to create include/version.h when building from git tree
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit b151f9b62299ec5b887c62cef780547a39c0ba9d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Oct 30 11:47:24 2012 +1100
packaging: Create include/version.h to define CTDB_VERSION_STRING
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 9be3b23adbfc844b71bf1d4ddf0fbc3b269f15fa
Author: Volker Lendecke <vl at samba.org>
Date: Tue Oct 23 21:49:34 2012 +0200
Add a \n to an error message
commit e2213db479129ce9c2b2fb88ec8c53cbd33d54b3
Author: Volker Lendecke <vl at samba.org>
Date: Tue Oct 23 13:45:42 2012 +0200
Avoid a bashism in 60.ganesha
This file is #!/bin/sh. On sn-devel at least, with this /bin/sh the
shell does not like == for string equality.
commit e94070de52232d6cefae0c6276df88b8fc380a4e
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 24 12:58:57 2012 +1100
web: Update broken links to manpages
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 6871415f6cb50c4f9753067359f0e264d3f93871
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 18:04:09 2012 +1100
packaging: Bundle README, COPYING and html version of manpages
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f3888712298f1de7cc7eb51f50c22080fa64e3c0
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 17:43:32 2012 +1100
doc: Do not keep the built version of manpages in version control
Generated docs will be bundled with release tarballs. No need to keep
them in git. This avoids the need to commit the generated docs whenever
the source XML is modified.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 0019291371af1e63ee132ed173ba7f52a0291a44
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 15:12:50 2012 +1100
packaging: Use common code to generate VERSION string
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 19fb26346567d2249b1237f92d871022db2ba8cd
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 15:08:41 2012 +1100
packaging: Factor out the code to generate VERSION string
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 69f0473b72aadab5bd5791ccff2facd0cd469d43
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 15:55:33 2012 +1100
packaging: Build docs and include them in tarball
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 3274cffe2052953b34141a82de6053b747532a88
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 10:09:26 2012 +1100
build: Extract building of manpages in a separate Makefile
This can then be used to build manpages/html when creating tarball.
Do not build docs during a regular build, but only for install.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit db987eeb3c6e10552a1c1334bf263eb66fcad9ab
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 10:52:06 2012 +1100
doc: README - add information about CTDB, license and website
commit b3eac871895cc586bcc671835e882b136e466b98
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 11:27:32 2012 +1100
web: Add posix locking information to prerequisites
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 12e4a3e2953842b4c3842bf920fe2086df4fe46c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 11:26:52 2012 +1100
web: Add the links to ftp/http ctdb download area
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4250c7ebe369e73cf29ff910bb9bfc929735408c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 11:25:46 2012 +1100
web: Remove reference to non-existent config files
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit c18ec8ec234cb71da6cc77b1aadc398f57187947
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Oct 22 12:19:07 2012 +1100
doc: getlog and clearlog changes for recovery daemon logs
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7547e011005f0dd5bd38e67572280126cf16e229
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 18 14:15:09 2012 +1100
tests: Local daemons should use the logging ringbuffer
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7197e600f46f2d1638f6c45c0149f109ea25a47c
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 18 14:13:30 2012 +1100
tools/ctdb: Merge recoverd log handling into getlog/clearlog
We don't need extra commands for these.
Also, allow a default value of NOTICE for the getlog level.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit ef55e06192819d840c09b65741bab737223ac34c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 16 20:57:31 2012 +1100
tools/ctdb: Add log ringbuffer handling for recoverd
This adds the commands rdgetlog and rdclearlog.
These are analogous to getlog and clearlog but operate on the logs for
the recovery daemon.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cbca233d1e03b2410e0bb63b936328d4a8b3c7b4
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 16 20:54:39 2012 +1100
recoverd: Add CTDB_SRVID_GETLOG and CTDB_SRVID_CLEARLOG
These support getting and clearing logs from the ring-buffer in the
recovery daemon.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a9511cf5ecd5bc39b0070f0afa8ac4d4926c6cab
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Oct 22 09:01:27 2012 +1100
build: Set CTDB_PATH to /tmp/ctdb.socket if SOCKPATH is not defined
When building Samba with CTDB, if Samba's configure/waf does not support
setting SOCKPATH, fall back to /tmp/ctdb.socket.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit f92b9c83a2f39fba9a141417a88de96fc8c592ff
Author: David Disseldorp <ddiss at samba.org>
Date: Thu Oct 18 16:55:19 2012 +0200
Build: Set the default ctdb socket path at configure time
The ctdb socket path currently defaults to /tmp/ctdb.socket and can be
modified at runtime using the --socket=filename option, common to both
ctdb and ctdbd binaries.
This change allows the default path to be set at configure time using
the --with-socketpath=FILE argument. When not specified, the default
path remains /tmp/ctdb.socket, so the documentation remains
unchanged.
Signed-off-by: David Disseldorp <ddiss at samba.org>
commit 7d025281ee70c91ebcd4d9a908de1045a689786b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Sep 25 17:29:50 2012 +1000
locking: Do not use ctdb_kill() to kill smbd processes
ctdb_kill() is used to terminate processes spawned by CTDB.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit edbc8a6669b594d3c413d603e1c9fada9244c2ee
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jul 11 15:15:41 2012 +1000
locking: Add database priority handling for older versions of samba
In Samba versions 3.6.x and older, database priorities are not set.
The later_db() function gives the following databases a higher database
priority (i.e. a later position in the locking order):
brlock, g_lock, notify_onelevel, serverid, xattr_tdb
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
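A minimal sketch of the ordering idea, assuming "later" databases are simply locked after all other databases of the same priority. The helpers is_later_db() and lock_order() are hypothetical, as is the assumption that names may carry a ".tdb" suffix; this is not CTDB's later_db() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Databases named in the commit message; the table is illustrative only. */
static const char *const late_dbs[] = {
	"brlock", "g_lock", "notify_onelevel", "serverid", "xattr_tdb",
};

/* Hypothetical helper: true if db_name, with or without a ".tdb"
 * suffix, is one of the databases that must be locked last. */
static bool is_later_db(const char *db_name)
{
	size_t i;

	assert(db_name != NULL);
	for (i = 0; i < sizeof(late_dbs) / sizeof(late_dbs[0]); i++) {
		size_t n = strlen(late_dbs[i]);

		if (strncmp(db_name, late_dbs[i], n) == 0 &&
		    (db_name[n] == '\0' || strcmp(db_name + n, ".tdb") == 0)) {
			return true;
		}
	}
	return false;
}

/* Fill order[] with indices into names[]: ordinary databases first,
 * then the "later" ones, keeping the original relative order. */
static void lock_order(const char *const names[], size_t n, size_t order[])
{
	size_t i, k = 0;

	for (i = 0; i < n; i++) {
		if (!is_later_db(names[i])) {
			order[k++] = i;
		}
	}
	for (i = 0; i < n; i++) {
		if (is_later_db(names[i])) {
			order[k++] = i;
		}
	}
}
```

Two passes keep the rule easy to audit: every process that locks all databases acquires the listed ones last, which is what makes the order consistent.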
commit c8eb4a3170ab8524e638047053831ba547e9cce8
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Jul 9 17:37:35 2012 +1000
locking: Schedule a new lock request every time a lock is released
Since the number of active lock requests is limited to
MAX_LOCK_PROCESSES_PER_DB (= 100), new requests created while that limit
is reached don't get scheduled when they are created. So schedule a
pending request whenever a currently active request is done.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 2126795153dacb255e441abcb36ee05107b6282a
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jun 14 16:12:48 2012 +1000
ctdbd: Replace lockwait with locking API and remove ctdb_lockwait.c
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 4456a01d8f54ca6c771d7488048de5f638477d21
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 9 15:17:21 2012 +1000
ctdb_recover: Replace static locking functions with locking API
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 01ee86d2aafbcda658ef6acc2bba6d6781ae4047
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 9 15:09:51 2012 +1000
ctdb_freeze: Replace locking functions with locking API
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit caff197edf6f928494028ac6c993901954aaa36f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 9 15:10:20 2012 +1000
ctdbd_test: Include ctdb_lock.c code for test stubs
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1ee55c511b99e9f8a6fa4e34207267e953f09bae
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 17 15:25:46 2012 +1000
tests: Fix statistics test for new output lines from locking API
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit e24b5bf283736624b387b0364d7200212bb3054b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 9 12:58:19 2012 +1000
tools/ctdb: Display the locking statistics
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 1af99cf0de9919dd89af1feab6d1bd18b95d82ff
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Oct 11 11:29:29 2012 +1100
ctdbd: locking: Provide non-blocking API for locking of TDB record/db/alldb
This introduces a consistent API for handling locks on a single record, a
complete db or all dbs. The locks are taken out in a child process. In case
of a timeout, find the processes that currently hold the lock and log them.
Callback functions for locking requests take a "locked" boolean argument to
indicate whether the lock was successfully obtained or not.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit be4051326b0c6a0fd301561af10fd15a0e90023b
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jun 6 11:50:25 2012 +1000
common: Add routines to get process and lock information
Currently these functions are implemented only for Linux.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a0cdfae7438092f5c605f0608daa536be860b7fe
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed May 9 12:56:53 2012 +1000
header: Added DB statistics update macros
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 5ee242c949a98bb7397e0f7368b20d44c06fe772
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 16 17:04:48 2012 +1100
scripts: Refactor logging code in initscript and functions file
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2d75a04ba9a2e87a0dcb9bf778c58e335af1871c
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 16:21:02 2012 +1100
tools/ctdb_diagnostics: Add "ctdb listvars" output
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 59a47c0674bacfebc17a1b44f0244727bf2fa7a4
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 16:18:26 2012 +1100
initscript: Check that rc.ctdb is executable before running it
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 440892d75ef73c0aca22f47c0c01712be00cf5b7
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 16:10:19 2012 +1100
ctdbd: Remove references to forcing running of eventscripts from log messages
Running of eventscripts can be initiated from many places, including
the recovery daemon.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 14589bf7c16ba017fe00d4e8bea8cc501546c60f
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 15:59:00 2012 +1100
recoverd: Clarify some misleading log messages
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 59520c9785d113ad5063eb5fbe42a9efc7e30076
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 15:49:13 2012 +1100
tools/ctdb: Remove extra header from natgwlist -Y output
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3cc878bc97fdac764a60ed805f64d649eaab06e8
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 15:17:54 2012 +1100
recoverd: Verifying local IPs should only check for unhosted available IPs
Currently it checks for unhosted IPs among the known IPs rather than
available IPs. This means that a takeover run can be flagged even
when that takeover run will be unable to assign a known, unhosted IP.
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Signed-off-by: Martin Schwenke <martin at meltin.net>
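A minimal sketch of the corrected check, assuming each known public IP can be summarised by two flags. struct ip_state and takeover_run_needed() are hypothetical; the recovery daemon's real data structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one known public IP as seen by the local node. */
struct ip_state {
	bool available; /* an interface on this node can host it right now */
	bool hosted;    /* some node in the cluster currently hosts it */
};

/* Flag a takeover run only for IPs that could actually be assigned:
 * unhosted *and* available. Known-but-unavailable IPs are ignored. */
static bool takeover_run_needed(const struct ip_state *ips, size_t n)
{
	size_t i;

	assert(ips != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ips[i].available && !ips[i].hosted) {
			return true;
		}
	}
	return false;
}
```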
commit 16aba4eb620844626a1c71c58b51658caf44dea6
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Oct 11 14:34:37 2012 +1100
Revert "Eventscripts - add facility to 10.interface to delete unmanaged IPs"
This reverts commit 88f88d86b0d08240f749fb721b8c401c2eeb1099.
This is dangerous and, on reflection, I can't see it being useful.
Interfaces that carry CTDB's public IPs often also carry permanent
IPs.
commit eaa7c165f58abd7e259c37d76b7dd37c91e13d9f
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Sep 26 14:37:49 2012 +1000
Eventscripts: "recovered" event should not fail on NATGW failure
The recovery process has no protection against the "recovered" event
failing, so this can cause a recovery loop.
Instead of failing the "recovered" event, add a "monitor" event and
fail that instead. In this case the failure semantics are well
defined.
A separate patch should ban nodes if the "recovered" event fails for
an unknown reason.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0e56e2dad1861892aa8ba59494ad244f2498314e
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Sep 28 09:39:12 2012 +1000
Logging: Map TEVENT_DEBUG_FATAL to DEBUG_CRIT
This is currently mapped to DEBUG_EMERG. CTDB really has no business
logging anything at EMERG level since the whole system is not about to
abort or catch fire. EMERG causes the message to appear on the
console and on every terminal. That's a bit overzealous!
There would be very few situations where logs are being filtered at a
level below ERROR, so CRIT should certainly suffice.
The trigger for this was some curious messages saying "No event for <n>
seconds!" that were logged to a user's terminal.
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Signed-off-by: Martin Schwenke <martin at meltin.net>
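A sketch of the mapping as a switch. The enums are local stand-ins modelled on tevent's TEVENT_DEBUG_* and CTDB's DEBUG_* levels, not the real headers; only the FATAL to CRIT row comes from this commit, and the other rows are assumptions.

```c
#include <assert.h>

/* Stand-ins modelled on tevent's TEVENT_DEBUG_* levels. */
enum tev_level { TEV_FATAL, TEV_ERROR, TEV_WARNING, TEV_TRACE };

/* Stand-ins modelled on CTDB's DEBUG_* levels, most severe first. */
enum log_level { LVL_EMERG, LVL_ALERT, LVL_CRIT, LVL_ERR, LVL_WARNING,
		 LVL_NOTICE, LVL_INFO, LVL_DEBUG };

static enum log_level map_tevent_level(enum tev_level lvl)
{
	switch (lvl) {
	case TEV_FATAL:
		return LVL_CRIT;    /* previously EMERG, which hits every terminal */
	case TEV_ERROR:
		return LVL_ERR;     /* assumed */
	case TEV_WARNING:
		return LVL_WARNING; /* assumed */
	case TEV_TRACE:
	default:
		return LVL_DEBUG;   /* assumed */
	}
}
```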
commit 7895bc003f087ab2f3181df3c464386f59bfcc39
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Sep 6 20:22:38 2012 +1000
common: Debug ctdb_addr_to_str() using new function ctdb_external_trace()
We've seen this function report "Unknown family, 0" and then CTDB
disappear without a trace. If we can reproduce it then this might
help us to debug it.
The idea is that you do something like the following in /etc/sysconfig/ctdb:
export CTDB_EXTERNAL_TRACE="/etc/ctdb/config/gcore_trace.sh"
When we hit this error then we call out to gcore to get a core file so
we can do forensics. This might block CTDB for a few seconds.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit af540ef728303b4a0a188b17c695e9aefab34489
Author: Michael Adam <obnox at samba.org>
Date: Wed Oct 17 14:21:33 2012 +0200
config/functions: fix a comment
ctdb_check_counter_limits does not fail but succeeds if count >= limit
Signed-off-by: Michael Adam <obnox at samba.org>
commit 25d886060b138bc5e78fe93d7bebe3990264f29d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 11:38:37 2012 +1100
doc: Add info about execute permissions on event scripts
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 36d25e96a2f8ae1461c5a708a2922f0475a39900
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 11:38:59 2012 +1100
doc: Fix documentation for setup event
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 632c1b9c1cc2e242376358ce49fd2022b3f27aa2
Author: Amitay Isaacs <amitay at gmail.com>
Date: Mon Sep 3 12:39:36 2012 +1000
scripts: Remove duplicate code from init script to set tunables
The tunable variables defined in the CTDB configuration file are currently
set up from the init script as well as part of the "setup" event in the
00.ctdb eventscript. Remove this duplication and set tunable
variables only from the setup event. During the "setup" event, it's possible
that ctdb tool commands can time out if the CTDB daemon is not ready. To guard
against this, wait until the "ctdb ping" command succeeds before
executing any other ctdb tool commands.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 08dbd9c7958f9a0ee3de314d49523d32e4be135c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Oct 17 11:24:57 2012 +1100
doc: Fix the hyperlink for "Testing CTDB" page
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit bd4ff176387372b1c233373c0bc8ced523fc9670
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 10 15:03:06 2012 +1100
tests/eventscripts: add unit tests for policy routing reconfigure
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7d4b8cce96f33fff647a0c9d259c121dfc8403e9
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Oct 10 14:48:59 2012 +1100
tests/eventscripts: add extra infrastructure for policy routing tests
Less copying and pasting is a good thing...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c185ffd2822fcee26d07398464c59b66c61f53fa
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 3 10:54:30 2012 +1000
Eventscripts: Add support for "reconfigure" pseudo-event for policy routing
This rebuilds all policy routes and can be used if the configuration
changes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9550c497e6d6ef5ee44826c4bd9ed5ad65174263
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 24 14:32:04 2012 +1000
recoverd: Track failure of "recovered" event, banning culprits
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 56fcee3c7730cb12fa666072d5400949af6e5f7c
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Aug 31 09:34:17 2012 +1000
recoverd: When starting a takeover run disable IP verification
Disable for TakeoverTimeout seconds.
Otherwise the recovery daemon can get overzealous and start trying
to add/delete addresses that it thinks are missing but where the
eventscript just hasn't finished. This didn't use to matter so much
but it is more important now that concurrent takeip/releaseip/updateip
events generate errors; we want to avoid spamming the log.
Signed-off-by: Martin Schwenke <martin at meltin.net>
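A sketch of the hold-off as a timestamp comparison, assuming the recovery daemon records when verification may resume. struct ipcheck_holdoff and both functions are hypothetical, and the parameter takeover_timeout only mirrors the TakeoverTimeout tunable; CTDB itself may implement this with a timed event instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical recovery-daemon state for the verification hold-off. */
struct ipcheck_holdoff {
	time_t disabled_until; /* 0 means verification is enabled */
};

/* Called when a takeover run starts; takeover_timeout is in seconds. */
static void takeover_run_started(struct ipcheck_holdoff *h, time_t now,
				 unsigned takeover_timeout)
{
	assert(h != NULL);
	h->disabled_until = now + (time_t)takeover_timeout;
}

/* Should the main loop verify local IP allocation at time 'now'? */
static bool ip_verification_enabled(const struct ipcheck_holdoff *h,
				    time_t now)
{
	assert(h != NULL);
	return now >= h->disabled_until;
}
```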
commit bfe16cf69bf2eee93c0d831f76d88bba0c2b96c2
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 11 14:46:07 2012 +1000
ctdbd: Stop takeovers and releases from colliding in mid-air
There's a race here where release and takeover events for an IP can
run at the same time, for example a "ctdb deleteip" and a takeover
initiated by the recovery daemon. The timeline is as follows:
1. The release code registers a callback to update the VNN. The
   callback is executed *after* the eventscripts run the releaseip
   event.
2. The release code calls the eventscripts for the releaseip event,
   removing IP from its interface.
   The takeover code "updates" the VNN saying that IP is on some
   iface... even if/though the address is already there.
3. The release callback runs, removing the iface associated with IP in
   the VNN.
   The takeover code calls the eventscripts for the takeip event,
   adding IP to an interface.
As a result, CTDB doesn't think it should be hosting IP but IP is on
an interface. The recovery daemon fixes this later... but it
shouldn't happen.
This patch can cause some additional noise in the logs:
  Release of IP 10.0.2.133/24 on interface eth2 node:2
  recoverd:We are still serving a public address '10.0.2.133' that we should not be serving. Removing it.
  Release of IP 10.0.2.133/24 rejected update for this IP already in flight
  recoverd:client/ctdb_client.c:2455 ctdb_control for release_ip failed
  recoverd:Failed to release local ip address
In this case the node has started releasing an IP when the recovery
daemon notices that the address is still hosted and initiates another
release. This noise is harmless but annoying.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a555940fb5c914b7581667a05153256ad7d17774
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 28 15:17:29 2012 +1000
ctdbd: New tunable NoIPTakeoverOnDisabled
Stops the behaviour where unhealthy nodes can host IPs when there are
no healthy nodes. Set this to 1 when an immediate complete outage is
preferred when all nodes are unhealthy. The alternative
(i.e. default) can lead to undefined behaviour when the shared
filesystem is unavailable.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit be4ad110ede9981b181ac28f31ffd855a879d5df
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 21 15:52:03 2012 +1000
Eventscripts: Add service-start and service-stop pseudo-events
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7054e4ded59c6b8f254dcfefaef64da05f25aecd
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Aug 15 15:28:14 2012 +1000
ctdbd: Avoid unnecessary updateip event
The existing code makes one fatally bad assumption:
vnn->iface->references can never be -1 (or the maximum uint32_t value in
this case). Right now the reference counting is broken so a reference
count of -1 is possible and causes a spurious updateip when vnn->iface is
the same as best_iface. This can occur frequently because we get a lot of
redundant takeovers, especially when each IP can only be hosted on one
interface.
This makes the code much more defensive by noting that when best_iface
is the same as vnn->iface there is never a need for an updateip event.
This effectively neuters the updateip code path when IPs can only be
hosted by a single interface.
This should obsolete 6a74515f0a1e24d97cee3ba05d89133aac7ad2b7.
Signed-off-by: Martin Schwenke <martin at meltin.net>
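A simplified sketch of the defensive check: the identity test comes first, so a corrupted reference count can never cause an updateip when the interface is unchanged. enum ip_action, struct iface_sketch and choose_ip_action() are hypothetical, and treating any different best interface as an updateip is a simplification; the real code also weighs reference counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of deciding how to host an IP locally. */
enum ip_action { IP_ACTION_NONE, IP_ACTION_TAKEIP, IP_ACTION_UPDATEIP };

struct iface_sketch {
	const char *name;
};

static enum ip_action choose_ip_action(const struct iface_sketch *current,
				       const struct iface_sketch *best)
{
	assert(best != NULL);
	if (current == NULL) {
		return IP_ACTION_TAKEIP;   /* IP not yet on any interface */
	}
	if (current == best) {
		return IP_ACTION_NONE;     /* never needs updateip */
	}
	return IP_ACTION_UPDATEIP;         /* move to a different interface */
}
```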
commit c4f5a58471b206e2287c7958c7f29c1f1c0626ac
Author: Volker Lendecke <vl at samba.org>
Date: Tue Oct 9 11:39:58 2012 +0200
Correct include for ctdb_protocol.h
With an old ctdb_protocol.h installed under /usr/local, ctdb will
not compile because the <> form of include will find the installed
header under /usr/local.
commit 06dfd13604d08910e07cbf927c338d7b9fce9a2f
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Sep 20 17:10:34 2012 +1000
Revert "when creating/adding a public ip, set the initial interface to be the first interface specified"
This reverts commit 4308935ba48ac7a29e7523315acf580019715f0f.
This fixes the 16_ctdb_config_add_ip.sh test when run against local daemons.
When running against local daemons, if the interface is assigned as soon as an
IP is added, then takeover would never assign this IP address.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 212298279557a2833ef0f81809b4a5cdac72ca02
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Oct 2 11:51:24 2012 +1000
util: ctdb_fork() closes all sockets opened by the main daemon
Do some other housekeeping, including stopping tevent.
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3a3dae4cb5ec8b4b8381a4013adda25b87641f3a
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Sep 3 15:37:01 2012 +1000
eventscripts: Auto-start/stop services in background
If $CTDB_SERVICE_AUTOSTARTSTOP="yes" then service start/stop is done
in the background with logging.
Fix some unit tests for samba and winbind.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 34535ae64420926b9a3bf7d453fed4e6f4c90115
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Aug 16 14:41:11 2012 +1000
Eventscripts: split 50.samba into 49.winbind and 50.samba
winbind and samba can be separately managed. This makes the service
starting and stopping code way too complicated, and even adds a small
amount of complexity to the monitoring code. The sensible option is
to split this eventscript in two.
There are two potentially backward incompatible changes here:
* Functionality has been removed that allowed 50.samba to manage
winbind when CTDB_MANAGES_WINBIND was unset but the smb.conf
"security" parameter was set to "ADS" or "DOMAIN".
Maintaining this functionality would have required moving the
testparm-related code to the functions file, deciding where the
cache file should go, and then calling it from both 49.winbind and
50.samba. This feature wasn't of great value and asking
administrators to set an extra variable in exchange for code
simplicity seems like a reasonable deal.
* External code will need to be changed if it calls 50.samba directly
with winbind-related expectations. This is fairly obvious!
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 043ef77086797a703aec436a26a05c56a1bcbf2b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Aug 21 14:28:37 2012 +1000
Initscript: Kill any existing ctdbd processes if the ping succeeds
Initialising a new ctdbd will destroy the Unix domain socket so
existing processes will be useless anyway.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit dc2a8c638bd74b9f1dd75339cd2ae2f32ffa18a8
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Aug 20 15:02:24 2012 +1000
tools/ctdb: Free the event context
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b89e959904d7d1b0e5525abd7789f5101537a46a
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Aug 20 14:30:35 2012 +1000
libctdb: Add comments to effect that some controls return result in status
These controls include:
CTDB_CONTROL_GET_RECMODE
CTDB_CONTROL_GET_RECMASTER
CTDB_CONTROL_GET_PID
CTDB_CONTROL_GET_PNN
CTDB_CONTROL_PING
CTDB_CONTROL_GET_DB_PRIORITY
In these cases the data field is empty.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 6bd4feff7039138d435428eeded51975c44e567c
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 17:05:03 2012 +1000
tests/tool: New tests for natgwlist, getcapabilities, lvs, lvsmaster
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0f0aef21a1bb2d88a8c184ef70c718e0c91acdc3
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 17:02:38 2012 +1000
tests/tool: New function setup_natgw() to setup $CTDB_NATGW_NODES
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a56ec75edd1705b0539513d396d311f0e80a3bf5
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 16:59:19 2012 +1000
tools/ctdb: Clean up control_natgw()
* Factor out repeated code into new function find_natgw()
* Support both machine and human readable output
* Use libctdb
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c30ec02615183ecf9b412ad415bf1abd859aec45
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 16:57:01 2012 +1000
tools/ctdb: Convert some commands over to libctdb
control_getcapabilities(), control_lvs(), control_lvsmaster() updated
to use ctdb_getcapabilities(), ctdb_getnodemap() as appropriate.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 81af67c6959fdbe0566e3f1a00e2be58dd268dc6
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 15:57:13 2012 +1000
tests: libctdb stubs initial ctdb_getcapabilities() implementation
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a3f15d2828325bbfba5bc5c0a30429e2ce572a44
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 15:53:39 2012 +1000
tests: libctdb stubs must copy pointers rather than just returning them
Some code (e.g. NAT gateway code) modifies the returned result, so it was
modifying the original.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 140fafef23050d40d66f5b5558c7efcb78f80cd2
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 18 14:24:08 2012 +1000
libctdb: add ctdb_getcapabilities()
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7b75a3bb722dc86139b1a07a0100d08c34620b91
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 21:25:27 2012 +1000
tools/ctdb: Remove redundant filtering loop in control_natgwlist()
This used to catch trailing blank lines. However, these are caught
just as effectively by the whitespace filtering in the loop below.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b29d5bbaa7048291c4b3a39bf12e04f0436f67da
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 21:15:57 2012 +1000
tools/ctdb: natgwlist output is either human readable or machine readable
The first line is currently human readable and the rest is machine
readable. This doesn't make sense. Do one or the other...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 12a0a7a208d1c8fa8991894200d1dc133f3a2d1a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 21:09:46 2012 +1000
tools/ctdb: Factor out printing of the machine readable status header
It is already in 2 places and we might use it in another.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2da7730dc06153173778ab14e228960e72ff8a86
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 16 14:24:39 2012 +1000
tools/ctdb: NAT gateway code should use CTDB_NATGW_NODES
... not NATGW_NODES.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 93c97c3ba3ff714dfa0d056a91ff45010a6e2d66
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 20:46:58 2012 +1000
tests/eventscripts: New policy routing test with invalid table ID
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit acdaa04079a9827885f32a7bc078d3365c89b474
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 20:45:23 2012 +1000
tests/eventscripts: Modify ip stub to simulate invalid table ID
This involves refactoring ip_route_check_table() into a new function
ip_check_table() which takes the operation type (i.e. rule/route) as
an argument.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5c3be8f26dcde0b1b3d86928953e74d4a8b35958
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 20:19:37 2012 +1000
Eventscripts: Indent error when a route delete fails in 11.per_ip_routing
This puts it under the umbrella of the previous warning that should
also have been printed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 6d41208074f0e9b56c585bca7eb39aaed653c4ca
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 19 17:20:18 2012 +1000
tests/eventscript: unit test for 13.per_ip_routing bogus route removal
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d0d0a6f19960f233224970b8d5d19b0e37222616
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 15 17:22:02 2012 +1000
eventscripts: 13.per_ip_routing should remove bogus routes on ipreallocated
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0ce5b079f327aba55b62800ccb22d79976fac665
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jun 13 13:53:18 2012 +1000
tests/eventscripts: Add a policy routing unit test for "ip rule del" failure
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 30d69defa7e97ab5e3ba0492a27868dde2616494
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jun 13 13:49:49 2012 +1000
eventscripts: Print a warning on failure to delete a routing rule
del_routing_for_ip() currently fails silently, which could hide real
errors.
In add_routing_for_ip() we don't want to see any error when calling
del_routing_for_ip(), since we don't expect the rule to be there.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 49dd755fcd077c84eaf3d2fe5dd7757f5588d49c
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Aug 17 13:06:12 2012 +1000
doc: Fix path string of /etc/sysconfig/ctdb file
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fc18188b7b63eb0dafbc47e3abf80e306e1dfc31
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 6 20:43:46 2012 +1000
recoverd: All inactive nodes should yield recovery master role
Not just stopped nodes. In reality, this means that banned nodes will
also yield, since nodes in the other inactive states won't be running
a daemon.
This seems sensible since if another node notices that an inactive
node is the recovery master then it will force an election anyway.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e7dc10da3ced54ea9d719ad167ee42dcca8dce75
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 6 20:36:48 2012 +1000
recoverd: An inactive node should not force recovery master elections
An inactive node can't become the recovery master. So if an inactive
node notices that the recovery master is inactive, it shouldn't force
an election for recovery master and nominate itself as a candidate.
This can cause the recovery master to flip-flop between nodes when all
nodes are inactive.
If there is actually an active node then it will trigger the election.
This is fairly cosmetic but is a step along the way towards ironing
out weirdness when all nodes are stopped.
Also, fix a related comment.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit a0c30c820fd47d4f8620dc060c825be10754f5d1
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 3 10:30:29 2012 +1000
recoverd: main_loop() should not verify local IPs if node is stopped
Doing these checks is pointless and potentially causes unnecessary log
messages.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f586e8a2911fc6e7f6698f516653145d8fd45dad
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 3 10:15:25 2012 +1000
recoverd: verify_local_ip_allocation() should dup ifaces before early return
If CTDB starts in STOPPED state then it thinks it is in the middle of
a recovery. rec->ifaces is also NULL and an early exit further down
(that checks to see if a recovery is in progress) means that it stays
that way.
However, each time this function is entered the need for a takeover
run is re-flagged. The takeover run never happens due to the
early exit, causing a couple of unneeded messages to be logged each
time.
This is avoided by moving the code that sets rec->ifaces so that it is
executed earlier and, in this case, in the middle of a recovery.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit cc9d96f4248e45ea99c5f00db1526426ac26fbc2
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 2 17:26:04 2012 +1000
recoverd: Update a log message that has bit-rotted
This message used to be correct because the ipreallocated event only
handled updating the NAT gateway. However, that has changed so the
message needs to be updated.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 9119a568c2b4601318f7751f537dca2f92a7230b
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jun 22 14:01:02 2012 +1000
recoverd: Fix bogus info in message about changed flags
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c29a943f9bbcfecb861e71d007c7698a53dc8773
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 30 12:51:43 2012 +1000
tests/eventscripts: Extra cases for policy routing missing config test
Test the startup and monitor events too.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c64c6c77c3f6aa2898e5a575547b587bea868c76
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 30 12:51:12 2012 +1000
Eventscripts: 13.per_ip_routing should always fail if config is missing
Currently, if the configuration file is specified by
$CTDB_PER_IP_ROUTING_CONF but is missing, takeip fails but (the
absent) monitor event "succeeds", so the state of a node will
flip-flop.
Instead of this, if the configuration file is missing then fail early
on for all events.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5277d749c9111716fd723647d5421907476422bf
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 30 11:50:53 2012 +1000
Revert "Eventscripts - make 13.per_ip_routing fail gracefully if config is missing"
When the configuration file is missing this causes the node to
flip-flop between unhealthy (when takeip fails) and healthy (no monitor
event here).
Will reimplement this properly.
This reverts commit 351ca413eec460330571ca8b01ad269728fe15df.
commit 076282622fcb2663d378e0c90ed0d9c19f73c005
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 6 20:35:23 2012 +1000
ctdb tool: recmaster command might as well be auto-all
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit fa0f3cba5adaa38bed37dd8b121ad53e962a010d
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 16:52:04 2012 +1000
doc: Document the new onnode -P option
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit aed9b98ddbbf3e81de4f7257a10676565f7d7507
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 16:45:55 2012 +1000
tools/onnode: Add -P option to push files to given nodes
A list of files is given rather than a command. These files are
pushed to the specified nodes.
Quoting is fragile/broken so filenames with spaces won't work - you
win some, you lose some. :-)
All of the other onnode options should work together with this option.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 96fdda124f5511fb76190e7c7a7f0b98e6b01a31
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 20:13:45 2012 +1000
Eventscripts: Clean up 11.routing
The loops can all be done without cat or grep.
The pair of loops in updateip is combined into a single loop.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 553455b386aa7848a516a921dfc14eb87c8a3fc1
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 4 07:21:01 2012 +1000
ctdbd: Log a meaningful message if the nodes file/list is empty
Right now the message says it can't bind to any of the
addresses... even when there aren't any!
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3880589db4d563e438126cf5080261fa06b9e242
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 2 17:15:42 2012 +1000
ctdbd: Remove the word "Forced" from message about running eventscripts
The eventscripts are run after a takeover run and in this case they're
not forced. The message seems to imply that someone has run "ctdb
eventscript" when that is not necessarily the case.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 38e8651b955afdbaf0ae87c24c55c052f8209290
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 2 14:09:32 2012 +1000
ctdbd: Fix ctdb_control_release_ip() on local daemons
When running on local daemons no IPs are actually assigned to
interfaces. Commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e broke
ctdb_control_release_ip() for local daemons because it asks the system
which interface the given IP is on, instead of the old behaviour of
trusting CTDB's internal records.
For local daemons (i.e. !ctdb->do_checkpublicip) revert to the old
behaviour of looking up the interface internally. This is good
enough, given that the tests don't tend to misconfigure the addresses.
Signed-off-by: Martin Schwenke <martin at meltin.net>
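The interface lookup described above can be sketched in Python. This is a hypothetical illustration, not ctdbd's C code: `interface_for_release`, `kernel_lookup` and `internal_records` are invented names, and only the `do_checkpublicip` flag comes from the commit message.

```python
from typing import Callable, Dict, Optional


def interface_for_release(
    ip: str,
    do_checkpublicip: bool,
    kernel_lookup: Callable[[str], Optional[str]],
    internal_records: Dict[str, str],
) -> Optional[str]:
    """Pick the interface to release ip from (hypothetical sketch).

    Real clusters ask the system where the address actually is. Local test
    daemons never assign addresses, so they trust CTDB's own records.
    """
    if do_checkpublicip:
        return kernel_lookup(ip)
    return internal_records.get(ip)
```

With local daemons the system lookup would find nothing, so falling back to the internal table is what keeps the release working.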
commit 5b2725d1ae052e848c2487cb10c5393a877d118c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:45:45 2012 +1000
Initscript: clean up drop_all_public_ips()
This makes the case implicit where $CTDB_PUBLIC_ADDRESSES is unset.
This is OK because that's not an interesting code path.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 6616a5712b5d4db2b9ba6a88cec79378696c2184
Author: Martin Schwenke <martin at meltin.net>
Date: Fri Jul 20 17:00:12 2012 +1000
tests/tool: Run ctdb_tool_* under $VALGRIND
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7ef9916bd95ff2472359a412eac5489f1aad2dce
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jul 4 07:29:18 2012 +1000
tests/eventscripts: Rewrite the testparm stub
It currently needs the real testparm command installed even though it
only uses limited features. It is easy enough to fake up the
functionality that 50.samba uses.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 3f268805c14c51f23024267916eae161bada8a0e
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 3 13:05:58 2012 +1000
tests/complex: Fix broken ctdb_test_check_real_cluster()
It doesn't set $h at all...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 8d17dacee415dd0b4268805a366a86f83e33f27c
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 2 14:18:51 2012 +1000
tests/simple: ctdb stop/continue tests weren't actually checking IPs
The correct variable is $test_node_ips, not $ips.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 2fd0157382b42aa5c5212b8e743c6f589edc6662
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 2 14:06:35 2012 +1000
tests: select_test_node_and_ips() should try to avoid failing
Sometimes "ctdb sync" doesn't do its job, so we end up with unassigned
IPs.
If $test_node isn't set then this is bad. However, try a few times to
ensure it is set.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 47180dc75d15f3d61470705603565b718491c9f8
Author: Martin Schwenke <martin at meltin.net>
Date: Mon Jul 2 14:05:21 2012 +1000
tests: simple tests against local daemons should check $TEST_LOCAL_DEAMONS
Note the old $CTDB_TEST_REAL_CLUSTER - it doesn't exist anymore...
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 619af3e857c2ced3840abfd86135cc954796da97
Author: Martin Schwenke <martin at meltin.net>
Date: Wed Jun 20 15:57:48 2012 +1000
tests: run_tests should exit with $status with -e option
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 6e7bd9685406ae024d413a5d9d8c6e0d89b15567
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 14 19:37:39 2012 +1000
tests/simple: ctdb reloadips test should use $test_ip
There's no point recalculating this value.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f02e501342112aab67aee95f253e29a670b29273
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 14 19:36:04 2012 +1000
tests: select_test_node_and_ips() should never select non-node -1
Instead of selecting the 1st pnn found, select the 1st one that isn't -1.
Signed-off-by: Martin Schwenke <martin at meltin.net>
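The selection rule can be sketched in Python. This is only an illustration: the real select_test_node_and_ips() is a shell function in the test suite, and the list of (ip, pnn) pairs used here is an assumed input, not actual "ctdb ip" output.

```python
from typing import Optional, Sequence, Tuple


def select_test_node(ip_pnn_pairs: Sequence[Tuple[str, int]]) -> Optional[int]:
    """Return the first PNN that is not -1, or None if every IP is unassigned."""
    for _ip, pnn in ip_pnn_pairs:
        if pnn != -1:  # -1 marks an IP that no node is hosting
            return pnn
    return None


pairs = [("10.0.0.1", -1), ("10.0.0.2", 2), ("10.0.0.3", 0)]
print(select_test_node(pairs))  # prints 2
```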
commit 21a5cbf9518fafc610939f14874371a52b1dc8b3
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Jul 26 22:01:50 2012 +1000
util: Do not lock down memory when running with local daemons
Thanks to Ronnie for highlighting the issue of memory lockdown on AIX.
Fix typo, use getuid and not getpid.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 25d45e69f4ffc2b26061ac13038d52a353e79e61
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jul 5 16:27:54 2012 +1000
statd-callout: Fix a bug in the calculations of $STATE
It is just meant to be even, so it should be divided *and* multiplied
by 2. Use $(( )) to make it more readable.
While touching this code, make the related calculation a bit more
readable too.
Signed-off-by: Martin Schwenke <martin at meltin.net>
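The "divide and multiply by 2" rule can be sketched in Python. The commit does not say where the input value comes from, so `even_state` and its argument are hypothetical. The sketch assumes a non-negative value, where Python's floor division matches shell integer division.

```python
def even_state(value: int) -> int:
    """Round a non-negative integer down to an even number.

    Integer division by 2 followed by multiplication by 2 clears the lowest
    bit, like the shell expression $(( value / 2 * 2 )).
    """
    if value < 0:
        raise ValueError("expected a non-negative value")
    return (value // 2) * 2


print(even_state(1373), even_state(1372))  # prints 1372 1372
```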
commit 624f4677e99ed1710a0ace76201150349b1a0335
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 24 11:23:09 2012 +1000
Eventscripts: Default route on NAT gateway should have a metric of 10
At the moment routes from 11.routing can fail to be added because they
conflict with the default route added by 11.natgw.
NAT gateway is meant to be a last resort, so routes from 11.routing
should override it.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 5d713d5e5be67f5914a661694c15d938bd67dea3
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 20:10:11 2012 +1000
Eventscripts: Update/remove stale comments in 11.natgw
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 630cfe6451ba23d959fa4907fbba42702337ed3b
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:39:50 2012 +1000
Eventscripts: Retrieve and build NAT gateway details better in 11.natgw
* "ctdb natgw" is run twice when it doesn't need to be.
* Tweak the parsing of "ctdb natgw" output so that it is done by the
shell instead of a bunch of external processes.
* Make default NAT gateway be -1, even on error. If the process
failed entirely then it could previously be empty.
* Streamline the error handling using die() for when there is no NAT
gateway.
* Downcase script-local variable names.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 34f58a0773618c4508a55ad75fc4602dad5a5f4c
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:37:14 2012 +1000
Eventscripts: Optimise building the host address in 11.natgw
It can be built without forking unnecessary processes.
Also downcase variable name because it is local to script.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit f6e421e8bf935cae790a6dc2b861eb9c7f8610b4
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:32:38 2012 +1000
Eventscripts: Clean up startup sanity check in 11.natgw
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 07149edaecb3caa672163e5a3b89715557d5205a
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:26:16 2012 +1000
Eventscripts: remove redundant firewall rules from 11.natgw
aeb70c7e7822854eb87873a5c7783e27e6e72318 said it moved these but it
redundantly duplicated them instead. That commit did fix the problem,
but because it moved the rules after delete_all(), not because it
moved them out of the startup event as claimed.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit e20fdb974158061f4627d6f360c168d764690e6f
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jul 17 15:21:10 2012 +1000
Eventscripts: 11.natgw $CTDB_NATGW_PUBLIC_IP splitting optimisation
$CTDB_NATGW_PUBLIC_IP can be split into $_ip and $_maskbits without
forking lots of processes.
Also "local" isn't supported by POSIX.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit b3e798f357606648f04d8a67ffee775b34fdede7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Tue Jul 24 17:27:22 2012 +1000
web: Add my name to the developer list.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 538c68d0e83e14f0000981ee06408b8f0035be37
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 15 11:05:00 2012 +1000
Remove tevent_loop_allow_nesting()
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 3de2830ae68241ee95bcc14dc1bb896ff18d86ce
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jun 6 16:19:10 2012 +1000
ctdbd: Return explicit boolean values for functions returning bool
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 25f84797a64a683c303b04057aa8113e9fc47c49
Author: Amitay Isaacs <amitay at gmail.com>
Date: Wed Jun 6 16:16:15 2012 +1000
util: Do not try to lockdown memory when running in local daemons mode
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit d29e1880c8ce7219e065d31b47b0e8ad9e83146d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri Jun 15 15:07:04 2012 +1000
Fix compiler warnings.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit a0a0f5588445aeabe07b0e4d65087db454dc09da
Author: Michael Adam <obnox at samba.org>
Date: Tue Jul 3 11:50:05 2012 +0200
run_tests: improve spacing
commit 0e515115b3c21cb179fd7a6356164ac1b5d423e0
Author: Michael Adam <obnox at samba.org>
Date: Tue Jul 3 11:46:26 2012 +0200
run_tests.sh: fix a comment
commit 85a367005bd669309bb7e532b60d27621110180d
Author: Michael Adam <obnox at samba.org>
Date: Tue Jul 3 14:28:36 2012 +0200
ctdb: use correct "persistent" state for ctdb_attach in "ctdb cattdb"
Originally, "ctdb cattdb" attached explicitly as non-persistent, which
is now forbidden for persistent databases by the server.
Pair-Programmed-With: Gregor Beck <gbeck at sernet.de>
commit 1ebbaa620b3cfb9ff373828e4aaa84246cf3ec25
Author: Gregor Beck <gbeck at sernet.de>
Date: Thu Jun 21 10:26:03 2012 +0200
ctdbd: refuse attaching with "persistent" to a non-persistent db and v.v.
Signed-off-by: Michael Adam <obnox at samba.org>
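The server-side check can be modelled in Python. This is a toy sketch, not ctdbd's implementation: `DatabaseRegistry` and `PersistenceMismatch` are hypothetical names, and the database name is a placeholder.

```python
from typing import Dict


class PersistenceMismatch(Exception):
    """Raised when an attach request disagrees with the database's type."""


class DatabaseRegistry:
    """Toy model of a server that remembers whether each database is persistent."""

    def __init__(self) -> None:
        self._persistent: Dict[str, bool] = {}

    def attach(self, name: str, persistent: bool) -> None:
        existing = self._persistent.get(name)
        if existing is None:
            # First attach: create the database with the requested type.
            self._persistent[name] = persistent
        elif existing != persistent:
            # Refuse persistent attach to a non-persistent db, and vice versa.
            kind = "persistent" if existing else "non-persistent"
            raise PersistenceMismatch(
                f"{name} is {kind}; refusing attach with persistent={persistent}"
            )


registry = DatabaseRegistry()
registry.attach("secrets.tdb", persistent=True)
registry.attach("secrets.tdb", persistent=True)  # same type: accepted
```

A later `registry.attach("secrets.tdb", persistent=False)` raises `PersistenceMismatch`. A client that always attached as non-persistent would hit exactly this refusal.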
commit 9a806dec8687e2ec08a308853b61af6aed5e5d1e
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 20 15:10:05 2012 +1000
When we find an ip we shouldn't host, just release it
Don't call a full-blown cluster-wide ip reallocation, just release it locally
commit c6bf22ba5c01001b7febed73dd16a03bd3fd2bed
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 20 10:08:11 2012 +1000
When we release an ip, get the interface name from the kernel
instead of using the interface on which ctdb thinks the ip is hosted.
The difference is that this now allows us to handle cases where we want to release an ip but ctdbd does not know which interface the ip is assigned to.
(e.g. the user has used 'ip addr add ...' to manually assign an ip to the wrong interface)
commit f07376309e70f5ccdb7de8453caacc71b451ab48
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 20 13:32:02 2012 +1000
Add new command to find which interface an ip is located on
commit 8307c70ed98996b430c470e9641a09fdeeb81bd8
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed Jun 13 16:17:18 2012 +1000
STATISTICS: Add tracking of the 10 hottest keys per database measured in hopcount
and add mechanisms to dump it using the ctdb dbstatistics command
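One way to keep a bounded "hottest keys" table is sketched in Python below. It is an assumption about the technique, not CTDB's actual data structure: `HotKeys`, `record` and `dump` are hypothetical names. The table holds at most 10 entries by default and replaces the coldest entry when a hotter key arrives.

```python
from typing import List, Tuple


class HotKeys:
    """Keep the keys with the highest observed hop count (hypothetical sketch)."""

    def __init__(self, capacity: int = 10) -> None:
        self.capacity = capacity
        self.entries: List[Tuple[int, bytes]] = []  # (hopcount, key)

    def record(self, key: bytes, hopcount: int) -> None:
        for i, (count, k) in enumerate(self.entries):
            if k == key:
                # Known key: keep the highest hop count seen for it.
                if hopcount > count:
                    self.entries[i] = (hopcount, key)
                return
        if len(self.entries) < self.capacity:
            self.entries.append((hopcount, key))
            return
        # Table full: replace the coldest entry if this key is hotter.
        coldest = min(range(len(self.entries)), key=lambda i: self.entries[i][0])
        if hopcount > self.entries[coldest][0]:
            self.entries[coldest] = (hopcount, key)

    def dump(self) -> List[Tuple[int, bytes]]:
        """Return entries hottest first, e.g. for a statistics dump."""
        return sorted(self.entries, reverse=True)
```

A linear scan is acceptable here because the table is tiny. A heap would only pay off for a much larger capacity.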
commit 98e1b46adba11b9549b5c5976e1f561fe732fa6e
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 7 15:08:15 2012 +1000
Reimplement logging of long running events
Reimplement 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00 using tevent
trace points.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 0dc204988eadff214dd149a756d756ab6e96e410
Author: Stefan Metzmacher <metze at samba.org>
Date: Fri Jun 8 12:50:21 2012 +0200
tevent: change version to 0.9.16
This adds tevent_*_trace_*() and tevent_context_init_ops()
metze
Autobuild-User(master): Stefan Metzmacher <metze at samba.org>
Autobuild-Date(master): Fri Jun 8 20:47:41 CEST 2012 on sn-devel-104
commit 7ebc00dc6a89043a971a720e7c21baf5f2a0233d
Author: Stefan Metzmacher <metze at samba.org>
Date: Fri May 11 15:19:55 2012 +0200
tevent: expose tevent_context_init_ops
This can be used to implement wrapper backends,
while passing a private pointer to the backend's init function
via ev->additional_data.
metze
commit cb2bbe93628c1ab932c2e1ad6e2ec199a98f74c6
Author: Martin Schwenke <martin at meltin.net>
Date: Tue Jun 5 16:00:07 2012 +1000
lib/tevent: Add trace point callback
Set/get a single callback function to be invoked at various trace
points. Define "before wait" and "after wait" trace points - more
trace points can be added later if required.
CTDB wants this to log long waits and events.
Pair-programmed-with: Amitay Isaacs <amitay at gmail.com>
Signed-off-by: Martin Schwenke <martin at meltin.net>
Signed-off-by: Stefan Metzmacher <metze at samba.org>
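The way "before wait" and "after wait" trace points can expose long waits and long event handling is sketched in Python below. It only models the idea and is not the tevent C API. `TracePoint`, `LongRunningMonitor`, the threshold and the message texts are all hypothetical. The clock is injectable so the logic can be checked deterministically.

```python
import time
from enum import Enum
from typing import Callable, List, Optional


class TracePoint(Enum):
    BEFORE_WAIT = "before_wait"  # loop is about to block waiting for events
    AFTER_WAIT = "after_wait"    # loop has woken up and will handle events


class LongRunningMonitor:
    """Trace callback that flags long waits and long event handling (sketch)."""

    def __init__(self, threshold: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.clock = clock
        self.messages: List[str] = []
        self._last: Optional[float] = None

    def __call__(self, point: TracePoint) -> None:
        now = self.clock()
        if self._last is not None:
            elapsed = now - self._last
            if elapsed > self.threshold:
                if point is TracePoint.BEFORE_WAIT:
                    # Time since AFTER_WAIT was spent handling events.
                    self.messages.append(f"Handling event took {elapsed:.0f} seconds")
                else:
                    # Time since BEFORE_WAIT was spent blocked waiting.
                    self.messages.append(f"No event for {elapsed:.0f} seconds")
        self._last = now
```

The event loop would invoke the monitor at each trace point. A single callback is enough because the trace point argument says which interval has just ended.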
commit 88040778aace229d724de1ba7556aded12e22f86
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 7 14:20:13 2012 +1000
Revert "TEVENT: Add back tracking of long running events to the local copy of tevent library"
This reverts commit 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00.
Do this using the new tevent trace point callback.
commit e0c9200c05b1f7a04e002f505ebb5ba9340c0ca1
Author: Martin Schwenke <martin at meltin.net>
Date: Thu Jun 7 12:26:02 2012 +1000
lib/tevent: In poll_event_context, add a pointer back to the tevent_context
This makes it consistent with the other backends.
Signed-off-by: Martin Schwenke <martin at meltin.net>
Signed-off-by: Stefan Metzmacher <metze at samba.org>
commit 6559106b8b853920f325f2dba532f4008e931fa3
Author: Stefan Metzmacher <metze at samba.org>
Date: Mon May 14 11:48:00 2012 +0200
lib/tevent/testsuite: no longer use 'compat' symbols
metze
commit 1a6a011c772f7d302d114d7c8a151fa7820ec85f
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Wed May 30 11:50:13 2012 +1000
Run the shutdown eventscript before we tear down the transport
This allows eventscripts to still be able to call and use ctdb during the shutdown phase.
commit ac89da4eea98fa686408c5671a6c44c0fd1d7a58
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 25 15:57:14 2012 +1000
tests: Increment RSN always in ctdb_update_record_persistent test
If the record does not exist in persistent DB, RSN for that record is
considered 0. To write a record, RSN for that record should be set to 1,
otherwise the RSN check would fail.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
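The RSN rule can be sketched in Python. This toy store is an assumption for illustration: it accepts a write only when the new RSN is strictly greater than the stored one, which is consistent with the message above but is not CTDB's code. `PersistentStore` and `update_record` are hypothetical names.

```python
from typing import Dict, Tuple


class PersistentStore:
    """Toy persistent DB where each record carries an RSN (hypothetical sketch)."""

    def __init__(self) -> None:
        self.records: Dict[bytes, Tuple[int, bytes]] = {}  # key -> (rsn, data)

    def current_rsn(self, key: bytes) -> int:
        # A record that does not exist yet is treated as having RSN 0.
        return self.records.get(key, (0, b""))[0]

    def write(self, key: bytes, rsn: int, data: bytes) -> bool:
        # Assumed check: the new RSN must be higher than the stored one.
        if rsn <= self.current_rsn(key):
            return False
        self.records[key] = (rsn, data)
        return True


def update_record(store: PersistentStore, key: bytes, data: bytes) -> bool:
    """Always increment the RSN, including for a new record (0 -> 1)."""
    return store.write(key, store.current_rsn(key) + 1, data)
```

Writing a new record with RSN 0 is rejected, while `update_record` succeeds because it always bumps the RSN.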
commit 0be452958db95c8253c362a1c08a1966e53a1f99
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 25 11:40:38 2012 +1000
tests: Fix ctdb_fetch test (parse extra lines of output)
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit bc55e09fdac9f743d6428bfe0be77840ad0fd1ba
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 24 16:46:07 2012 +1000
tests: Fix flaky behavior of ctdb_fetch test
There were two issues with this test:
1. Since the messages are sent from one node to the next, if a node
does not register for messages before CTDB on that node receives
the message, the message will never be seen by ctdb_fetch, which
then blocks on receive and never sends any messages to the next node.
The crude solution is to sleep just before the messages are sent,
so that ctdb_fetch on all nodes has registered for the messages.
2. If ctdb_fetch stops sending messages after timelimit expiry, the
next node will keep waiting to receive messages in event_loop_once().
The default timeout is 30 seconds for event_loop_once(). Adding a
timed event will always set the timeout value to the time remaining
for the timed event to expire.
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
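The timeout behaviour described in point 2 can be sketched in Python. The 30 second default comes from the message above. `loop_timeout` and its arguments are hypothetical, and the sketch models the rule as stated rather than the event library's code.

```python
from typing import Iterable

DEFAULT_TIMEOUT = 30.0  # seconds; event_loop_once() default per the commit message


def loop_timeout(now: float, timer_deadlines: Iterable[float]) -> float:
    """Seconds the event loop may block before it must wake up (sketch).

    With no timed events pending the loop uses the 30 second default;
    otherwise it waits only until the nearest timed event expires.
    """
    pending = list(timer_deadlines)
    if not pending:
        return DEFAULT_TIMEOUT
    return max(0.0, min(pending) - now)
```

Adding a timed event at the time limit therefore bounds how long the receiving node can sit in the loop.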
commit 6e5cbe8fff71985e5a2fc16b7e9f2b868011ff5d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 17 16:08:37 2012 +1000
server: Replace BOOL datatype with bool, True/False with true/false
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit fd3b73d7e634f16cbb99d7d5a548e12f00d1aadb
Author: Martin Schwenke <martin at meltin.net>
Date: Fri May 25 11:44:56 2012 +1000
tests/eventscripts: Tweak expected output for lockd:b restart
Commit 13acd58c41fba1a33894fbd654fed69ea0eac322 made this test fail,
since lockd:b and lockd:bs were incorrectly producing the same output.
commit 14012781c3751a514055df29ea70adfb12ecb2d9
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 23 15:36:01 2012 +1000
tests: Complex tests must not be run from a cluster node
Tickle tests fail if run from a node involved in the test.
The condition is actually weaker than this: the test can't be run from
a CTDB node that is hosting public addresses that may be used by the
test.
Rework ctdb_test_check_real_cluster() to support checking this.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 7640352c6697f9d4e0d13afbc8523afc64e7d462
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 23 14:24:40 2012 +1000
Eventscripts: Fix deprecated iptables ! usage
This currently causes warnings in the logs.
This change is not SLES10-compatible but we already have some other
non-SLES10-compatible changes.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit c5e3e4bccbde349739b90d8761e1aa19637887a8
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 22 11:24:05 2012 +1000
tests: test_wrap needs to set TEST_BIN_DIR when installed
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit d0b539c4d2d4dc8c9eb95801bff09c3bcbeebca5
Author: Amitay Isaacs <amitay at gmail.com>
Date: Fri May 18 12:59:41 2012 +1000
packaging: make ctdb-tests package depend on nc
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 61df417821762d87ed01a7b5e64c35079940344d
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu May 10 16:59:39 2012 +1000
tests: Use per node log files when running tests with local daemons
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
commit 03fa2a517247eb2adfba67248e2466f17ea14418
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Fri May 25 12:31:11 2012 +1000
RECOVERY: Increase the time we allow before timing out recovery related tasks.
If the system is temporarily taking unusually long to perform these tasks, it is better to wait a lot longer and allow the tasks to complete than to time out repeatedly and then become banned.
commit 1f262deaad0818f159f9c68330f7fec121679023
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Fri May 25 12:27:59 2012 +1000
RECOVER: When we pull databases during recovery, we used to reallocate the data buffer for each entry added. This would normally not be an issue, but where memory is fragmented it could start to cost significant CPU if the buffer has to be reallocated and moved to a different region.
Change this to instead preallocate the data buffer in chunks of, by default, 10 MByte.
This significantly reduces the number of reallocate-and-move operations that may be required.
Create a tunable to override how much preallocation is used.
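The effect of chunked preallocation on the number of reallocations can be sketched in Python. The growth policy here (needed size plus one extra chunk) is an assumption, and `PullBuffer` is a hypothetical class. Its `prealloc` parameter only stands in for the tunable, whose actual name is not given in this message.

```python
class PullBuffer:
    """Growable data buffer that reserves capacity in chunks (sketch)."""

    def __init__(self, prealloc: int = 10 * 1024 * 1024) -> None:
        # prealloc == 0 models the old behaviour: grow exactly per entry.
        self.prealloc = prealloc
        self.capacity = 0
        self.length = 0
        self.reallocations = 0

    def append(self, entry_size: int) -> None:
        needed = self.length + entry_size
        if needed > self.capacity:
            # Reserve the needed space plus one extra chunk, so the
            # following entries fit without another reallocation.
            self.capacity = needed + self.prealloc
            self.reallocations += 1
        self.length = needed
```

Appending 1000 entries of 200 bytes costs 1000 reallocations with `prealloc=0`, but only one with the 10 MByte default.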
commit 6cf6a9b071bd8dd730717ca033337ff73bf247bb
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Mon May 21 14:01:04 2012 +1000
DOCS: Document the new tunables to produce warnings if databases grow unexpectedly big.
commit 9ed58fef4991725f75509433496f4d5ffae0ae87
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Mon May 21 13:11:38 2012 +1000
DEBUG: Add checks for, and print debug messages when, 1) a database contains very many records, 2) a database is very big, 3) a single record is very big.
Add tunables to control when these instances are logged, and allow each check to be turned off completely by setting its threshold to 0.
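The threshold semantics can be sketched in Python. The statistic names and values below are hypothetical placeholders, not CTDB's tunable names. The sketch shows only the stated rule: warn when a value exceeds its threshold, and treat a threshold of 0 as "disabled".

```python
from typing import Dict, List


def size_warnings(stats: Dict[str, int], thresholds: Dict[str, int]) -> List[str]:
    """Return a warning for each statistic that exceeds its threshold.

    A threshold of 0 (or a missing threshold) disables that check.
    """
    warnings = []
    for name, value in stats.items():
        limit = thresholds.get(name, 0)
        if limit != 0 and value > limit:
            warnings.append(f"{name}={value} exceeds threshold {limit}")
    return warnings
```

With `{"record_count": 150000, "db_size": 5000000}` and thresholds `{"record_count": 100000, "db_size": 0}`, only the record count is reported, because the size check is switched off.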
commit 5aba53e6adcfcd7edbdac9e30aa5fcba176aca00
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Mon May 21 09:17:05 2012 +1000
TEVENT: Add back tracking of long running events to the local copy of tevent library
commit f59b40b3f8ea3da8ffb8601bc025e83c237072d5
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Thu May 17 11:16:57 2012 +1000
GANESHA: make the ganesha script executable by default
commit f23b5a160184db8c92f8c69307dc4a64adae839d
Merge: 6e68797af67bee36f2bad045f94806e7e98f27e9 637cab6304dae66b85668506028c76ea1ee88980
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Thu May 17 11:48:07 2012 +1000
Merge remote branch 'martins/ganesha'
commit 6e68797af67bee36f2bad045f94806e7e98f27e9
Author: Ronnie Sahlberg <ronniesahlberg at gmail.com>
Date: Thu May 17 10:17:51 2012 +1000
Debug: When scripts hang, we may need to collect additional data in order to debug why the script hung.
Break this debug and data collection out into an external script to make it easier to modify what data we need to collect.
For now we only collect a pstree so we can see what part of the script we hung in.
S1037271
commit 637cab6304dae66b85668506028c76ea1ee88980
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 16 17:24:21 2012 +1000
Eventscripts: Modernise 60.ganesha to match 60.nfs
Originally from Srikrishan Malik <srikrishan.malik at in.ibm.com> with
some style changes by me.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 13acd58c41fba1a33894fbd654fed69ea0eac322
Author: Martin Schwenke <martin at meltin.net>
Date: Wed May 16 13:29:58 2012 +1000
Eventscripts: restart lockd in the background when going unhealthy
Sometimes the restart can hang when there are I/O problems. Then the
eventscript times out and gets killed, so the node is never marked as
unhealthy.
Restarting in the background avoids this.
Signed-off-by: Martin Schwenke <martin at meltin.net>
commit 92f74fd589467b46c758e116e97417edfe8773d7
Author: Martin Schwenke <martin at meltin.net>
Date: Tue May 8 14:53:58 2012 +1000
Eventscript functions: add optional version to nfs_check_rpc_service()
This can be optional because the 1st item of each action-triple is a
test comparison that starts with '-'.
Signed-off-by: Martin Schwenke <martin at meltin.net>
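The argument rule can be sketched in Python. This is a hypothetical model of the reasoning, not the shell function itself. It handles only the arguments after the service name, and the triple contents used in the usage note ("-ge", "10", "restart") are placeholders.

```python
from typing import List, Optional, Tuple


def parse_rpc_check_args(
    args: List[str],
) -> Tuple[Optional[str], List[Tuple[str, str, str]]]:
    """Split arguments into an optional version and action triples (sketch).

    Each triple starts with a test comparison such as "-ge", so a leading
    argument that does not start with '-' can only be a version.
    """
    version = None
    if args and not args[0].startswith("-"):
        version, args = args[0], args[1:]
    if len(args) % 3 != 0:
        raise ValueError("action arguments must come in triples")
    triples = [(args[i], args[i + 1], args[i + 2]) for i in range(0, len(args), 3)]
    return version, triples
```

For example, `["3", "-ge", "10", "restart"]` yields version `"3"` and one triple. Without the leading `"3"` the version is `None`.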
-----------------------------------------------------------------------
--
CTDB repository
More information about the samba-cvs mailing list