[SCM] CTDB repository - branch master updated - ctdb-2.1-60-g06ac62f
Amitay Isaacs
amitay at samba.org
Wed Apr 17 05:17:02 MDT 2013
The branch, master has been updated
via 06ac62f890299021220214327f1b611c3cf00145 (commit)
via b1577a11d548479ff1a05702d106af9465921ad4 (commit)
via 2438f3a4944f7adbcae4cc1b9d5452714244afe7 (commit)
via cad3107b12e8392f786f9a758ee38cf3a3d58538 (commit)
via feb1d40b21a160737aead22e398f3c34ff3be8de (commit)
from 4c0cbfbe8b19f2e6fe17093b52c734bec63dd8b7 (commit)
http://gitweb.samba.org/?p=ctdb.git;a=shortlog;h=master
- Log -----------------------------------------------------------------
commit 06ac62f890299021220214327f1b611c3cf00145
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 17 13:08:49 2013 +0200
tests: add a comment to recovery db corruption test
The comment explains that we use "ctdb stop" and "ctdb continue"
but we should use "ctdb setrecmasterrole off".
Signed-off-by: Michael Adam <obnox at samba.org>
commit b1577a11d548479ff1a05702d106af9465921ad4
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 16:59:36 2013 +1000
tests: Add a test for subsequent recoveries corrupting databases
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit 2438f3a4944f7adbcae4cc1b9d5452714244afe7
Author: Amitay Isaacs <amitay at gmail.com>
Date: Thu Apr 11 16:58:34 2013 +1000
tests: Support waiting for "recovered" state in tests
Signed-off-by: Amitay Isaacs <amitay at gmail.com>
Reviewed-by: Michael Adam <obnox at samba.org>
commit cad3107b12e8392f786f9a758ee38cf3a3d58538
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 3 12:02:59 2013 +0200
ctdb_call: don't bump the rsn in ctdb_become_dmaster() any more
This is now done in ctdb_ltdb_store_server(), so this
extra bump can be spared.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
commit feb1d40b21a160737aead22e398f3c34ff3be8de
Author: Michael Adam <obnox at samba.org>
Date: Wed Apr 3 11:40:25 2013 +0200
Fix a severe recovery bug that can lead to data corruption for SMB clients.
Problem:
Recovery can under certain circumstances lead to old record copies
being resurrected: recovery selects the newest record copy purely by
RSN. At the end of the recovery, the recovery master is the dmaster
for all records in all (non-persistent) databases, and the other
nodes locally hold complete copies of the databases. The bug is that
the recovery process does not increment the RSN on the recovery
master at the end of the recovery. Clients acting directly on the
recovery master change a record's content without migration and
hence without an RSN bump, so a subsequent recovery cannot tell that
the recmaster's copy is newer than the copies on the other nodes,
since their RSNs are the same. Hence, if the recmaster is not node 0
(or more precisely, not the active node with the lowest node number),
the recovery will choose the copies from lower-numbered nodes and stick to those.
Here is how to reproduce:
- assume we have a cluster with at least 2 nodes
- ensure that the recmaster is not node 0
(maybe ensure with "onnode 0 ctdb setrecmasterrole off")
say the recmaster is node 1
- choose a new database name, say "test1.tdb"
(make sure it is not yet attached as persistent)
- choose a key name, say "key1"
- all cluster nodes should be OK and no recovery should be running
- now do the following on node 1:
1. dbwrap_tool test1.tdb store key1 uint32 1
2. dbwrap_tool test1.tdb fetch key1 uint32
==> 1
3. ctdb recover
4. dbwrap_tool test1.tdb store key1 uint32 2
5. dbwrap_tool test1.tdb fetch key1 uint32
==> 2
6. ctdb recover
7. dbwrap_tool test1.tdb fetch key1 uint32
==> 1
==> BUG
This is a very severe bug: applied to Samba's locking.tdb
database, it means that for SMB clients on clustered Samba there is
the potential for clients to lock themselves out of previously opened
files or, even worse, for data corruption:
Case 1: locking out
- client on recmaster opens file
- recovery propagates open file handle (entry in locking.tdb) to
other nodes
- client closes file
- client opens the same file
- recovery resurrects old copy of open file record in locking.tdb
from lower node
- client closes the file but fails to delete the entry in locking.tdb
- client tries to open the same file again but fails, since
the old record locks it out (the stale entry is not discarded
while the client is still connected)
Case 2: data corruption
- client1 on recmaster opens file
- recovery propagates open file info to other nodes
- client1 closes the file and disconnects
- client2 opens the same file
- recovery resurrects old copy of locking.tdb record,
where client1 still has an entry but client2 has none
- but client2 believes it still has a handle
- client3 opens the file and succeeds without
conflicting with client2
(the detached entry for client1 is discarded because
the server does not exist any more).
=> both client2 and client3 believe they have exclusive
access to the file, and their writes create data corruption
Fix:
When storing a record on the dmaster, bump its RSN.
ctdb_ltdb_store_server() is the central function for storing
a record into a local tdb from the ctdbd server context,
so this is also the place where the RSN of the record to be stored
should be incremented when storing on the dmaster.
For the case of record migration, this is currently done in
ctdb_become_dmaster() in ctdb_call.c, but there are other places,
such as recovery, where we should bump the RSN but currently
don't.
Moving the RSN increment into ctdb_ltdb_store_server() fixes
the recovery record-resurrection bug.
Signed-off-by: Michael Adam <obnox at samba.org>
Reviewed-By: Amitay Isaacs <amitay at gmail.com>
-----------------------------------------------------------------------
Summary of changes:
server/ctdb_call.c | 2 +-
server/ctdb_ltdb_server.c | 9 ++-
tests/scripts/integration.bash | 5 +-
tests/simple/77_ctdb_db_recovery.sh | 133 +++++++++++++++++++++++++++++++++++
4 files changed, 144 insertions(+), 5 deletions(-)
create mode 100755 tests/simple/77_ctdb_db_recovery.sh
Changeset truncated at 500 lines:
diff --git a/server/ctdb_call.c b/server/ctdb_call.c
index 67219c4..a6c6389 100644
--- a/server/ctdb_call.c
+++ b/server/ctdb_call.c
@@ -343,7 +343,7 @@ static void ctdb_become_dmaster(struct ctdb_db_context *ctdb_db,
DEBUG(DEBUG_DEBUG,("pnn %u dmaster response %08x\n", ctdb->pnn, ctdb_hash(&key)));
ZERO_STRUCT(header);
- header.rsn = rsn + 1;
+ header.rsn = rsn;
header.dmaster = ctdb->pnn;
header.flags = record_flags;
diff --git a/server/ctdb_ltdb_server.c b/server/ctdb_ltdb_server.c
index 0432e49..c5d9be1 100644
--- a/server/ctdb_ltdb_server.c
+++ b/server/ctdb_ltdb_server.c
@@ -125,12 +125,15 @@ static int ctdb_ltdb_store_server(struct ctdb_db_context *ctdb_db,
}
if (keep) {
- if ((data.dsize == 0) &&
- !ctdb_db->persistent &&
+ if (!ctdb_db->persistent &&
(ctdb_db->ctdb->pnn == header->dmaster) &&
!(header->flags & (CTDB_REC_RO_HAVE_DELEGATIONS|CTDB_REC_RO_HAVE_READONLY|CTDB_REC_RO_REVOKING_READONLY|CTDB_REC_RO_REVOKE_COMPLETE)))
{
- schedule_for_deletion = true;
+ header->rsn++;
+
+ if (data.dsize == 0) {
+ schedule_for_deletion = true;
+ }
}
remove_from_delete_queue = !schedule_for_deletion;
}
diff --git a/tests/scripts/integration.bash b/tests/scripts/integration.bash
index 2e5fb37..2ec827c 100644
--- a/tests/scripts/integration.bash
+++ b/tests/scripts/integration.bash
@@ -370,7 +370,7 @@ node_has_status ()
local pnn="$1"
local status="$2"
- local bits fpat mpat
+ local bits fpat mpat rpat
case "$status" in
(unhealthy) bits="?:?:?:1:*" ;;
(healthy) bits="?:?:?:0:*" ;;
@@ -386,6 +386,7 @@ node_has_status ()
(unfrozen) fpat='^[[:space:]]+frozen[[:space:]]+0$' ;;
(monon) mpat='^Monitoring mode:ACTIVE \(0\)$' ;;
(monoff) mpat='^Monitoring mode:DISABLED \(1\)$' ;;
+ (recovered) rpat='^Recovery mode:NORMAL \(0\)$' ;;
*)
echo "node_has_status: unknown status \"$status\""
return 1
@@ -410,6 +411,8 @@ node_has_status ()
$CTDB statistics -n "$pnn" | egrep -q "$fpat"
elif [ -n "$mpat" ] ; then
$CTDB getmonmode -n "$pnn" | egrep -q "$mpat"
+ elif [ -n "$rpat" ] ; then
+ $CTDB status -n "$pnn" | egrep -q "$rpat"
else
echo 'node_has_status: unknown mode, neither $bits nor $fpat is set'
return 1
diff --git a/tests/simple/77_ctdb_db_recovery.sh b/tests/simple/77_ctdb_db_recovery.sh
new file mode 100755
index 0000000..00fa096
--- /dev/null
+++ b/tests/simple/77_ctdb_db_recovery.sh
@@ -0,0 +1,133 @@
+#!/bin/bash
+
+test_info()
+{
+ cat <<EOF
+Recovery can under certain circumstances lead to old record copies
+being resurrected: recovery selects the newest record copy purely by
+RSN. At the end of the recovery, the recovery master is the dmaster
+for all records in all (non-persistent) databases, and the other
+nodes locally hold complete copies of the databases. The bug is that
+the recovery process does not increment the RSN on the recovery
+master at the end of the recovery. Clients acting directly on the
+recovery master change a record's content without migration and
+hence without an RSN bump, so a subsequent recovery cannot tell that
+the recmaster's copy is newer than the copies on the other nodes,
+since their RSNs are the same. Hence, if the recmaster is not node 0
+(or more precisely, not the active node with the lowest node number),
+the recovery will choose the copies from lower-numbered nodes and stick to those.
+
+Steps:
+
+1. Create a test database
+2. Add a record with value value1 on recovery master
+3. Force a recovery
+4. Update the record with value value2 on recovery master
+5. Force a recovery
+6. Fetch the record
+
+Expected results:
+
+* The record should have value value2 and not value1
+
+EOF
+}
+
+. "${TEST_SCRIPTS_DIR}/integration.bash"
+
+ctdb_test_init "$@"
+
+set -e
+
+cluster_is_healthy
+
+# Reset configuration
+ctdb_restart_when_done
+
+#
+# Main test
+#
+TESTDB="rec_test.tdb"
+
+status=0
+
+# Make sure node 0 is not the recovery master
+echo "find out which node is recmaster"
+try_command_on_node -q any $CTDB_TEST_WRAPPER ctdb recmaster
+recmaster="$out"
+if [ "$recmaster" = "0" ]; then
+ echo "node 0 is recmaster, disable recmasterrole on node 0"
+ #
+ # Note:
+ # It should be sufficient to run "ctdb setrecmasterrole off"
+ # on node 0 and wait for election and recovery to finish.
+ # But there were problems related to this in this automatic
+ # test, so for now use "ctdb stop" and "ctdb continue".
+ #
+ echo "stop node 0"
+ try_command_on_node -q 0 $CTDB_TEST_WRAPPER ctdb stop
+ wait_until_node_has_status 0 stopped
+ echo "continue node 0"
+ try_command_on_node -q 0 $CTDB_TEST_WRAPPER ctdb continue
+ wait_until_node_has_status 0 notstopped
+
+ try_command_on_node -q any $CTDB_TEST_WRAPPER ctdb recmaster
+ recmaster="$out"
+ if [ "$recmaster" = "0" ]; then
+ echo "failed to move recmaster to different node"
+ exit 1
+ fi
+fi
+
+echo "Recmaster:$recmaster"
+
+# Create a temporary non-persistent database to test with
+echo "create test database $TESTDB"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb attach $TESTDB
+
+# Wipe Test database
+echo "wipe test database"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb wipedb $TESTDB
+
+# Add a record key=test1 data=value1
+echo "store key(test1) data(value1)"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb writekey $TESTDB test1 value1
+
+# Fetch a record key=test1
+echo "read key(test1)"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb readkey $TESTDB test1
+echo "$out"
+
+# Do a recovery
+echo "force recovery"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb recover
+
+wait_until_node_has_status $recmaster recovered
+
+# Add a record key=test1 data=value2
+echo "store key(test1) data(value2)"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb writekey $TESTDB test1 value2
+
+# Fetch a record key=test1
+echo "read key(test1)"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb readkey $TESTDB test1
+echo "$out"
+
+# Do a recovery
+echo "force recovery"
+try_command_on_node -q $recmaster $CTDB_TEST_WRAPPER ctdb recover
+
+wait_until_node_has_status $recmaster recovered
+
+# Verify record key=test1
+echo "read key(test1)"
+try_command_on_node $recmaster $CTDB_TEST_WRAPPER ctdb readkey $TESTDB test1
+echo "$out"
+if [ "$out" = "Data: size:6 ptr:[value2]" ]; then
+ echo "GOOD: Recovery did not corrupt database"
+else
+ echo "BAD: Recovery corrupted database"
+ status=1
+fi
+
+exit $status
--
CTDB repository