[SCM] Samba Shared Repository - branch master updated

Thu Apr 28 11:19:05 UTC 2016

The branch, master has been updated
       via  f667ff6 ctdb-doc: Document cluster mutex helper API
       via  721f645 ctdb-recovery: Move recovery lock latency updating to handler
       via  bcb838b ctdb-recovery: Move recovery lock functions to recovery daemon code
       via  df99d9e ctdb-cluster-mutex: Factor out cluster mutex code
       via  ecc6751 ctdb-recovery: Factor out setting of cluster mutex handler
       via  94fb2cf ctdb_recovery: ctdb_cluster_mutex() now takes an argstring argument
       via  4668486 ctdb-recovery: Recovery lock setting can now include helper command
       via  918b0d9 ctdb-recovery: Parse recovery lock setting
       via  64d5572 ctdb-recovery: Reimplement ctdb_recovery_lock() using ctdb_cluster_mutex()
       via  0b0b954 ctdb-recovery: Kill cluster mutex helper with a signal that can be caught
       via  e679a17 ctdb-recovery: Switch ctdb_cluster_mutex() to use helper
       via  5cf3b7a ctdb: Add new helper ctdb_mutex_fcntl_helper
       via  c14e0ff ctdb-tools: Simplify "ctdb getreclock" output
       via  978404e ctdb-recovery: Add optional timeout argument to ctdb_cluster_mutex()
       via  43e9f58 ctdb-recovery: Factor out reclock testing into ctdb_cluster_mutex()
       via  ab75f2a ctdb-recovery: Use a configurable handler when testing cluster mutex
       via  419f57f ctdb-recovery: Factor out new function set_recmode_handler()
       via  14a2330 ctdb-recovery: Use single char ASCII numbers for status from child
       via  4842b6b ctdb-recovery: Rename recovery lock functions and struct
       via  1b607f2 ctdb-build: ctdb-system depends on samba-util for debug
      from  10b0a8b smbd: Avoid large reads beyond EOF

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit f667ff6485eb02eb914f3e111ceb44bc8b3e2d6e
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Apr 26 12:31:43 2016 +1000

    ctdb-doc: Document cluster mutex helper API
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Thu Apr 28 13:18:07 CEST 2016 on sn-devel-144

commit 721f64511c7e6da17c2ec886b1ff2db71bcefbb7
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Feb 22 13:29:22 2016 +1100

    ctdb-recovery: Move recovery lock latency updating to handler
    
    The cluster mutex code already passes the latency and expects the
    handler to update the statistics.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit bcb838ba1e5414bb6162fdb0b30f3adc8ccef932
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Feb 17 20:20:03 2016 +1100

    ctdb-recovery: Move recovery lock functions to recovery daemon code
    
    ctdb_recovery_have_lock(), ctdb_recovery_lock(),
    ctdb_recovery_unlock() are only used by recovery daemon, so move them
    there.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit df99d9e2739eb8e5448bc9cfdf3c469d396dd3e3
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Feb 17 14:32:03 2016 +1100

    ctdb-cluster-mutex: Factor out cluster mutex code
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit ecc6751c6bf203404fbb6861d52811f808556e45
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Feb 17 19:40:54 2016 +1100

    ctdb-recovery: Factor out setting of cluster mutex handler
    
    This means that the cluster mutex handle can now be treated as opaque.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 94fb2cf0ec20e3beaaf393e9c06bfd716775b922
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Feb 16 16:46:18 2016 +1100

    ctdb_recovery: ctdb_cluster_mutex() now takes an argstring argument
    
    All of the ctdb_cluster_mutex_* infrastructure can now handle an
    arbitrary mutex.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 46684867b187469f7827a3e3cd213443d44034bf
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Feb 16 16:39:40 2016 +1100

    ctdb-recovery: Recovery lock setting can now include helper command
    
    The underlying change is to allow the cluster mutex argstring to
    optionally contain a helper command.  When the argument string starts
    with '!' then the first word is the helper command to run.  This is
    now the standard way of changing the helper from the default.
    
    CTDB_CLUSTER_MUTEX_HELPER should now only be used to change the
    location of the default helper when testing.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
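
The '!' convention described above can be sketched in plain C as follows. This is a minimal illustration only, not the CTDB implementation (which uses strv_split() and talloc). The function name split_mutex_argstring() and its parameters are hypothetical, and the caller is assumed to pass a writable copy of the setting.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: split a recovery lock setting into argv[].
 * buf must be a writable copy of the setting; argv entries point
 * into buf (or at default_helper).  Returns the number of arguments
 * stored, or -1 if argv is too small. */
static int split_mutex_argstring(char *buf, const char *default_helper,
                                 char **argv, size_t argv_len)
{
    size_t n = 0;
    char *save = NULL;
    char *tok;

    if (buf == NULL || argv_len < 2) {
        return -1;
    }

    if (buf[0] == '!') {
        /* Leading '!': the first word is the helper command itself */
        buf++;
    } else {
        /* Otherwise the default helper runs with the words as arguments */
        argv[n++] = (char *)default_helper;
    }

    for (tok = strtok_r(buf, " \t", &save);
         tok != NULL;
         tok = strtok_r(NULL, " \t", &save)) {
        if (n + 1 >= argv_len) {
            return -1;  /* no room left for the terminating NULL */
        }
        argv[n++] = tok;
    }

    /* execv() style arrays must end with NULL */
    argv[n] = NULL;
    return (int)n;
}
```

With a setting of "!/usr/local/bin/myhelper recovery" the sketch yields the helper path followed by "recovery". With a plain lock file path it yields the default helper followed by that path.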

commit 918b0d9a9c2115df6b5b9bc5752046ecd7bd9e8a
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Feb 16 14:31:45 2016 +1100

    ctdb-recovery: Parse recovery lock setting
    
    This is currently just treated as the name of a lock file.  However,
    it is really some arbitrary arguments to lock helper.
    
    Therefore, it should be parsed and passed as separate arguments to the
    lock helper.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 64d557200ed63e1ff21cd0078e86957b689eff7e
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jan 19 20:33:58 2016 +1100

    ctdb-recovery: Reimplement ctdb_recovery_lock() using ctdb_cluster_mutex()
    
    Replace the file descriptor for the recovery lock in the CTDB context
    with the cluster mutex handle, where non-NULL means locked.
    Attempting to take the recovery lock is now asynchronous and no longer
    blocks the recovery daemon.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 0b0b954ff23149655640571801e2a3f572ebeadc
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Feb 2 14:09:50 2016 +1100

    ctdb-recovery: Kill cluster mutex helper with a signal that can be caught
    
    Unlike fcntl(2), some other helper might need to explicitly take
    action to release a mutex.  This can be done by catching SIGTERM.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
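
A helper that needs to release its mutex explicitly could catch SIGTERM along these lines. This is a sketch under assumptions, not code from this changeset: the names got_sigterm, install_sigterm_handler() and wait_for_sigterm() are hypothetical, and the release step itself is left to the helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the signal handler; checked by the main loop */
static volatile sig_atomic_t got_sigterm = 0;

static void on_sigterm(int signum)
{
    (void)signum;
    got_sigterm = 1;
}

/* Install the SIGTERM handler; returns 0 on success, -1 on error */
static int install_sigterm_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Block until SIGTERM has been seen.  SIGTERM is blocked around the
 * flag check so that a signal arriving between the check and the
 * wait is not lost. */
static void wait_for_sigterm(void)
{
    sigset_t block, old;

    sigemptyset(&block);
    sigaddset(&block, SIGTERM);
    sigprocmask(SIG_BLOCK, &block, &old);
    while (!got_sigterm) {
        sigsuspend(&old);  /* atomically restore old mask and wait */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);
}
```

A helper would call install_sigterm_handler() before taking the mutex, then wait_for_sigterm() after reporting success, and release the mutex once the wait returns.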

commit e679a1731cef260d643e79cd2d6a2aa64ddd1e08
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Jan 14 17:09:54 2016 +1100

    ctdb-recovery: Switch ctdb_cluster_mutex() to use helper
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 5cf3b7a1e3e5c60bb9bae123f422887608b17d55
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Dec 8 16:23:50 2015 +1100

    ctdb: Add new helper ctdb_mutex_fcntl_helper
    
    This implements the type of fcntl locking that the recovery lock uses.
    The intent is to use it for multiple locks and allow the choice of
    helper to be configured.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit c14e0ff8e4fdfaf18e0a906329e4e886bc200ab1
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Apr 5 12:04:36 2016 +1000

    ctdb-tools: Simplify "ctdb getreclock" output
    
    If the reclock is set then print it, otherwise print nothing.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 978404ecde877eed0e408d482a3b9309ee12b58e
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jan 12 14:23:19 2016 +1100

    ctdb-recovery: Add optional timeout argument to ctdb_cluster_mutex()
    
    Timeout in seconds, 0 means no timeout.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 43e9f58d6a3824847469ebd6ad9653c3ca0642e9
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jan 12 14:18:27 2016 +1100

    ctdb-recovery: Factor out reclock testing into ctdb_cluster_mutex()
    
    This is currently only used to check whether the recovery lock can be
    taken.  However, name it more generally in anticipation of using it
    for general cluster mutex taking and testing.
    
    No functional changes.  A couple of debug message simplifications and
    code rearrangements.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit ab75f2a5873580e078294a2a326b767587dc34bc
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Jan 12 13:35:47 2016 +1100

    ctdb-recovery: Use a configurable handler when testing cluster mutex
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 419f57f3786dc34de769e1c30f4979f9f3b5906e
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 11 16:35:35 2016 +1100

    ctdb-recovery: Factor out new function set_recmode_handler()
    
    This is used to reply to the recmode control for all the different
    cases.  The callers can later be generalised to use a pointer, which
    can then be used for recovery lock handling in different contexts.
    
    Note that the handle is now freed in set_recmode_handler() rather than
    the callbacks.
    
    There is one difference in behaviour.  Deferred attach calls are now
    processed in the timeout case, where they weren't before.  That's a
    bug fix!
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 14a2330692a4d2e0058e72f9ea8b1c61ed920344
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Apr 1 14:35:15 2016 +1100

    ctdb-recovery: Use single char ASCII numbers for status from child
    
      '0' = Child took the mutex
      '1' = Unable to take mutex - contention
      '2' = Unable to take mutex - timeout
      '3' = Unable to take mutex - error
    
    This is a straightforward API.  When the child is generalised to an
    external helper then this makes it easier for a helper to be, for
    example, a simple script.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
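
The status characters listed above could be consumed by a reader of the pipe roughly as follows. The enum and function names are hypothetical and do not appear in the CTDB source. The fallback to '3' when no byte was read mirrors the rule used in cluster_mutex_handler() in this changeset.

```c
#include <stddef.h>

/* Illustrative names for the single character status codes */
enum mutex_status {
    MUTEX_TAKEN      = '0',
    MUTEX_CONTENTION = '1',
    MUTEX_TIMEOUT    = '2',
    MUTEX_ERROR      = '3'
};

/* Human-readable description of a status character */
static const char *mutex_status_str(char status)
{
    switch (status) {
    case MUTEX_TAKEN:
        return "mutex taken";
    case MUTEX_CONTENTION:
        return "unable to take mutex: contention";
    case MUTEX_TIMEOUT:
        return "unable to take mutex: timeout";
    case MUTEX_ERROR:
        return "unable to take mutex: error";
    default:
        return "invalid status";
    }
}

/* If exactly one byte was read, use it as the status; otherwise
 * treat the result as a generic error. */
static char mutex_status_from_read(long nread, char c)
{
    return nread == 1 ? c : (char)MUTEX_ERROR;
}
```

Keeping the protocol to one printable character means a shell script helper can report its result with a single echo or printf.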

commit 4842b6bb91f3f3b13f2433f222678f84832a29d2
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jan 11 14:56:46 2016 +1100

    ctdb-recovery: Rename recovery lock functions and struct
    
    Use the more general name "cluster mutex", since we are likely to end
    up with more than one cluster-wide lock.  There will probably be a
    dedicated recovery lock, held only during recovery, and also a second
    lock that is held by the master node.  Currently one lock is used for
    both purposes.
    
    At the moment the struct and functions are involved with setting the
    recovery mode.  However, they'll be abstracted out to more generally
    deal with the cluster mutexes, so "recmode" -> "cluster_mutex".  Drop
    "set" from names, since this is used to test the lock.  Also drop
    "ctdb" prefix from functions, since they are local to this file.  The
    struct will eventually be a long-lived handle that will release the
    mutex when freed, so name it accordingly.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 1b607f20322f74c5f8829c5bf21e4f210e467e93
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Apr 22 16:51:41 2016 +1000

    ctdb-build: ctdb-system depends on samba-util for debug
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/config/events.d/01.reclock                 |   6 +
 ctdb/doc/cluster_mutex_helper.txt               |  79 +++++++
 ctdb/doc/ctdb.1.xml                             |   4 +-
 ctdb/doc/ctdb.7.xml                             |  20 +-
 ctdb/doc/ctdbd.1.xml                            |   7 +-
 ctdb/doc/ctdbd.conf.5.xml                       |   8 +-
 ctdb/include/ctdb_private.h                     |   8 +-
 ctdb/packaging/RPM/ctdb.spec.in                 |   2 +
 ctdb/server/ctdb_cluster_mutex.c                | 266 ++++++++++++++++++++++
 ctdb/server/{ipalloc.c => ctdb_cluster_mutex.h} |  49 ++---
 ctdb/server/ctdb_mutex_fcntl_helper.c           |  90 ++++++++
 ctdb/server/ctdb_recover.c                      | 281 ++++++------------------
 ctdb/server/ctdb_recoverd.c                     |  80 ++++++-
 ctdb/server/ctdbd.c                             |   2 +-
 ctdb/tests/simple/35_set_reclock.sh             |  13 +-
 ctdb/tests/simple/scripts/local_daemons.bash    |   8 +-
 ctdb/tests/src/ctdbd_test.c                     |   1 +
 ctdb/tools/ctdb.c                               |  12 +-
 ctdb/wscript                                    |   9 +-
 19 files changed, 656 insertions(+), 289 deletions(-)
 create mode 100644 ctdb/doc/cluster_mutex_helper.txt
 create mode 100644 ctdb/server/ctdb_cluster_mutex.c
 copy ctdb/server/{ipalloc.c => ctdb_cluster_mutex.h} (52%)
 create mode 100644 ctdb/server/ctdb_mutex_fcntl_helper.c


Changeset truncated at 500 lines:

diff --git a/ctdb/config/events.d/01.reclock b/ctdb/config/events.d/01.reclock
index d3dd612..e2d4d12 100755
--- a/ctdb/config/events.d/01.reclock
+++ b/ctdb/config/events.d/01.reclock
@@ -7,6 +7,12 @@
 . $CTDB_BASE/functions
 loadconfig
 
+# If CTDB_RECOVERY_LOCK specifies a helper then exit because this
+# script can't do anything useful.
+case "$CTDB_RECOVERY_LOCK" in
+!*) exit 0 ;;
+esac
+
 case "$1" in
     init)
 	ctdb_counter_init
diff --git a/ctdb/doc/cluster_mutex_helper.txt b/ctdb/doc/cluster_mutex_helper.txt
new file mode 100644
index 0000000..0fc3a50
--- /dev/null
+++ b/ctdb/doc/cluster_mutex_helper.txt
@@ -0,0 +1,79 @@
+Writing CTDB cluster mutex helpers
+==================================
+
+CTDB uses cluster-wide mutexes to protect against a "split brain",
+which could occur if the cluster becomes partitioned due to network
+failure or similar.
+
+CTDB uses a cluster-wide mutex for its "recovery lock", which is used
+to ensure that only one database recovery can happen at a time.  For
+an overview of recovery lock configuration see the RECOVERY LOCK
+section in ctdb(7).  CTDB tries to ensure correct operation of the
+recovery lock by attempting to take the recovery lock when CTDB knows
+that it should already be held.
+
+By default, CTDB uses a supplied mutex helper that uses a fcntl(2)
+lock on a specified file in the cluster filesystem.
+
+However, a user supplied mutex helper can be used as an alternative.
+The rest of this document describes the API for mutex helpers.
+
+A mutex helper is an external executable
+----------------------------------------
+
+A mutex helper is an external executable that can be run by CTDB.
+There are no CTDB-specific compilation dependencies.  This means that
+a helper could easily be scripted around existing commands.  Mutex
+helpers are run relatively rarely and are not time critical.
+Therefore, reliability is preferred over high performance.
+
+Taking a mutex with a helper
+----------------------------
+
+1. Helper is executed with helper-specific arguments
+
+2. Helper attempts to take mutex
+
+3. On success, the helper writes ASCII 0 to standard output
+
+4. Helper stays running, holding mutex, awaiting termination by CTDB
+
+5. When a helper receives SIGTERM it must release any mutex it is
+   holding and then exit.
+
+Status codes
+------------
+
+CTDB ignores the exit code of a helper.  Instead, CTDB reacts to a
+single ASCII character that is sent to it via a helper's standard
+output.
+
+Valid status codes are:
+
+0 - The helper took the mutex and is holding it, awaiting termination.
+
+1 - The helper was unable to take the mutex due to contention.
+
+2 - The helper took too long to take the mutex.
+
+    Helpers do not need to implement this status code.  CTDB
+    already implements any required timeout handling.
+
+3 - An unexpected error occurred.
+
+If a 0 status code is sent then the helper should periodically
+check if the (original) parent process still exists while awaiting
+termination.  If the parent process disappears then the helper should
+release the mutex and exit.  This avoids stale mutexes.
+
+If a non-0 status code is sent then the helper can exit immediately.
+However, if the helper does not exit then it must terminate if it
+receives SIGTERM.
+
+Logging
+-------
+
+Anything written to standard error by a helper is incorporated into
+CTDB's logs.  A helper should generally only output to stderr for
+unexpected errors and avoid output to stderr on success or on mutex
+contention.
diff --git a/ctdb/doc/ctdb.1.xml b/ctdb/doc/ctdb.1.xml
index 12ead00..cbdc5c7 100644
--- a/ctdb/doc/ctdb.1.xml
+++ b/ctdb/doc/ctdb.1.xml
@@ -955,14 +955,14 @@ DB Statistics: locking.tdb
     <refsect2>
       <title>getreclock</title>
       <para>
-	Show the name of the recovery lock file, if any.
+	Show details of the recovery lock, if any.
       </para>
 
       <para>
 	Example output:
       </para>
       <screen>
-	Reclock file:/clusterfs/.ctdb/recovery.lock
+	/clusterfs/.ctdb/recovery.lock
       </screen>
 
     </refsect2>
diff --git a/ctdb/doc/ctdb.7.xml b/ctdb/doc/ctdb.7.xml
index 51222ad..1421143 100644
--- a/ctdb/doc/ctdb.7.xml
+++ b/ctdb/doc/ctdb.7.xml
@@ -98,15 +98,27 @@
     </para>
 
     <para>
-      The recovery lock is implemented using a file residing in shared
-      storage (usually) on a cluster filesystem.  To support a
-      recovery lock the cluster filesystem must support lock
-      coherence.  See
+      By default, the recovery lock is implemented using a file
+      (specified by <parameter>CTDB_RECOVERY_LOCK</parameter>)
+      residing in shared storage (usually) on a cluster filesystem.
+      To support a recovery lock the cluster filesystem must support
+      lock coherence.  See
       <citerefentry><refentrytitle>ping_pong</refentrytitle>
       <manvolnum>1</manvolnum></citerefentry> for more details.
     </para>
 
     <para>
+      The recovery lock can also be implemented using an arbitrary
+      cluster mutex call-out by using an exclamation point ('!') as
+      the first character of
+      <parameter>CTDB_RECOVERY_LOCK</parameter>.  For example, a value
+      of <command>!/usr/local/bin/myhelper recovery</command> would
+      run the given helper with the specified arguments.  See the
+      source code relating to cluster mutexes for clues about writing
+      call-outs.
+    </para>
+
+    <para>
       If a cluster becomes partitioned (for example, due to a
       communication failure) and a different recovery master is
       elected by the nodes in each partition, then only one of these
diff --git a/ctdb/doc/ctdbd.1.xml b/ctdb/doc/ctdbd.1.xml
index 0f75f77..7b8cc66 100644
--- a/ctdb/doc/ctdbd.1.xml
+++ b/ctdb/doc/ctdbd.1.xml
@@ -365,12 +365,11 @@
       </varlistentry>
 
       <varlistentry>
-	<term>--reclock=<parameter>FILE</parameter></term>
+	<term>--reclock=<parameter>LOCK</parameter></term>
 	<listitem>
 	  <para>
-	    FILE is the name of the recovery lock file, stored in
-	    <emphasis>shared storage</emphasis>, that CTDB uses to
-	    prevent split brains.
+	    LOCK specifies the cluster-wide mutex used to detect and
+	    prevent a partitioned cluster (or "split brain").
 	  </para>
 	  <para>
 	    For information about the recovery lock please see the
diff --git a/ctdb/doc/ctdbd.conf.5.xml b/ctdb/doc/ctdbd.conf.5.xml
index 324be05..a364c9f 100644
--- a/ctdb/doc/ctdbd.conf.5.xml
+++ b/ctdb/doc/ctdbd.conf.5.xml
@@ -379,10 +379,14 @@
       </varlistentry>
 
       <varlistentry>
-	<term>CTDB_RECOVERY_LOCK=<parameter>FILENAME</parameter></term>
+	<term>CTDB_RECOVERY_LOCK=<parameter>LOCK</parameter></term>
 	<listitem>
 	  <para>
-	    Defaults to
+	    LOCK specifies the cluster-wide mutex used to detect and
+	    prevent a partitioned cluster (or "split brain").
+	  </para>
+	  <para>
+	    No default, but the default configuration file specifies
 	    <filename>/some/place/on/shared/storage</filename>, which
 	    should be change to a useful value.  Corresponds to
 	    <option>--reclock</option>.
diff --git a/ctdb/include/ctdb_private.h b/ctdb/include/ctdb_private.h
index 41b9f4f..f8889e0 100644
--- a/ctdb/include/ctdb_private.h
+++ b/ctdb/include/ctdb_private.h
@@ -280,6 +280,8 @@ struct ctdb_daemon_data {
 	}
 
 
+struct ctdb_cluster_mutex_handle;
+
 enum ctdb_freeze_mode {CTDB_FREEZE_NONE, CTDB_FREEZE_PENDING, CTDB_FREEZE_FROZEN};
 
 #define NUM_DB_PRIORITIES 3
@@ -309,7 +311,7 @@ struct ctdb_context {
 	uint64_t max_persistent_check_errors;
 	const char *transport;
 	char *recovery_lock_file;
-	int recovery_lock_fd;
+	struct ctdb_cluster_mutex_handle *recovery_lock_handle;
 	uint32_t pnn; /* our own pnn */
 	uint32_t num_nodes;
 	uint32_t num_connected;
@@ -887,10 +889,6 @@ int32_t ctdb_control_set_recmode(struct ctdb_context *ctdb,
 				 TDB_DATA indata, bool *async_reply,
 				 const char **errormsg);
 
-bool ctdb_recovery_have_lock(struct ctdb_context *ctdb);
-bool ctdb_recovery_lock(struct ctdb_context *ctdb);
-void ctdb_recovery_unlock(struct ctdb_context *ctdb);
-
 int32_t ctdb_control_end_recovery(struct ctdb_context *ctdb,
 				 struct ctdb_req_control_old *c,
 				 bool *async_reply);
diff --git a/ctdb/packaging/RPM/ctdb.spec.in b/ctdb/packaging/RPM/ctdb.spec.in
index 19c2af1..9710ca0 100644
--- a/ctdb/packaging/RPM/ctdb.spec.in
+++ b/ctdb/packaging/RPM/ctdb.spec.in
@@ -163,6 +163,7 @@ rm -rf $RPM_BUILD_ROOT
 %doc README COPYING
 %doc README.eventscripts README.notify.d
 %doc doc/recovery-process.txt
+%doc doc/cluster_mutex_helper.txt
 %doc doc/*.html
 %doc doc/examples
 %{_sysconfdir}/sudoers.d/ctdb
@@ -204,6 +205,7 @@ rm -rf $RPM_BUILD_ROOT
 %{_libexecdir}/ctdb/ctdb_lock_helper
 %{_libexecdir}/ctdb/ctdb_event_helper
 %{_libexecdir}/ctdb/ctdb_recovery_helper
+%{_libexecdir}/ctdb/ctdb_mutex_fcntl_helper
 %{_libexecdir}/ctdb/ctdb_natgw
 %{_libexecdir}/ctdb/ctdb_lvs
 %{_libexecdir}/ctdb/ctdb_killtcp
diff --git a/ctdb/server/ctdb_cluster_mutex.c b/ctdb/server/ctdb_cluster_mutex.c
new file mode 100644
index 0000000..12950c4
--- /dev/null
+++ b/ctdb/server/ctdb_cluster_mutex.c
@@ -0,0 +1,266 @@
+/*
+   CTDB cluster mutex handling
+
+   Copyright (C) Andrew Tridgell  2007
+   Copyright (C) Ronnie Sahlberg  2007
+   Copyright (C) Martin Schwenke  2016
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, see <http://www.gnu.org/licenses/>.
+*/
+
+#include <tevent.h>
+
+#include "replace.h"
+#include "system/network.h"
+
+#include "lib/util/debug.h"
+#include "lib/util/time.h"
+#include "lib/util/strv.h"
+#include "lib/util/strv_util.h"
+
+#include "ctdb_private.h"
+#include "common/common.h"
+#include "common/logging.h"
+#include "common/system.h"
+
+#include "ctdb_cluster_mutex.h"
+
+struct ctdb_cluster_mutex_handle {
+	struct ctdb_context *ctdb;
+	cluster_mutex_handler_t handler;
+	void *private_data;
+	int fd[2];
+	struct tevent_timer *te;
+	struct tevent_fd *fde;
+	pid_t child;
+	struct timeval start_time;
+};
+
+void ctdb_cluster_mutex_set_handler(struct ctdb_cluster_mutex_handle *h,
+				    cluster_mutex_handler_t handler,
+				    void *private_data)
+{
+	h->handler = handler;
+	h->private_data = private_data;
+}
+
+static void cluster_mutex_timeout(struct tevent_context *ev,
+				  struct tevent_timer *te,
+				  struct timeval t, void *private_data)
+{
+	struct ctdb_cluster_mutex_handle *h =
+		talloc_get_type(private_data, struct ctdb_cluster_mutex_handle);
+	double latency = timeval_elapsed(&h->start_time);
+
+	if (h->handler != NULL) {
+		h->handler(h->ctdb, '2', latency, h, h->private_data);
+	}
+}
+
+
+/* When the handle is freed it causes any child holding the mutex to
+ * be killed, thus freeing the mutex */
+static int cluster_mutex_destructor(struct ctdb_cluster_mutex_handle *h)
+{
+	if (h->fd[0] != -1) {
+		h->fd[0] = -1;
+	}
+	ctdb_kill(h->ctdb, h->child, SIGTERM);
+	return 0;
+}
+
+/* This is called when the child process has attempted to take the
+   mutex and has written its status back to us through the pipe.
+*/
+static void cluster_mutex_handler(struct tevent_context *ev,
+				  struct tevent_fd *fde,
+				  uint16_t flags, void *private_data)
+{
+	struct ctdb_cluster_mutex_handle *h=
+		talloc_get_type(private_data, struct ctdb_cluster_mutex_handle);
+	double latency = timeval_elapsed(&h->start_time);
+	char c = '0';
+	int ret;
+
+	/* Got response from child process so abort timeout */
+	TALLOC_FREE(h->te);
+
+	ret = sys_read(h->fd[0], &c, 1);
+
+	/* If the child wrote status then just pass it to the handler.
+	 * If no status was written then this is an unexpected error
+	 * so pass generic error code to handler. */
+	if (h->handler != NULL) {
+		h->handler(h->ctdb, ret == 1 ? c : '3', latency,
+			   h, h->private_data);
+	}
+}
+
+static char cluster_mutex_helper[PATH_MAX+1] = "";
+
+static bool cluster_mutex_helper_args(TALLOC_CTX *mem_ctx,
+				      const char *argstring, char ***argv)
+{
+	int nargs, i, ret, n;
+	bool is_command = false;
+	char **args = NULL;
+	char *strv = NULL;
+	char *t = NULL;
+
+	if (argstring != NULL && argstring[0] == '!') {
+		/* This is actually a full command */
+		is_command = true;
+		t = discard_const(&argstring[1]);
+	} else {
+		is_command = false;
+		t = discard_const(argstring);
+	}
+
+	ret = strv_split(mem_ctx, &strv, t, " \t");
+	if (ret != 0) {
+		DEBUG(DEBUG_ERR,
+		      ("Unable to parse mutex helper string \"%s\" (%s)\n",
+		       argstring, strerror(ret)));
+		return false;
+	}
+	n = strv_count(strv);
+
+	args = talloc_array(mem_ctx, char *, n + (is_command ? 1 : 2));
+
+	if (args == NULL) {
+		DEBUG(DEBUG_ERR,(__location__ " out of memory\n"));
+		return false;
+	}
+
+	nargs = 0;
+
+	if (! is_command) {
+		if (!ctdb_set_helper("cluster mutex helper",
+				     cluster_mutex_helper,
+				     sizeof(cluster_mutex_helper),
+				     "CTDB_CLUSTER_MUTEX_HELPER",
+				     CTDB_HELPER_BINDIR,
+				     "ctdb_mutex_fcntl_helper")) {
+			DEBUG(DEBUG_ERR,("ctdb exiting with error: %s\n",
+					 __location__
+					 " Unable to set cluster mutex helper\n"));
+			exit(1);
+		}
+
+		args[nargs++] = cluster_mutex_helper;
+	}
+
+	t = NULL;
+	for (i = 0; i < n; i++) {
+		/* Don't copy, just keep cmd_args around */
+		t = strv_next(strv, t);
+		args[nargs++] = t;
+	}
+
+	/* Make sure last argument is NULL */
+	args[nargs] = NULL;
+
+	*argv = args;
+	return true;
+}
+
+struct ctdb_cluster_mutex_handle *
+ctdb_cluster_mutex(struct ctdb_context *ctdb,
+		   const char *argstring,
+		   int timeout)
+{
+	struct ctdb_cluster_mutex_handle *h;
+	char **args;
+	int ret;
+
+	h = talloc(ctdb, struct ctdb_cluster_mutex_handle);
+	if (h == NULL) {
+		DEBUG(DEBUG_ERR, (__location__ " out of memory\n"));
+		return NULL;
+	}
+
+	h->start_time = timeval_current();
+	h->fd[0] = -1;
+	h->fd[1] = -1;
+
+	ret = pipe(h->fd);
+	if (ret != 0) {
+		talloc_free(h);
+		DEBUG(DEBUG_ERR, (__location__ " Failed to open pipe\n"));
+		return NULL;
+	}
+	set_close_on_exec(h->fd[0]);
+
+	/* Create arguments for lock helper */
+	if (!cluster_mutex_helper_args(h, argstring, &args)) {
+		close(h->fd[0]);
+		close(h->fd[1]);
+		talloc_free(h);
+		return NULL;
+	}
+
+	h->child = ctdb_fork(ctdb);
+	if (h->child == (pid_t)-1) {
+		close(h->fd[0]);
+		close(h->fd[1]);
+		talloc_free(h);
+		return NULL;
+	}
+
+	if (h->child == 0) {
+		/* Make stdout point to the pipe */
+		close(STDOUT_FILENO);
+		dup2(h->fd[1], STDOUT_FILENO);
+		close(h->fd[1]);
+
+		execv(args[0], args);
+
+		/* Only happens on error */
+		DEBUG(DEBUG_ERR, (__location__ " execv() failed\n"));
+		_exit(1);
+	}
+
+	/* Parent */
+
+	DEBUG(DEBUG_DEBUG, (__location__ " Created PIPE FD:%d\n", h->fd[0]));
+	set_close_on_exec(h->fd[0]);
+
+	close(h->fd[1]);
+	h->fd[1] = -1;
+


-- 
Samba Shared Repository