[SCM] Samba Shared Repository - branch master updated

Amitay Isaacs amitay at samba.org
Sat Aug 29 18:09:04 UTC 2015


The branch, master has been updated
       via  4164d7b ctdb-scripts: Add default filesystem usage warnings
       via  0f28ccf ctdb-scripts: Add default system memory usage warnings
       via  2c601f1 ctdb-scripts: Enable system monitoring eventscript by default
       via  b18e4ae ctdb-scripts: Throttle system resource monitoring warnings
       via  e6b5163 ctdb-scripts: Don't shutdown CTDB when memory monitoring fails
       via  b6a0e4b ctdb-scripts: New consistent system memory and swap monitoring
       via  02fa6c3 ctdb-scripts: Factor out new function check_thresholds()
       via  b7b6e25 ctdb-scripts: Memory monitoring uses thresholds expressed as percentages
       via  bd2845d ctdb-scripts: Use MemAvailable if it is in /proc/meminfo
       via  99b8ef5 ctdb-scripts: Only use /proc/meminfo for memory checks, not "free"
       via  ab58c7a ctdb-scripts: Move system memory checking to 05.system
       via  b27ff25 ctdb-tests: Remove unwanted trailing whitespace
       via  23acbd2 ctdb-tests: Add tests for filesystem usage monitoring
       via  fa10506 ctdb-scripts: New configuration variable CTDB_MONITOR_FILESYSTEM_USAGE
       via  8f713c8 ctdb-scripts: Don't fail monitoring if sanity checks fail
       via  6b4a46e ctdb-scripts: Move filesystem monitoring into a function, clean it up
       via  47f7d1b ctdb-scripts: Rename 40.fs_use to 05.system
      from  e139f19 s3: add support for SMB3_10 and SMB3_11 protocols in smbstatus

https://git.samba.org/?p=samba.git;a=shortlog;h=master


- Log -----------------------------------------------------------------
commit 4164d7bf3153a2fd9081b4d073bfa88fec1507ad
Author: Martin Schwenke <martin at meltin.net>
Date:   Tue Aug 18 15:22:23 2015 +1000

    ctdb-scripts: Add default filesystem usage warnings
    
    Always check filesystem usage for the database directories.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
    
    Autobuild-User(master): Amitay Isaacs <amitay at samba.org>
    Autobuild-Date(master): Sat Aug 29 20:08:48 CEST 2015 on sn-devel-104

commit 0f28ccf87af4e90867eaab213a640f6d0cdaa12d
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Aug 14 17:08:45 2015 +1000

    ctdb-scripts: Add default system memory usage warnings
    
    CTDB should warn by default if too much system memory or swap is used.
    
    The tests have also been tweaked.  In particular, the filesystem-only
    tests need to initialise the memory information to avoid errors where
    meminfo isn't set.
    
    Document the defaults, warning against disabling them.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 2c601f189521ae65ec5ab867c6d8c88cb5d1ae8c
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 6 15:59:06 2015 +1000

    ctdb-scripts: Enable system monitoring eventscript by default
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit b18e4ae0c9536a549722aeef8bc6c095b12db962
Author: Martin Schwenke <martin at meltin.net>
Date:   Wed Aug 5 20:42:16 2015 +1000

    ctdb-scripts: Throttle system resource monitoring warnings
    
    They are only printed when the percentage usage changes.  This should
    stop the logs from being filled with warnings.
    
    Add a test for the throttling.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
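
The throttling described above can be sketched as a small standalone function. This is only an illustration under stated assumptions: the name warn_throttled and the demo cache path are hypothetical, and the real logic lives in check_thresholds() in 05.system. The idea is to remember the last reported percentage in a cache file and stay silent while it is unchanged.

```sh
#!/bin/sh
# Sketch only: warn_throttled and the demo cache path are hypothetical
# names, not part of CTDB.

warn_throttled ()
{
    _cache="$1"      # file remembering the last reported usage
    _usage="$2"      # current usage as an integer percentage
    _threshold="$3"  # warning threshold as an integer percentage

    if [ "$_usage" -ge "$_threshold" ] ; then
        _prev=""
        if [ -r "$_cache" ] ; then
            read -r _prev <"$_cache"
        fi
        # Only log when the percentage differs from the last report
        if [ "$_usage" != "$_prev" ] ; then
            echo "WARNING: utilization ${_usage}% >= threshold ${_threshold}%"
            echo "$_usage" >"$_cache"
        fi
    else
        # Back under the threshold: say so once, then forget the state
        if [ -r "$_cache" ] ; then
            echo "NOTICE: utilization ${_usage}% < threshold ${_threshold}%"
        fi
        rm -f "$_cache"
    fi
}

demo_cache="${TMPDIR:-/tmp}/throttle_demo.$$"
warn_throttled "$demo_cache" 85 80   # logs a warning
warn_throttled "$demo_cache" 85 80   # silent, usage unchanged
warn_throttled "$demo_cache" 87 80   # logs again, usage changed
warn_throttled "$demo_cache" 50 80   # logs a notice and removes the cache
```

Keeping the state in a file rather than a variable matters because each monitor event runs the eventscript as a fresh process.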

commit e6b5163bc1c3551a808d3741b4cbac80e15d10d9
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Aug 3 19:55:27 2015 +1000

    ctdb-scripts: Don't shutdown CTDB when memory monitoring fails
    
    Marking the node unhealthy should cause Samba processes to close,
    possibly freeing a stack of memory.  If not, then it is somebody
    else's problem.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit b6a0e4b85699241ba90f25f4c605cbb7a6fc2146
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Aug 3 17:22:08 2015 +1000

    ctdb-scripts: New consistent system memory and swap monitoring
    
    New variables CTDB_MONITOR_MEMORY_USAGE and CTDB_MONITOR_SWAP_USAGE.
    Both take a pair of <warn_threshold>:<unhealthy_threshold> where each
    threshold is specified as a percentage.
    
    This adds a callout to check_thresholds() that is run when the
    unhealthy threshold is reached.
    
    Add some combination tests.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
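
As a rough illustration of the <warn_threshold>:<unhealthy_threshold> format, the sketch below splits such a value with POSIX parameter expansion. The helper name split_thresholds and the example values are assumptions made for this sketch, not recommendations and not CTDB code.

```sh
#!/bin/sh
# Sketch only: split_thresholds is a hypothetical helper.

split_thresholds ()
{
    case "$1" in
    *:*)
        warn_threshold="${1%:*}"        # text before the colon
        unhealthy_threshold="${1#*:}"   # text after the colon
        ;;
    *)
        warn_threshold="$1"             # a bare number is warn-only
        unhealthy_threshold=""
        ;;
    esac
}

# Illustrative settings, as they might appear in the CTDB configuration
CTDB_MONITOR_MEMORY_USAGE="80:95"
CTDB_MONITOR_SWAP_USAGE="25"

split_thresholds "$CTDB_MONITOR_MEMORY_USAGE"
echo "memory: warn=${warn_threshold} unhealthy=${unhealthy_threshold}"

split_thresholds "$CTDB_MONITOR_SWAP_USAGE"
echo "swap: warn=${warn_threshold} unhealthy=${unhealthy_threshold:-none}"
```

A value such as ":90" leaves the warning threshold empty, which corresponds to omitting the warning check while keeping the unhealthy one.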

commit 02fa6c3d106e8fbf0e685afafa5e6a9bc0c3d22d
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Aug 3 16:20:40 2015 +1000

    ctdb-scripts: Factor out new function check_thresholds()
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit b7b6e25b3e26210ed196be7fc5848e3320b5c35b
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Aug 3 15:59:50 2015 +1000

    ctdb-scripts: Memory monitoring uses thresholds expressed as percentages
    
    CTDB_MONITOR_FREE_MEMORY and CTDB_MONITOR_FREE_MEMORY_WARN are now
    percentages that specify thresholds of acceptable memory usage.
    
    Memory/swap usage in tests also specified as percentages.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit bd2845d7ebe9e2970d4d5546e51c79c9b40ce9cb
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 24 19:57:42 2015 +1000

    ctdb-scripts: Use MemAvailable if it is in /proc/meminfo
    
    Otherwise calculate, as before.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
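
A minimal sketch of the fallback described above, assuming meminfo-format text on standard input. The function name mem_usage_percent is hypothetical. Unlike the committed script, this sketch multiplies before dividing to keep the arithmetic exact for the sample values, and it prints nothing if MemTotal is missing or zero.

```sh
#!/bin/sh
# Sketch only: mem_usage_percent is a hypothetical name.  It reads
# /proc/meminfo-style text on stdin and prints used memory as a percentage.

mem_usage_percent ()
{
    awk '
$1 == "MemAvailable:" { memavail = $2 ; have_avail = 1 }
$1 == "MemFree:"      { memfree += $2 }
$1 == "Cached:"       { memfree += $2 }
$1 == "Buffers:"      { memfree += $2 }
$1 == "MemTotal:"     { memtotal = $2 }
END {
    # Prefer the MemAvailable estimate when the kernel provides it,
    # otherwise fall back to MemFree + Cached + Buffers
    if (have_avail) { memfree = memavail }
    if (memtotal > 0) {
        print int((memtotal - memfree) * 100 / memtotal)
    }
}'
}

# Made-up sample data with MemAvailable present
mem_usage_percent <<EOF
MemTotal:       1000 kB
MemFree:         100 kB
MemAvailable:    400 kB
Buffers:          50 kB
Cached:          150 kB
EOF
```

On a Linux system the same function could be fed from the real file with `mem_usage_percent </proc/meminfo`.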

commit 99b8ef512162570504689b53adb14a52233f49b7
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 20 20:50:56 2015 +1000

    ctdb-scripts: Only use /proc/meminfo for memory checks, not "free"
    
    No need to use 2 different sources of information for similar checks.
    Also, output of free has been changed, whereas /proc/meminfo is a
    kernel API, which will not change.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit ab58c7abd9c49325c3cee1e7178d04a3034e57d8
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Jul 20 16:08:13 2015 +1000

    ctdb-scripts: Move system memory checking to 05.system
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit b27ff251aff6d7c5c59dbe9b1748b30587402aa3
Author: Martin Schwenke <martin at meltin.net>
Date:   Thu Aug 20 11:47:19 2015 +1000

    ctdb-tests: Remove unwanted trailing whitespace
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 23acbd2f4b0079d1fab01a7dad135e3451efd6d7
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 17 21:32:01 2015 +1000

    ctdb-tests: Add tests for filesystem usage monitoring
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit fa1050690bd28cac8bc99047a900caf2e5fca22f
Author: Martin Schwenke <martin at meltin.net>
Date:   Mon Aug 3 14:56:40 2015 +1000

    ctdb-scripts: New configuration variable CTDB_MONITOR_FILESYSTEM_USAGE
    
    This allows both errors (i.e. unhealthy) and warnings for different
    thresholds.  It replaces CTDB_CHECK_FS_USE.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
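
For illustration, a CTDB_MONITOR_FILESYSTEM_USAGE value with one warn-only entry and one entry carrying both thresholds might be taken apart as below. The mount points, numbers, and the helper name describe_fs_entry are assumptions made for this sketch.

```sh
#!/bin/sh
# Sketch only: values and helper name are illustrative.
# Entry format: <fs_mount>:<fs_warn_threshold>[:<fs_unhealthy_threshold>]

CTDB_MONITOR_FILESYSTEM_USAGE="/var:90 /data:80:95"

describe_fs_entry ()
{
    _mount="${1%%:*}"       # longest suffix match: text before the FIRST colon
    _thresholds="${1#*:}"   # shortest prefix match: text after the FIRST colon
    echo "${_mount} -> ${_thresholds}"
}

# Unquoted on purpose so the list splits on whitespace
for _fs in $CTDB_MONITOR_FILESYSTEM_USAGE ; do
    describe_fs_entry "$_fs"
done
```

The `%%` form matters once an entry can contain two colons: `${1%:*}` applied to "/data:80:95" would yield "/data:80" rather than the mount point.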

commit 8f713c87c1359ef8780018718f6fa47bb0fa82a7
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 24 19:56:06 2015 +1000

    ctdb-scripts: Don't fail monitoring if sanity checks fail
    
    Just log some warnings.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

commit 6b4a46e5742732d7cbdf911b74ab0bb1fc8e3b97
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 17 20:04:44 2015 +1000

    ctdb-scripts: Move filesystem monitoring into a function, clean it up
    
    Drop obvious comments.  Use die() for fewer lines of code.  Use a case
    statement to avoid forking unnecessary processes for each filesystem
    being checked.  Drop parentheses around percentages in messages.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>
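
The point about avoiding forks can be sketched as follows: a case pattern is evaluated by the shell itself, whereas piping the value through egrep starts extra processes for every filesystem checked. The function name is_percentage is hypothetical.

```sh
#!/bin/sh
# Sketch only: is_percentage is a hypothetical name.
# Succeeds for integers 0..100 written without signs or extra digits.

is_percentage ()
{
    case "$1" in
    [0-9]|[0-9][0-9]|100) return 0 ;;
    *) return 1 ;;
    esac
}

for _v in 0 42 100 101 abc ; do
    if is_percentage "$_v" ; then
        echo "${_v}: valid"
    else
        echo "${_v}: invalid"
    fi
done
```

The pattern list is deliberately narrow, so values such as "101" or "1a" are rejected without any regular expression engine.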

commit 47f7d1b1c8432ffdfb71176cf64cdd31e188e59c
Author: Martin Schwenke <martin at meltin.net>
Date:   Fri Jul 17 11:59:56 2015 +1000

    ctdb-scripts: Rename 40.fs_use to 05.system
    
    Will put all the system monitoring in here, simplifying 00.ctdb.
    
    Signed-off-by: Martin Schwenke <martin at meltin.net>
    Reviewed-by: Amitay Isaacs <amitay at gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ctdb/config/events.d/00.ctdb                     |  43 ------
 ctdb/config/events.d/05.system                   | 176 +++++++++++++++++++++++
 ctdb/config/events.d/40.fs_use                   |  55 -------
 ctdb/doc/ctdbd.conf.5.xml                        |  92 ++++++------
 ctdb/packaging/RPM/ctdb.spec.in                  |   2 +-
 ctdb/tests/eventscripts/00.ctdb.monitor.001.sh   |  15 --
 ctdb/tests/eventscripts/00.ctdb.monitor.002.sh   |  15 --
 ctdb/tests/eventscripts/00.ctdb.monitor.003.sh   |  19 ---
 ctdb/tests/eventscripts/00.ctdb.monitor.004.sh   |  17 ---
 ctdb/tests/eventscripts/00.ctdb.monitor.005.sh   |  21 ---
 ctdb/tests/eventscripts/05.system.monitor.001.sh |  14 ++
 ctdb/tests/eventscripts/05.system.monitor.002.sh |  12 ++
 ctdb/tests/eventscripts/05.system.monitor.003.sh |  14 ++
 ctdb/tests/eventscripts/05.system.monitor.004.sh |  12 ++
 ctdb/tests/eventscripts/05.system.monitor.005.sh |  14 ++
 ctdb/tests/eventscripts/05.system.monitor.006.sh |  14 ++
 ctdb/tests/eventscripts/05.system.monitor.007.sh |  12 ++
 ctdb/tests/eventscripts/05.system.monitor.011.sh |  16 +++
 ctdb/tests/eventscripts/05.system.monitor.012.sh |  14 ++
 ctdb/tests/eventscripts/05.system.monitor.013.sh |  19 +++
 ctdb/tests/eventscripts/05.system.monitor.014.sh |  16 +++
 ctdb/tests/eventscripts/05.system.monitor.015.sh |  18 +++
 ctdb/tests/eventscripts/05.system.monitor.016.sh |  16 +++
 ctdb/tests/eventscripts/05.system.monitor.017.sh |  40 ++++++
 ctdb/tests/eventscripts/05.system.monitor.018.sh | 123 ++++++++++++++++
 ctdb/tests/eventscripts/scripts/local.sh         |  60 +++++---
 ctdb/tests/eventscripts/stubs/df                 |  38 +++++
 ctdb/tests/eventscripts/stubs/free               |   9 --
 ctdb/tests/eventscripts/stubs/ps                 |   2 +-
 29 files changed, 653 insertions(+), 265 deletions(-)
 create mode 100755 ctdb/config/events.d/05.system
 delete mode 100644 ctdb/config/events.d/40.fs_use
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.001.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.002.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.003.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.004.sh
 delete mode 100755 ctdb/tests/eventscripts/00.ctdb.monitor.005.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.001.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.002.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.003.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.004.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.005.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.006.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.007.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.011.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.012.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.013.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.014.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.015.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.016.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.017.sh
 create mode 100755 ctdb/tests/eventscripts/05.system.monitor.018.sh
 create mode 100755 ctdb/tests/eventscripts/stubs/df
 delete mode 100755 ctdb/tests/eventscripts/stubs/free


Changeset truncated at 500 lines:

diff --git a/ctdb/config/events.d/00.ctdb b/ctdb/config/events.d/00.ctdb
index 0e25e50..da7186f 100755
--- a/ctdb/config/events.d/00.ctdb
+++ b/ctdb/config/events.d/00.ctdb
@@ -116,46 +116,6 @@ set_ctdb_variables ()
     done
 }
 
-monitor_system_memory ()
-{
-    # If monitoring free memory then calculate how much there is
-    if [ -n "$CTDB_MONITOR_FREE_MEMORY_WARN" -o \
-	-n "$CTDB_MONITOR_FREE_MEMORY" ] ; then
-	free_mem=$(free -m | awk '$2 == "buffers/cache:" { print $4 }')
-    fi
-
-    # Shutdown CTDB when memory is below the configured limit
-    if [ -n "$CTDB_MONITOR_FREE_MEMORY" ] ; then
-	if [ $free_mem -le $CTDB_MONITOR_FREE_MEMORY ] ; then
-	    echo "CRITICAL: OOM - ${free_mem}MB free <= ${CTDB_MONITOR_FREE_MEMORY}MB (CTDB threshold)"
-	    echo "CRITICAL: Shutting down CTDB!!!"
-	    get_proc "meminfo"
-	    ps auxfww
-	    set_proc "sysrq-trigger" "m"
-	    ctdb disable
-	    sleep 3
-	    ctdb shutdown
-	fi
-    fi
-
-    # Warn when low on memory
-    if [ -n "$CTDB_MONITOR_FREE_MEMORY_WARN" ] ; then
-	if [ $free_mem -le $CTDB_MONITOR_FREE_MEMORY_WARN ] ; then
-	    echo "WARNING: free memory is low - ${free_mem}MB free <=  ${CTDB_MONITOR_FREE_MEMORY_WARN}MB (CTDB threshold)"
-	fi
-    fi
-
-    # We should never enter swap, so SwapTotal == SwapFree.
-    if [ "$CTDB_CHECK_SWAP_IS_NOT_USED" = "yes" ] ; then
-	set -- $(get_proc "meminfo" | awk '$1 ~ /Swap(Total|Free):/ { print $2 }')
-	if [ "$1" != "$2" ] ; then
-	    echo We are swapping:
-	    get_proc "meminfo"
-	    ps auxfww
-	fi
-    fi
-}
-
 ############################################################
 
 ctdb_check_args "$@"
@@ -187,9 +147,6 @@ case "$1" in
     startup)
 	ctdb attach ctdb.tdb persistent
 	;;
-    monitor)
-	monitor_system_memory
-	;;
 
     *)
 	ctdb_standard_event_handler "$@"
diff --git a/ctdb/config/events.d/05.system b/ctdb/config/events.d/05.system
new file mode 100755
index 0000000..69fcec2
--- /dev/null
+++ b/ctdb/config/events.d/05.system
@@ -0,0 +1,176 @@
+#!/bin/sh
+# ctdb event script for checking local file system utilization
+
+[ -n "$CTDB_BASE" ] || \
+    export CTDB_BASE=$(cd -P $(dirname "$0") ; dirname "$PWD")
+
+. $CTDB_BASE/functions
+loadconfig
+
+ctdb_setup_service_state_dir "system-monitoring"
+
+validate_percentage ()
+{
+    case "$1" in
+	"") return 1 ;;  # A failure that doesn't need a warning
+	[0-9]|[0-9][0-9]|100) return 0 ;;
+	*) echo "WARNING: ${1} is an invalid percentage${2:+ in \"}${2}${2:+\"} check"
+	   return 1
+    esac
+}
+
+check_thresholds ()
+{
+    _thing="$1"
+    _thresholds="$2"
+    _usage="$3"
+    _unhealthy_callout="$4"
+
+    case "$_thresholds" in
+	*:*)
+	    _warn_threshold="${_thresholds%:*}"
+	    _unhealthy_threshold="${_thresholds#*:}"
+	    ;;
+	*)
+	    _warn_threshold="$_thresholds"
+	    _unhealthy_threshold=""
+    esac
+
+    _t=$(echo "$_thing" | sed -e 's@/@SLASH_@g' -e 's@ @_@g')
+    _cache="${service_state_dir}/cache_${_t}"
+    if validate_percentage "$_unhealthy_threshold" "$_thing" ; then
+        if [ "$_usage" -ge "$_unhealthy_threshold" ] ; then
+	    echo "ERROR: ${_thing} utilization ${_usage}% >= threshold ${_unhealthy_threshold}%"
+	    eval "$_unhealthy_callout"
+	    echo "$_usage" >"$_cache"
+	    exit 1
+        fi
+    fi
+
+    if validate_percentage "$_warn_threshold" "$_thing" ; then
+        if [ "$_usage" -ge "$_warn_threshold" ] ; then
+	    if [ -r "$_cache" ] ; then
+		read _prev <"$_cache"
+	    else
+		_prev=""
+	    fi
+	    if [ "$_usage" != "$_prev" ] ; then
+		echo "WARNING: ${_thing} utilization ${_usage}% >= threshold ${_warn_threshold}%"
+		echo "$_usage" >"$_cache"
+	    fi
+	else
+	    if [ -r "$_cache" ] ; then
+		echo "NOTICE: ${_thing} utilization ${_usage}% < threshold ${_warn_threshold}%"
+	    fi
+	    rm -f "$_cache"
+        fi
+    fi
+}
+
+set_monitor_filsystem_usage_defaults ()
+{
+    _fs_defaults_cache="${service_state_dir}/cache_monitor_filsystem_usage_defaults"
+
+    if [ ! -r "$_fs_defaults_cache" ] ; then
+	# Determine filesystem for each database directory, generate
+	# an entry to warn at 90%, de-duplicate entries, put all items
+	# on 1 line (so the read below gets everything)
+	for _t in "${CTDB_DBDIR:-${CTDB_VARDIR}}" \
+		      "${CTDB_DBDIR_PERSISTENT:-${CTDB_VARDIR}/persistent}" \
+		      "${CTDB_DBDIR_STATE:-${CTDB_VARDIR}/state}" ; do
+	    df -kP "$_t" | awk 'NR == 2 { printf "%s:90\n", $6 }'
+	done | sort -u | xargs >"$_fs_defaults_cache"
+    fi
+
+    read CTDB_MONITOR_FILESYSTEM_USAGE <"$_fs_defaults_cache"
+}
+
+monitor_filesystem_usage ()
+{
+    if [ -z "$CTDB_MONITOR_FILESYSTEM_USAGE" ] ; then
+	set_monitor_filsystem_usage_defaults
+    fi
+
+    # Check each specified filesystem, specified in format
+    # <fs_mount>:<fs_warn_threshold>[:fs_unhealthy_threshold]
+    for _fs in $CTDB_MONITOR_FILESYSTEM_USAGE ; do
+	_fs_mount="${_fs%%:*}"
+	_fs_thresholds="${_fs#*:}"
+
+        if [ ! -d "$_fs_mount" ]; then
+            echo "WARNING: Directory ${_fs_mount} does not exist"
+	    continue
+        fi
+
+        # Get current utilization
+        _fs_usage=$(df -kP "$_fs_mount" | \
+			   sed -n -e 's@.*[[:space:]]\([[:digit:]]*\)%.*@\1@p')
+        if [ -z "$_fs_usage" ] ; then
+            echo "WARNING: Unable to get FS utilization for ${_fs_mount}"
+	    continue
+        fi
+
+	check_thresholds "Filesystem ${_fs_mount}" \
+			 "$_fs_thresholds" \
+			 "$_fs_usage"
+    done
+}
+
+dump_memory_info ()
+{
+    get_proc "meminfo"
+    ps auxfww
+    set_proc "sysrq-trigger" "m"
+}
+
+monitor_memory_usage ()
+{
+    # Defaults
+    if [ -z "$CTDB_MONITOR_MEMORY_USAGE" ] ; then
+	CTDB_MONITOR_MEMORY_USAGE=80
+    fi
+    if [ -z "$CTDB_MONITOR_SWAP_USAGE" ] ; then
+	CTDB_MONITOR_SWAP_USAGE=25
+    fi
+
+    _meminfo=$(get_proc "meminfo")
+    set -- $(echo "$_meminfo" | awk '
+$1 == "MemAvailable:" { memavail += $2 }
+$1 == "MemFree:"      { memfree  += $2 }
+$1 == "Cached:"       { memfree  += $2 }
+$1 == "Buffers:"      { memfree  += $2 }
+$1 == "MemTotal:"     { memtotal  = $2 }
+$1 == "SwapFree:"     { swapfree  = $2 }
+$1 == "SwapTotal:"    { swaptotal = $2 }
+END {
+    if (memavail != 0) { memfree = memavail ; }
+    print int((memtotal -  memfree)  / memtotal * 100),
+          int((swaptotal - swapfree) / swaptotal * 100)
+}')
+    _mem_usage="$1"
+    _swap_usage="$2"
+
+    check_thresholds "System memory" \
+		     "$CTDB_MONITOR_MEMORY_USAGE" \
+		     "$_mem_usage" \
+		     dump_memory_info
+
+    check_thresholds "System swap" \
+		     "$CTDB_MONITOR_SWAP_USAGE" \
+		     "$_swap_usage" \
+		     dump_memory_info
+}
+
+
+case "$1" in
+    monitor)
+	monitor_filesystem_usage
+	monitor_memory_usage
+	;;
+
+    *)
+	ctdb_standard_event_handler "$@"
+	;;
+esac
+
+exit 0
diff --git a/ctdb/config/events.d/40.fs_use b/ctdb/config/events.d/40.fs_use
deleted file mode 100644
index 603b463..0000000
--- a/ctdb/config/events.d/40.fs_use
+++ /dev/null
@@ -1,55 +0,0 @@
-#!/bin/sh
-# ctdb event script for checking local file system utilization
-
-[ -n "$CTDB_BASE" ] || \
-    export CTDB_BASE=$(cd -P $(dirname "$0") ; dirname "$PWD")
-
-. $CTDB_BASE/functions
-loadconfig
-
-case "$1" in 
-    monitor)
-        # check each specified fs to be checked
-        # config format is <fs_mount>:<fs_threshold>
-        for fs in $CTDB_CHECK_FS_USE
-        do
-            # parse fs_mount and fs_threshold
-            fs_mount="${fs%:*}"
-            fs_threshold="${fs#*:}"
-
-            # check if given fs_mount is existing directory
-            if [ ! -d "$fs_mount" ]; then
-                echo "Directory $fs_mount does not exist"
-                exit 1
-            fi
-
-            # check if given fs_threshold is number
-            if ! (echo "$fs_threshold" | egrep -q '^[0-9]+$')  ; then
-                echo "Threshold $fs_threshold is invalid number"
-                exit 1
-            fi
-
-            # get utilization of given fs from df
-            fs_usage=$(df -kP $fs_mount | sed -n -e 's@.*[[:space:]]\([[:digit:]]*\)%.*@\1@p')
-
-            # check if fs_usage is number
-            if [ -z "$fs_usage" ] ; then
-                echo "Unable to get FS utilization for $fs_mount"
-                exit 1
-            fi
-
-            # check if fs_usage is higher than or equal to fs_threshold
-            if [ "$fs_usage" -ge "$fs_threshold" ] ; then
-                echo "ERROR: Utilization of $fs_mount ($fs_usage%) is higher than threshold ($fs_threshold%)"
-                exit 1
-            fi
-        done
-
-	;;
-
-    *)
-	ctdb_standard_event_handler "$@"
-	;;
-esac
-
-exit 0
diff --git a/ctdb/doc/ctdbd.conf.5.xml b/ctdb/doc/ctdbd.conf.5.xml
index da53e51..f45c724 100644
--- a/ctdb/doc/ctdbd.conf.5.xml
+++ b/ctdb/doc/ctdbd.conf.5.xml
@@ -1279,91 +1279,91 @@ CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000
 
       <para>
 	CTDB can experience seemingly random (performance and other)
-	issues if system resources become too contrained.  Options in
-	this section can be enabled to allow certain system resources to
-	be checked.
+	issues if system resources become too constrained.  Options in
+	this section can be enabled to allow certain system resources
+	to be checked.  They allow warnings to be logged and nodes to
+	be marked unhealthy when system resource usage reaches the
+	configured thresholds.
+      </para>
+
+      <para>
+	Some checks are enabled by default.  It is recommended that
+	these checks remain enabled or are augmented by extra checks.
+	There is no supported way of completely disabling the checks.
       </para>
 
       <refsect3>
 	<title>Eventscripts</title>
 
 	<simplelist>
-	  <member><filename>00.ctdb</filename></member>
-	  <member><filename>40.fs_use</filename></member>
+	  <member><filename>05.system</filename></member>
 	</simplelist>
 
 	<para>
-	  Filesystem usage monitoring is in
-	  <filename>40.fs_use</filename>.  This eventscript is not
-	  enabled by default.  Use <command>ctdb
-	  enablescript</command> to enable it.
+	  Filesystem and memory usage monitoring is in
+	  <filename>05.system</filename>.
 	</para>
       </refsect3>
 
       <variablelist>
 
 	<varlistentry>
-	  <term>CTDB_CHECK_FS_USE=<parameter>FS-LIMIT-LIST</parameter></term>
+	  <term>CTDB_MONITOR_FILESYSTEM_USAGE=<parameter>FS-LIMIT-LIST</parameter></term>
 	  <listitem>
 	    <para>
 	      FS-LIMIT-LIST is a space-separated list of
-	      <parameter>FILESYSTEM</parameter>:<parameter>LIMIT</parameter>
-	      pairs indicating that a node should be flagged unhealthy
-	      if the space used on FILESYSTEM reaches LIMIT%.
-	    </para>
-
-	    <para>
-	      No default.
+	      <parameter>FILESYSTEM</parameter>:<parameter>WARN_LIMIT</parameter><optional>:<parameter>UNHEALTHY_LIMIT</parameter></optional>
+	      triples indicating that warnings should be logged if the
+	      space used on FILESYSTEM reaches WARN_LIMIT%.  If usage
+	      reaches UNHEALTHY_LIMIT then the node should be flagged
+	      unhealthy.  Either WARN_LIMIT or UNHEALTHY_LIMIT may be
+	      left blank, meaning that check will be omitted.
 	    </para>
 
 	    <para>
-	      Note that this feature uses the
-	      <filename>40.fs_use</filename> eventscript, which is not
-	      enabled by default.  Use <command>ctdb
-	      enablescript</command> to enable it.
+	      Default is to warn for each filesystem containing a
+	      database directory (<envar>CTDB_DBDIR</envar>,
+	      <envar>CTDB_DBDIR_PERSISTENT</envar>,
+	      <envar>CTDB_DBDIR_STATE</envar>) with a threshold of
+	      90%.
 	    </para>
 	  </listitem>
 	</varlistentry>
 
 	<varlistentry>
-	  <term>CTDB_CHECK_SWAP_IS_NOT_USED=yes|no</term>
+	  <term>CTDB_MONITOR_MEMORY_USAGE=<parameter>MEM-LIMITS</parameter></term>
 	  <listitem>
 	    <para>
-	      Should a warning be logged if swap space is in use.
+	      MEM-LIMITS takes the form
+	      <parameter>WARN_LIMIT</parameter><optional>:<parameter>UNHEALTHY_LIMIT</parameter></optional>
+	      indicating that warnings should be logged if memory
+	      usage reaches WARN_LIMIT%.  If usage reaches
+	      UNHEALTHY_LIMIT then the node should be flagged
+	      unhealthy.  Either WARN_LIMIT or UNHEALTHY_LIMIT may be
+	      left blank, meaning that check will be omitted.
 	    </para>
 	    <para>
-	      Default is no.
+	      Default is 80, so warnings will be logged when memory
+	      usage reaches 80%.
 	    </para>
 	  </listitem>
 	</varlistentry>
 
 	<varlistentry>
-	  <term>CTDB_MONITOR_FREE_MEMORY=<parameter>NUM</parameter></term>
+	  <term>CTDB_MONITOR_SWAP_USAGE=<parameter>SWAP-LIMITS</parameter></term>
 	  <listitem>
 	    <para>
-	      NUM is a lower limit on available system memory, expressed
-	      in megabytes.  If this is set and the amount of available
-	      memory falls below this limit then some debug information
-	      will be logged, the node will be disabled and then CTDB
-	      will be shut down.
+	      SWAP-LIMITS takes the form
+	      <parameter>WARN_LIMIT</parameter><optional>:<parameter>UNHEALTHY_LIMIT</parameter></optional>
+	       indicating that warnings should be logged if
+	      swap usage reaches WARN_LIMIT%.  If usage reaches
+	      UNHEALTHY_LIMIT then the node should be flagged
+	      unhealthy.  Either WARN_LIMIT or UNHEALTHY_LIMIT may be
+	      left blank, meaning that check will be omitted.
 	    </para>
 	    <para>
-	      No default.
-	    </para>
-	  </listitem>
-	</varlistentry>
-
-	<varlistentry>
-	  <term>CTDB_MONITOR_FREE_MEMORY_WARN=<parameter>NUM</parameter></term>
-	  <listitem>
-	    <para>
-	      NUM is a lower limit on available system memory, expressed
-	      in megabytes.  If this is set and the amount of available
-	      memory falls below this limit then a warning will be
-	      logged.
-	    </para>
-	    <para>
-	      No default.
+	      Default is 25, so warnings will be logged when swap
+	      usage reaches 25%.
 	    </para>
 	  </listitem>
 	</varlistentry>
diff --git a/ctdb/packaging/RPM/ctdb.spec.in b/ctdb/packaging/RPM/ctdb.spec.in
index 00f0be5..318dacf 100644
--- a/ctdb/packaging/RPM/ctdb.spec.in
+++ b/ctdb/packaging/RPM/ctdb.spec.in
@@ -167,6 +167,7 @@ rm -rf $RPM_BUILD_ROOT
 %{_sysconfdir}/ctdb/functions
 %{_sysconfdir}/ctdb/events.d/00.ctdb
 %{_sysconfdir}/ctdb/events.d/01.reclock
+%{_sysconfdir}/ctdb/events.d/05.system
 %{_sysconfdir}/ctdb/events.d/10.interface
 %{_sysconfdir}/ctdb/events.d/10.external
 %{_sysconfdir}/ctdb/events.d/13.per_ip_routing
@@ -174,7 +175,6 @@ rm -rf $RPM_BUILD_ROOT
 %{_sysconfdir}/ctdb/events.d/11.routing
 %{_sysconfdir}/ctdb/events.d/20.multipathd
 %{_sysconfdir}/ctdb/events.d/31.clamd
-%{_sysconfdir}/ctdb/events.d/40.fs_use
 %{_sysconfdir}/ctdb/events.d/40.vsftpd
 %{_sysconfdir}/ctdb/events.d/41.httpd
 %{_sysconfdir}/ctdb/events.d/49.winbind
diff --git a/ctdb/tests/eventscripts/00.ctdb.monitor.001.sh b/ctdb/tests/eventscripts/00.ctdb.monitor.001.sh
deleted file mode 100755
index 4290d13..0000000
--- a/ctdb/tests/eventscripts/00.ctdb.monitor.001.sh
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/bin/sh
-
-. "${TEST_SCRIPTS_DIR}/unit.sh"
-
-define_test "Memory check, bad situation, no checks enabled"
-
-setup_memcheck "bad"
-
-CTDB_MONITOR_FREE_MEMORY=""
-CTDB_MONITOR_FREE_MEMORY_WARN=""
-CTDB_CHECK_SWAP_IS_NOT_USED="no"
-
-ok_null
-
-simple_test
diff --git a/ctdb/tests/eventscripts/00.ctdb.monitor.002.sh b/ctdb/tests/eventscripts/00.ctdb.monitor.002.sh
deleted file mode 100755
index 6e94012..0000000
--- a/ctdb/tests/eventscripts/00.ctdb.monitor.002.sh
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/bin/sh
-
-. "${TEST_SCRIPTS_DIR}/unit.sh"
-
-define_test "Memory check, good situation, all enabled"
-
-setup_memcheck


-- 
Samba Shared Repository


