[Samba] CTDB on Samba 4.1.12 as member file server.

Min Wai Chan dcmwai at gmail.com
Wed Oct 8 13:56:37 MDT 2014


Update

I think my distro doesn't use ctdbd_wrapper,
so I had to make a change to run:

ctdbd_wrapper /var/run/ctdbd.pif start

Then node 1 will be OK (some changes also need to be made, as I am not sure
which config file is read by default for the ctdbd settings).
Testing shows /etc/ctdb/ctdbd.conf can be used -- still to be confirmed...
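
For reference, here is a minimal sketch of the settings involved, assuming
the wrapper really does read /etc/ctdb/ctdbd.conf (the CTDB_* names are the
stock CTDB 2.5 options; the paths are the ones from my setup):

# sketch only -- assuming ctdbd_wrapper sources /etc/ctdb/ctdbd.conf
CTDB_RECOVERY_LOCK=/amtb/ctdb.lock
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses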

But still, it will only come up OK on node 1.

I cannot get node 2 to work together with it.

Number of nodes:2
pnn:0 192.168.20.30    OK (THIS NODE)
pnn:1 192.168.20.34    DISCONNECTED|UNHEALTHY|INACTIVE
Generation:178185337
Size:1
hash:0 lmaster:0
Recovery mode:NORMAL (0)
Recovery master:0
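
For reference, the /etc/ctdb/nodes file (assuming the default path) should be
identical on both nodes and should simply list the two private addresses
above, one per line, in pnn order:

192.168.20.30
192.168.20.34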

Node 1 syslog
2014/10/09 03:52:15.541982 [ 8702]: 192.168.20.30:4379: connected to
192.168.20.34:4379 - 1 connected
2014/10/09 03:52:17.441753 [recoverd: 8900]: Election period ended
2014/10/09 03:52:17.763530 [recoverd: 8900]: Node:1 was in recovery mode.
Start recovery process
2014/10/09 03:52:17.763927 [recoverd: 8900]: server/ctdb_recoverd.c:1797
Starting do_recovery
2014/10/09 03:52:17.764132 [recoverd: 8900]: Taking out recovery lock from
recovery daemon
2014/10/09 03:52:17.764311 [recoverd: 8900]: Take the recovery lock
2014/10/09 03:52:17.764533 [recoverd: 8900]: Recovery lock taken
successfully
2014/10/09 03:52:17.764710 [recoverd: 8900]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:17.764961 [recoverd: 8900]: Recovery lock taken
successfully by recovery daemon
2014/10/09 03:52:17.765148 [recoverd: 8900]: server/ctdb_recoverd.c:1838
Recovery initiated due to problem with node 0
2014/10/09 03:52:17.786866 [recoverd: 8900]: server/ctdb_recoverd.c:1863
Recovery - created remote databases
2014/10/09 03:52:17.789170 [recoverd: 8900]: server/ctdb_recoverd.c:1870
Recovery - updated db priority for all databases
2014/10/09 03:52:17.789872 [ 8702]: server/ctdb_recover.c:612 Recovery mode
set to ACTIVE
2014/10/09 03:52:17.790502 [ 8702]: Freeze priority 1
2014/10/09 03:52:17.792882 [ 8702]: Freeze priority 2
2014/10/09 03:52:17.794584 [ 8702]: Freeze priority 3
2014/10/09 03:52:17.796310 [ 8702]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:18.081238 [recoverd: 8900]: server/ctdb_recoverd.c:1911
Recovery - updated flags
2014/10/09 03:52:18.104017 [recoverd: 8900]: server/ctdb_recoverd.c:1955
started transactions on all nodes
2014/10/09 03:52:18.106845 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x5bcfcbd7
2014/10/09 03:52:18.125134 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x5bcfcbd7 of size 0
2014/10/09 03:52:18.128277 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x66f71b8c
2014/10/09 03:52:18.144294 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x66f71b8c of size 0
2014/10/09 03:52:18.147285 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0xaf029e9d
2014/10/09 03:52:18.163202 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0xaf029e9d of size 0
2014/10/09 03:52:18.166250 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x7a19d84d
2014/10/09 03:52:18.181950 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x7a19d84d of size 0
2014/10/09 03:52:18.184881 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x4e66c2b2
2014/10/09 03:52:18.200581 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x4e66c2b2 of size 0
2014/10/09 03:52:18.203488 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x68c12c2c
2014/10/09 03:52:18.219271 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x68c12c2c of size 0
2014/10/09 03:52:18.222195 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x6b06a26d
2014/10/09 03:52:18.238171 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x6b06a26d of size 0
2014/10/09 03:52:18.241352 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x521b7544
2014/10/09 03:52:18.257686 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x521b7544 of size 1
2014/10/09 03:52:18.261019 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x6afb8c09
2014/10/09 03:52:18.277863 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x6afb8c09 of size 0
2014/10/09 03:52:18.280802 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x4d2a432b
2014/10/09 03:52:18.296439 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x4d2a432b of size 0
2014/10/09 03:52:18.299580 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x9ec2a880
2014/10/09 03:52:18.315852 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x9ec2a880 of size 5
2014/10/09 03:52:18.317798 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0x3ef19640
2014/10/09 03:52:18.318597 [recoverd: 8900]: Pull persistent db:0x3ef19640
from node 0 with highest seqnum:1
2014/10/09 03:52:18.319381 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x3ef19640
2014/10/09 03:52:18.335282 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x3ef19640 of size 4
2014/10/09 03:52:18.337221 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0xc3078fba
2014/10/09 03:52:18.337984 [recoverd: 8900]: Pull persistent db:0xc3078fba
from node 0 with highest seqnum:1
2014/10/09 03:52:18.338747 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0xc3078fba
2014/10/09 03:52:18.357283 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0xc3078fba of size 2
2014/10/09 03:52:18.359211 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0xc3db117c
2014/10/09 03:52:18.360922 [recoverd: 8900]: Pull persistent db:0xc3db117c
from node 0 with highest seqnum:6
2014/10/09 03:52:18.361739 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0xc3db117c
2014/10/09 03:52:18.378346 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0xc3db117c of size 8
2014/10/09 03:52:18.380296 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0x6645c6c4
2014/10/09 03:52:18.380955 [recoverd: 8900]: Pull persistent db:0x6645c6c4
from node 0 with highest seqnum:0
2014/10/09 03:52:18.381754 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x6645c6c4
2014/10/09 03:52:18.390880 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x6645c6c4 of size 1
2014/10/09 03:52:18.393034 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0x6cf2837d
2014/10/09 03:52:18.393691 [recoverd: 8900]: Pull persistent db:0x6cf2837d
from node 0 with highest seqnum:4
2014/10/09 03:52:18.394926 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x6cf2837d
2014/10/09 03:52:18.404845 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x6cf2837d of size 73
2014/10/09 03:52:18.406784 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0x7132c184
2014/10/09 03:52:18.407786 [recoverd: 8900]: Pull persistent db:0x7132c184
from node 0 with highest seqnum:6
2014/10/09 03:52:18.408604 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x7132c184
2014/10/09 03:52:18.417955 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x7132c184 of size 7
2014/10/09 03:52:18.419909 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0xa1413774
2014/10/09 03:52:18.420560 [recoverd: 8900]: Pull persistent db:0xa1413774
from node 0 with highest seqnum:4
2014/10/09 03:52:18.421332 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0xa1413774
2014/10/09 03:52:18.430540 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0xa1413774 of size 5
2014/10/09 03:52:18.432449 [recoverd: 8900]: Scan for highest seqnum pdb
for db:0x2ca251cf
2014/10/09 03:52:18.433050 [recoverd: 8900]: Pull persistent db:0x2ca251cf
from node 0 with highest seqnum:1
2014/10/09 03:52:18.433909 [recoverd: 8900]: server/ctdb_recoverd.c:1485
Recovery - pulled remote database 0x2ca251cf
2014/10/09 03:52:18.443101 [recoverd: 8900]: server/ctdb_recoverd.c:1446
Recovery - pushed remote database 0x2ca251cf of size 18
2014/10/09 03:52:18.443525 [recoverd: 8900]: server/ctdb_recoverd.c:1968
Recovery - starting database commits


Node 2 syslog

2014/10/09 03:52:13.259556 [28542]: CTDB starting on node
2014/10/09 03:52:13.261874 [28543]: Starting CTDBD (Version 2.5.4) as PID:
28543
2014/10/09 03:52:13.262128 [28543]: Created PID file /var/run/ctdbd.pif
2014/10/09 03:52:13.262194 [28543]: Set real-time scheduler priority
2014/10/09 03:52:13.262386 [28543]: Set runstate to INIT (1)
2014/10/09 03:52:13.996100 [28543]: Vacuuming is disabled for persistent
database passdb.tdb
2014/10/09 03:52:13.996179 [28543]: Attached to database
'/var/lib/ctdb/persistent/passdb.tdb.1' with flags 0x400
2014/10/09 03:52:14.011700 [28543]: Vacuuming is disabled for persistent
database ctdb.tdb
2014/10/09 03:52:14.011741 [28543]: Attached to database
'/var/lib/ctdb/persistent/ctdb.tdb.1' with flags 0x400
2014/10/09 03:52:14.029983 [28543]: Vacuuming is disabled for persistent
database account_policy.tdb
2014/10/09 03:52:14.030024 [28543]: Attached to database
'/var/lib/ctdb/persistent/account_policy.tdb.1' with flags 0x400
2014/10/09 03:52:14.044705 [28543]: Vacuuming is disabled for persistent
database winbindd_idmap.tdb
2014/10/09 03:52:14.044740 [28543]: Attached to database
'/var/lib/ctdb/persistent/winbindd_idmap.tdb.1' with flags 0x400
2014/10/09 03:52:14.059368 [28543]: Vacuuming is disabled for persistent
database group_mapping.tdb
2014/10/09 03:52:14.059402 [28543]: Attached to database
'/var/lib/ctdb/persistent/group_mapping.tdb.1' with flags 0x400
2014/10/09 03:52:14.074023 [28543]: Vacuuming is disabled for persistent
database share_info.tdb
2014/10/09 03:52:14.074109 [28543]: Attached to database
'/var/lib/ctdb/persistent/share_info.tdb.1' with flags 0x400
2014/10/09 03:52:14.088781 [28543]: Vacuuming is disabled for persistent
database registry.tdb
2014/10/09 03:52:14.088814 [28543]: Attached to database
'/var/lib/ctdb/persistent/registry.tdb.1' with flags 0x400
2014/10/09 03:52:14.103480 [28543]: Vacuuming is disabled for persistent
database secrets.tdb
2014/10/09 03:52:14.103514 [28543]: Attached to database
'/var/lib/ctdb/persistent/secrets.tdb.1' with flags 0x400
2014/10/09 03:52:14.103561 [28543]: Freeze priority 1
2014/10/09 03:52:14.105395 [28543]: Freeze priority 2
2014/10/09 03:52:14.106585 [28543]: Freeze priority 3
2014/10/09 03:52:14.107879 [28543]: server/ctdb_takeover.c:3303 Released 0
public IPs
2014/10/09 03:52:14.107913 [28543]: Set runstate to SETUP (2)
2014/10/09 03:52:14.142683 [28543]: 00.ctdb: Set TraverseTimeout to 60
2014/10/09 03:52:14.310296 [28543]: Unknown db_id 0x6afb8c09 in
ctdb_set_db_priority
2014/10/09 03:52:14.310362 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.310385 [28543]: Unknown db_id 0x9ec2a880 in
ctdb_set_db_priority
2014/10/09 03:52:14.310416 [28543]: Unknown db_id 0x9ec2a880 in
ctdb_set_db_priority
2014/10/09 03:52:14.310437 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.310456 [28543]: Unknown db_id 0x9ec2a880 in
ctdb_set_db_priority
2014/10/09 03:52:14.310475 [28543]: Unknown db_id 0x6afb8c09 in
ctdb_set_db_priority
2014/10/09 03:52:14.310495 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.310513 [28543]: Unknown db_id 0x6afb8c09 in
ctdb_set_db_priority
2014/10/09 03:52:14.313298 [28543]: Attached to database
'/var/lib/ctdb/smbXsrv_version_global.tdb.1' with flags 0xc41
2014/10/09 03:52:14.317701 [28543]: Attached to database
'/var/lib/ctdb/smbXsrv_session_global.tdb.1' with flags 0xc41
2014/10/09 03:52:14.321419 [28543]: Attached to database
'/var/lib/ctdb/smbXsrv_tcon_global.tdb.1' with flags 0xc41
2014/10/09 03:52:14.325099 [28543]: Attached to database
'/var/lib/ctdb/brlock.tdb.1' with flags 0xc41
2014/10/09 03:52:14.328674 [28543]: Attached to database
'/var/lib/ctdb/locking.tdb.1' with flags 0xc41
2014/10/09 03:52:14.332342 [28543]: Attached to database
'/var/lib/ctdb/notify_index.tdb.1' with flags 0xc41
2014/10/09 03:52:14.333649 [28543]: Unknown db_id 0x9ec2a880 in
ctdb_set_db_priority
2014/10/09 03:52:14.333683 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333704 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333724 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333743 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333763 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333782 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333802 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333822 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333841 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.333860 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.336236 [28543]: Attached to database
'/var/lib/ctdb/smbXsrv_open_global.tdb.1' with flags 0xc41
2014/10/09 03:52:14.337545 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.337578 [28543]: Unknown db_id 0x4d2a432b in
ctdb_set_db_priority
2014/10/09 03:52:14.339873 [28543]: Attached to database
'/var/lib/ctdb/printer_list.tdb.1' with flags 0xc41
2014/10/09 03:52:14.341268 [28543]: Unknown db_id 0x9ec2a880 in
ctdb_set_db_priority
2014/10/09 03:52:14.341305 [28543]: Unknown db_id 0x9ec2a880 in
ctdb_set_db_priority
2014/10/09 03:52:14.436883 [28543]: Keepalive monitoring has been started
2014/10/09 03:52:14.436947 [28543]: Set runstate to FIRST_RECOVERY (3)
2014/10/09 03:52:14.437286 [recoverd:28741]: monitor_cluster starting
2014/10/09 03:52:14.439841 [recoverd:28741]: server/ctdb_recoverd.c:3727
Initial recovery master set - forcing election
2014/10/09 03:52:14.440042 [28543]: Freeze priority 1
2014/10/09 03:52:14.440222 [28543]: Freeze priority 2
2014/10/09 03:52:14.440326 [28543]: Freeze priority 3
2014/10/09 03:52:14.440462 [28543]: This node (1) is now the recovery master
2014/10/09 03:52:14.942388 [28543]: This node (1) is no longer the recovery
master
2014/10/09 03:52:15.437211 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:16.437439 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:17.437689 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:17.766744 [28543]: Could not find idr:2147483462
2014/10/09 03:52:17.770024 [28543]: Attached to database
'/var/lib/ctdb/dbwrap_watchers.tdb.1' with flags 0xc41
2014/10/09 03:52:17.778317 [28543]: Could not find idr:2147483462
2014/10/09 03:52:17.780950 [28543]: Attached to database
'/var/lib/ctdb/g_lock.tdb.1' with flags 0xc41
2014/10/09 03:52:17.782930 [28543]: Could not find idr:2147483462
2014/10/09 03:52:17.785419 [28543]: Attached to database
'/var/lib/ctdb/serverid.tdb.1' with flags 0xc41
2014/10/09 03:52:17.791316 [28543]: Freeze priority 1
2014/10/09 03:52:17.793632 [28543]: Freeze priority 2
2014/10/09 03:52:17.795314 [28543]: Freeze priority 3
2014/10/09 03:52:17.797238 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:17.945548 [recoverd:28741]: Election period ended
2014/10/09 03:52:17.946048 [recoverd:28741]: server/ctdb_recoverd.c:3749
Current recmaster node 0 does not have CAP_RECMASTER, but we (node 1) have
- force an election
2014/10/09 03:52:17.946161 [28543]: Freeze priority 1
2014/10/09 03:52:17.946240 [28543]: Freeze priority 2
2014/10/09 03:52:17.946315 [28543]: Freeze priority 3
2014/10/09 03:52:17.946439 [28543]: This node (1) is now the recovery master
2014/10/09 03:52:18.080695 [recoverd:28741]: Node 0 has changed flags - now
0x0  was 0x3
2014/10/09 03:52:18.441220 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:18.801260 [28543]: This node (1) is no longer the recovery
master
2014/10/09 03:52:18.804257 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:18.804294 [28543]: Thawing priority 1
2014/10/09 03:52:18.804312 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:18.804394 [28543]: Thawing priority 2
2014/10/09 03:52:18.804418 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:18.804465 [28543]: Thawing priority 3
2014/10/09 03:52:18.804485 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:18.806283 [set_recmode:28804]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:18.806409 [set_recmode:28804]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:19.437907 [28543]: 192.168.20.34:4379: connected to
192.168.20.30:4379 - 1 connected
2014/10/09 03:52:19.442076 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:19.442113 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:20.442824 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:20.442896 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:21.443820 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:21.443932 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:21.802502 [recoverd:28741]: Election period ended
2014/10/09 03:52:21.804390 [recoverd:28741]: Initial interface fetched
2014/10/09 03:52:21.804429 [recoverd:28741]: The interfaces status has
changed on local node 1 - force takeover run
2014/10/09 03:52:21.818586 [28543]: Freeze priority 1
2014/10/09 03:52:21.821749 [28543]: Freeze priority 2
2014/10/09 03:52:21.823907 [28543]: Freeze priority 3
2014/10/09 03:52:21.826207 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:22.109373 [28543]: server/ctdb_monitor.c:495 Node 0 became
healthy - force recovery for startup
2014/10/09 03:52:22.109641 [recoverd:28741]: Node 0 has changed flags - now
0x0  was 0x2
2014/10/09 03:52:22.444573 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:22.444666 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:23.107390 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:23.107468 [28543]: Thawing priority 1
2014/10/09 03:52:23.107487 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:23.107570 [28543]: Thawing priority 2
2014/10/09 03:52:23.107592 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:23.107638 [28543]: Thawing priority 3
2014/10/09 03:52:23.107658 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:23.109983 [set_recmode:28868]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:23.110179 [set_recmode:28868]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:23.120469 [28543]: Freeze priority 1
2014/10/09 03:52:23.123076 [28543]: Freeze priority 2
2014/10/09 03:52:23.125300 [28543]: Freeze priority 3
2014/10/09 03:52:23.127593 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:23.450400 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:23.450491 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:24.452599 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:24.452669 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:24.455363 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:24.455400 [28543]: Thawing priority 1
2014/10/09 03:52:24.455416 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:24.455493 [28543]: Thawing priority 2
2014/10/09 03:52:24.455516 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:24.455563 [28543]: Thawing priority 3
2014/10/09 03:52:24.455582 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:24.456975 [set_recmode:28932]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:24.457101 [set_recmode:28932]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:24.464212 [28543]: Freeze priority 1
2014/10/09 03:52:24.467059 [28543]: Freeze priority 2
2014/10/09 03:52:24.469229 [28543]: Freeze priority 3
2014/10/09 03:52:24.471575 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:25.453562 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:25.453660 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:25.794452 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:25.794522 [28543]: Thawing priority 1
2014/10/09 03:52:25.794541 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:25.794617 [28543]: Thawing priority 2
2014/10/09 03:52:25.794638 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:25.794685 [28543]: Thawing priority 3
2014/10/09 03:52:25.794704 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:25.797269 [set_recmode:28995]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:25.797396 [set_recmode:28995]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:25.804660 [28543]: Freeze priority 1
2014/10/09 03:52:25.807280 [28543]: Freeze priority 2
2014/10/09 03:52:25.809433 [28543]: Freeze priority 3
2014/10/09 03:52:25.811738 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:26.740688 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:26.740759 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:26.751055 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:26.751093 [28543]: Thawing priority 1
2014/10/09 03:52:26.751109 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:26.751189 [28543]: Thawing priority 2
2014/10/09 03:52:26.751211 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:26.751269 [28543]: Thawing priority 3
2014/10/09 03:52:26.751291 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:26.753852 [set_recmode:29058]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:26.753974 [set_recmode:29058]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:26.807884 [28543]: Freeze priority 1
2014/10/09 03:52:26.810283 [28543]: Freeze priority 2
2014/10/09 03:52:26.812216 [28543]: Freeze priority 3
2014/10/09 03:52:26.814125 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:28.072512 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:28.072582 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:28.075765 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:28.075802 [28543]: Thawing priority 1
2014/10/09 03:52:28.075819 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:28.075899 [28543]: Thawing priority 2
2014/10/09 03:52:28.075922 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:28.075969 [28543]: Thawing priority 3
2014/10/09 03:52:28.075988 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:28.077122 [set_recmode:29121]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:28.077242 [set_recmode:29121]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:28.085058 [28543]: Freeze priority 1
2014/10/09 03:52:28.087629 [28543]: Freeze priority 2
2014/10/09 03:52:28.089703 [28543]: Freeze priority 3
2014/10/09 03:52:28.092023 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:29.073440 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:29.073536 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:29.415607 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:29.415674 [28543]: Thawing priority 1
2014/10/09 03:52:29.415692 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:29.415772 [28543]: Thawing priority 2
2014/10/09 03:52:29.415794 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:29.415843 [28543]: Thawing priority 3
2014/10/09 03:52:29.415862 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:29.416978 [set_recmode:29184]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:29.417108 [set_recmode:29184]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:29.424282 [28543]: Freeze priority 1
2014/10/09 03:52:29.428067 [28543]: Freeze priority 2
2014/10/09 03:52:29.430137 [28543]: Freeze priority 3
2014/10/09 03:52:29.432367 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:30.074067 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:30.074161 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:31.372537 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:31.372606 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:31.376640 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:31.376676 [28543]: Thawing priority 1
2014/10/09 03:52:31.376692 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:31.376777 [28543]: Thawing priority 2
2014/10/09 03:52:31.376799 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:31.376839 [28543]: Thawing priority 3
2014/10/09 03:52:31.376857 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:31.377965 [set_recmode:29247]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:31.378099 [set_recmode:29247]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:31.385182 [28543]: Freeze priority 1
2014/10/09 03:52:31.387827 [28543]: Freeze priority 2
2014/10/09 03:52:31.389957 [28543]: Freeze priority 3
2014/10/09 03:52:31.392266 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:32.372700 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:32.372847 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:32.726206 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:32.726274 [28543]: Thawing priority 1
2014/10/09 03:52:32.726293 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:32.726375 [28543]: Thawing priority 2
2014/10/09 03:52:32.726397 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:32.726455 [28543]: Thawing priority 3
2014/10/09 03:52:32.726476 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:32.727568 [set_recmode:29310]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:32.727693 [set_recmode:29310]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:32.734783 [28543]: Freeze priority 1
2014/10/09 03:52:32.737319 [28543]: Freeze priority 2
2014/10/09 03:52:32.739390 [28543]: Freeze priority 3
2014/10/09 03:52:32.741689 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:33.373422 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:33.373519 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:34.085988 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:34.086056 [28543]: Thawing priority 1
2014/10/09 03:52:34.086075 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:34.086157 [28543]: Thawing priority 2
2014/10/09 03:52:34.086180 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:34.086228 [28543]: Thawing priority 3
2014/10/09 03:52:34.086248 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:34.087340 [set_recmode:29373]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:34.087473 [set_recmode:29373]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:34.094494 [28543]: Freeze priority 1
2014/10/09 03:52:34.097055 [28543]: Freeze priority 2
2014/10/09 03:52:34.099149 [28543]: Freeze priority 3
2014/10/09 03:52:34.101568 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:34.373702 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:34.373777 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:35.422255 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:35.422325 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:35.426012 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:35.426048 [28543]: Thawing priority 1
2014/10/09 03:52:35.426064 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:35.426146 [28543]: Thawing priority 2
2014/10/09 03:52:35.426168 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:35.426222 [28543]: Thawing priority 3
2014/10/09 03:52:35.426242 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:35.428838 [set_recmode:29436]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:35.428960 [set_recmode:29436]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:35.436366 [28543]: Freeze priority 1
2014/10/09 03:52:35.439016 [28543]: Freeze priority 2
2014/10/09 03:52:35.441221 [28543]: Freeze priority 3
2014/10/09 03:52:35.443540 [28543]: server/ctdb_recover.c:988 startrecovery
eventscript has been invoked
2014/10/09 03:52:36.422527 [28543]: CTDB_WAIT_UNTIL_RECOVERED
2014/10/09 03:52:36.422629 [28543]: server/ctdb_monitor.c:293 in recovery.
Wait one more second
2014/10/09 03:52:36.763788 [28543]: server/ctdb_recover.c:612 Recovery mode
set to NORMAL
2014/10/09 03:52:36.763854 [28543]: Thawing priority 1
2014/10/09 03:52:36.763872 [28543]: Release freeze handler for prio 1
2014/10/09 03:52:36.763951 [28543]: Thawing priority 2
2014/10/09 03:52:36.763973 [28543]: Release freeze handler for prio 2
2014/10/09 03:52:36.764024 [28543]: Thawing priority 3
2014/10/09 03:52:36.764044 [28543]: Release freeze handler for prio 3
2014/10/09 03:52:36.766583 [set_recmode:29499]: ctdb_recovery_lock: Got
recovery lock on '/amtb/ctdb.lock'
2014/10/09 03:52:36.766754 [set_recmode:29499]: ERROR: recovery lock file
/amtb/ctdb.lock not locked when recovering!
2014/10/09 03:52:36.775245 [28543]: Freeze priority 1
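
Node 2 keeps logging "ERROR: recovery lock file /amtb/ctdb.lock not locked
when recovering!", i.e. it can take the lock even though node 1's recovery
daemon is holding it, so I suspect fcntl locking on the shared /amtb mount is
not coherent across the two nodes. One way to check that (assuming the
ping_pong test tool that ships with the ctdb sources; /amtb/test.dat is just a
scratch file and 3 is number-of-nodes + 1) would be to run the same command on
both nodes at the same time:

ping_pong /amtb/test.dat 3

If the locks/second rate does not drop when the second node joins in, the
filesystem is not doing cluster-wide locking and the ctdb recovery lock
cannot work.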







On Thu, Oct 9, 2014 at 12:50 AM, Min Wai Chan <dcmwai at gmail.com> wrote:

> Hi Amitay
>
> Somehow, I think the two nodes are conflicting with each other.
>
> I'm not too sure what is happening.
>
> If I start node 1 only
>
> the ctdb status will be
>
> Number of nodes:2
> pnn:0 192.168.20.30    OK (THIS NODE)
> pnn:1 192.168.20.34    DISCONNECTED|DISABLED|UNHEALTHY|INACTIVE
> Generation:514739632
> Size:1
> hash:0 lmaster:0
> Recovery mode:NORMAL (0)
> Recovery master:0
>
> And later
>
> Number of nodes:2
> pnn:0 192.168.20.30    UNHEALTHY (THIS NODE)
> pnn:1 192.168.20.34    DISCONNECTED|DISABLED|UNHEALTHY|INACTIVE
> Generation:514739632
> Size:1
> hash:0 lmaster:0
> Recovery mode:NORMAL (0)
> Recovery master:0
>
> And if I start node 2, then
>
> ctdb status
> Number of nodes:2
> pnn:0 192.168.20.30    UNHEALTHY (THIS NODE)
> pnn:1 192.168.20.34    DISABLED|UNHEALTHY
> Generation:1061098784
> Size:2
> hash:0 lmaster:0
> hash:1 lmaster:1
> Recovery mode:NORMAL (0)
> Recovery master:0
>
> If I run ctdb enable on node 2
>
> Number of nodes:2
> pnn:0 192.168.20.30    UNHEALTHY (THIS NODE)
> pnn:1 192.168.20.34    UNHEALTHY
> Generation:686675379
> Size:2
> hash:0 lmaster:0
> hash:1 lmaster:1
> Recovery mode:RECOVERY (1)
> Recovery master:0
>
> What next???
>
> I don't think it is working...
>
> On Wed, Oct 8, 2014 at 9:02 PM, Min Wai Chan <dcmwai at gmail.com> wrote:
>
>> Hi there,
>>
>> Here is a new update.
>>
>> It is strange that node 1 and node 2 show different results...
>>
>> Node 1 result
>> 2014/10/08 21:01:07.607091 [31382]: Freeze priority 1
>> 2014/10/08 21:01:07.609728 [31382]: Freeze priority 2
>> 2014/10/08 21:01:07.611782 [31382]: Freeze priority 3
>> 2014/10/08 21:01:07.616061 [31382]: server/ctdb_recover.c:612 Recovery
>> mode set to ACTIVE
>> 2014/10/08 21:01:07.617335 [31382]: server/ctdb_recover.c:988
>> startrecovery eventscript has been invoked
>> 2014/10/08 21:01:07.884841 [recoverd:31530]: server/ctdb_recoverd.c:1863
>> Recovery - updated flags
>> 2014/10/08 21:01:07.885608 [recoverd:31530]: server/ctdb_recoverd.c:1907
>> started transactions on all nodes
>> 2014/10/08 21:01:07.885648 [recoverd:31530]: server/ctdb_recoverd.c:1920
>> Recovery - starting database commits
>> 2014/10/08 21:01:07.886319 [recoverd:31530]: server/ctdb_recoverd.c:1932
>> Recovery - committed databases
>> 2014/10/08 21:01:07.887420 [recoverd:31530]: server/ctdb_recoverd.c:1982
>> Recovery - updated vnnmap
>> 2014/10/08 21:01:07.888047 [recoverd:31530]: server/ctdb_recoverd.c:1991
>> Recovery - updated recmaster
>> 2014/10/08 21:01:07.888113 [31382]: server/ctdb_recover.c:612 Recovery
>> mode set to NORMAL
>> 2014/10/08 21:01:07.888142 [31382]: Thawing priority 1
>> 2014/10/08 21:01:07.888158 [31382]: Release freeze handler for prio 1
>> 2014/10/08 21:01:07.888205 [31382]: Thawing priority 2
>> 2014/10/08 21:01:07.888224 [31382]: Release freeze handler for prio 2
>> 2014/10/08 21:01:07.888261 [31382]: Thawing priority 3
>> 2014/10/08 21:01:07.888278 [31382]: Release freeze handler for prio 3
>> 2014/10/08 21:01:07.891120 [recoverd:31530]: ctdb_control error: 'managed
>> to lock reclock file from inside daemon'
>> 2014/10/08 21:01:07.891175 [recoverd:31530]: ctdb_control error: 'managed
>> to lock reclock file from inside daemon'
>> 2014/10/08 21:01:07.891200 [recoverd:31530]: Async operation failed with
>> ret=-1 res=-1 opcode=16
>> 2014/10/08 21:01:07.891218 [recoverd:31530]: Async wait failed -
>> fail_count=1
>> 2014/10/08 21:01:07.891240 [recoverd:31530]: server/ctdb_recoverd.c:412
>> Unable to set recovery mode. Recovery failed.
>> 2014/10/08 21:01:07.891256 [recoverd:31530]: server/ctdb_recoverd.c:1996
>> Unable to set recovery mode to normal on cluster
>> 2014/10/08 21:01:08.605210 [recoverd:31530]: server/ctdb_recoverd.c:1765
>> Starting do_recovery
>> 2014/10/08 21:01:08.605308 [recoverd:31530]: Taking out recovery lock
>> from recovery daemon
>> 2014/10/08 21:01:08.605329 [recoverd:31530]: Take the recovery lock
>> 2014/10/08 21:01:08.605395 [recoverd:31530]: Recovery lock taken
>> successfully
>> 2014/10/08 21:01:08.605418 [recoverd:31530]: ctdb_recovery_lock: Got
>> recovery lock on '/amtb/.ctdb.lock'
>> 2014/10/08 21:01:08.605497 [recoverd:31530]: Recovery lock taken
>> successfully by recovery daemon
>> 2014/10/08 21:01:08.605520 [recoverd:31530]: server/ctdb_recoverd.c:1790
>> Recovery initiated due to problem with node 0
>> 2014/10/08 21:01:08.606337 [recoverd:31530]: server/ctdb_recoverd.c:1815
>> Recovery - created remote databases
>> 2014/10/08 21:01:08.606383 [recoverd:31530]: server/ctdb_recoverd.c:1822
>> Recovery - updated db priority for all databases
>> 2014/10/08 21:01:08.606962 [31382]: Freeze priority 1
>> 2014/10/08 21:01:08.609229 [31382]: Freeze priority 2
>> 2014/10/08 21:01:08.611044 [31382]: Freeze priority 3
>> 2014/10/08 21:01:08.612904 [31382]: server/ctdb_recover.c:612 Recovery
>> mode set to ACTIVE
>> 2014/10/08 21:01:08.613437 [31382]: server/ctdb_recover.c:988
>> startrecovery eventscript has been invoked
>> 2014/10/08 21:01:08.874959 [recoverd:31530]: server/ctdb_recoverd.c:1863
>> Recovery - updated flags
>> 2014/10/08 21:01:08.876816 [recoverd:31530]: server/ctdb_recoverd.c:1907
>> started transactions on all nodes
>> 2014/10/08 21:01:08.876868 [recoverd:31530]: server/ctdb_recoverd.c:1920
>> Recovery - starting database commits
>> 2014/10/08 21:01:08.878686 [recoverd:31530]: server/ctdb_recoverd.c:1932
>> Recovery - committed databases
>> 2014/10/08 21:01:08.880264 [recoverd:31530]: server/ctdb_recoverd.c:1982
>> Recovery - updated vnnmap
>> 2014/10/08 21:01:08.880932 [recoverd:31530]: server/ctdb_recoverd.c:1991
>> Recovery - updated recmaster
>> 2014/10/08 21:01:08.881007 [31382]: server/ctdb_recover.c:612 Recovery
>> mode set to NORMAL
>> 2014/10/08 21:01:08.881056 [31382]: Thawing priority 1
>> 2014/10/08 21:01:08.881081 [31382]: Release freeze handler for prio 1
>> 2014/10/08 21:01:08.881118 [31382]: Thawing priority 2
>> 2014/10/08 21:01:08.881140 [31382]: Release freeze handler for prio 2
>> 2014/10/08 21:01:08.881195 [31382]: Thawing priority 3
>> 2014/10/08 21:01:08.881214 [31382]: Release freeze handler for prio 3
>> 2014/10/08 21:01:08.884055 [recoverd:31530]: ctdb_control error: 'managed
>> to lock reclock file from inside daemon'
>> 2014/10/08 21:01:08.884107 [recoverd:31530]: ctdb_control error: 'managed
>> to lock reclock file from inside daemon'
>> 2014/10/08 21:01:08.884162 [recoverd:31530]: Async operation failed with
>> ret=-1 res=-1 opcode=16
>> 2014/10/08 21:01:08.884199 [recoverd:31530]: Async wait failed -
>> fail_count=1
>> 2014/10/08 21:01:08.884217 [recoverd:31530]: server/ctdb_recoverd.c:412
>> Unable to set recovery mode. Recovery failed.
>> 2014/10/08 21:01:08.884244 [recoverd:31530]: server/ctdb_recoverd.c:1996
>> Unable to set recovery mode to normal on cluster
>> 2014/10/08 21:01:09.606439 [recoverd:31530]: server/ctdb_recoverd.c:1765
>> Starting do_recovery
>> 2014/10/08 21:01:09.606548 [recoverd:31530]: Taking out recovery lock
>> from recovery daemon
>> 2014/10/08 21:01:09.606572 [recoverd:31530]: Take the recovery lock
>> 2014/10/08 21:01:09.606641 [recoverd:31530]: Recovery lock taken
>> successfully
>> 2014/10/08 21:01:09.606666 [recoverd:31530]: ctdb_recovery_lock: Got
>> recovery lock on '/amtb/.ctdb.lock'
>> 2014/10/08 21:01:09.606745 [recoverd:31530]: Recovery lock taken
>> successfully by recovery daemon
>> 2014/10/08 21:01:09.606769 [recoverd:31530]: server/ctdb_recoverd.c:1790
>> Recovery initiated due to problem with node 0
>> 2014/10/08 21:01:09.607877 [recoverd:31530]: server/ctdb_recoverd.c:1815
>> Recovery - created remote databases
>> 2014/10/08 21:01:09.607919 [recoverd:31530]: server/ctdb_recoverd.c:1822
>> Recovery - updated db priority for all databases
>> 2014/10/08 21:01:09.608450 [31382]: Freeze priority 1
>> 2014/10/08 21:01:09.610851 [31382]: Freeze priority 2
>> 2014/10/08 21:01:09.613090 [31382]: Freeze priority 3
>> 2014/10/08 21:01:09.615232 [31382]: server/ctdb_recover.c:612 Recovery
>> mode set to ACTIVE
>> 2014/10/08 21:01:09.615819 [31382]: server/ctdb_recover.c:988
>> startrecovery eventscript has been invoked
>>
>>
>>
>>
>>
>> Node 2 result
>>
>>
>> 2014/10/08 21:00:26.830322 [set_recmode:15680]: ctdb_recovery_lock: Got
>> recovery lock on '/amtb/.ctdb.lock'
>> 2014/10/08 21:00:26.830515 [set_recmode:15680]: ERROR: recovery lock file
>> /amtb/.ctdb.lock not locked when recovering!
>> 2014/10/08 21:00:26.982110 [23001]: server/ctdb_monitor.c:293 in
>> recovery. Wait one more second
>> 2014/10/08 21:00:27.559316 [23001]: Freeze priority 1
>> 2014/10/08 21:00:27.561653 [23001]: Freeze priority 2
>> 2014/10/08 21:00:27.563598 [23001]: Freeze priority 3
>> 2014/10/08 21:00:27.566592 [23001]: server/ctdb_recover.c:988
>> startrecovery eventscript has been invoked
>> 2014/10/08 21:00:27.829809 [23001]: server/ctdb_recover.c:612 Recovery
>> mode set to NORMAL
>> 2014/10/08 21:00:27.829915 [23001]: Thawing priority 1
>> 2014/10/08 21:00:27.829936 [23001]: Release freeze handler for prio 1
>> 2014/10/08 21:00:27.829984 [23001]: Thawing priority 2
>> 2014/10/08 21:00:27.830002 [23001]: Release freeze handler for prio 2
>> 2014/10/08 21:00:27.830044 [23001]: Thawing priority 3
>> 2014/10/08 21:00:27.830061 [23001]: Release freeze handler for prio 3
>> 2014/10/08 21:00:27.831091 [set_recmode:15747]: ctdb_recovery_lock: Got
>> recovery lock on '/amtb/.ctdb.lock'
>> 2014/10/08 21:00:27.831276 [set_recmode:15747]: ERROR: recovery lock file
>> /amtb/.ctdb.lock not locked when recovering!
>> 2014/10/08 21:00:27.982956 [23001]: server/ctdb_monitor.c:293 in
>> recovery. Wait one more second
>> 2014/10/08 21:00:28.560605 [23001]: Freeze priority 1
>> 2014/10/08 21:00:28.562909 [23001]: Freeze priority 2
>> 2014/10/08 21:00:28.564781 [23001]: Freeze priority 3
>> 2014/10/08 21:00:28.567568 [23001]: server/ctdb_recover.c:988
>> startrecovery eventscript has been invoked
>> 2014/10/08 21:00:28.830091 [23001]: server/ctdb_recover.c:612 Recovery
>> mode set to NORMAL
>> 2014/10/08 21:00:28.830161 [23001]: Thawing priority 1
>> 2014/10/08 21:00:28.830181 [23001]: Release freeze handler for prio 1
>> 2014/10/08 21:00:28.830220 [23001]: Thawing priority 2
>> 2014/10/08 21:00:28.830237 [23001]: Release freeze handler for prio 2
>> 2014/10/08 21:00:28.830279 [23001]: Thawing priority 3
>> 2014/10/08 21:00:28.830296 [23001]: Release freeze handler for prio 3
>> 2014/10/08 21:00:28.831331 [set_recmode:15813]: ctdb_recovery_lock: Got
>> recovery lock on '/amtb/.ctdb.lock'
>> 2014/10/08 21:00:28.831515 [set_recmode:15813]: ERROR: recovery lock file
>> /amtb/.ctdb.lock not locked when recovering!
>> 2014/10/08 21:00:28.983083 [23001]: server/ctdb_monitor.c:293 in
>> recovery. Wait one more second
>> 2014/10/08 21:00:29.561454 [23001]: Freeze priority 1
>> 2014/10/08 21:00:29.563879 [23001]: Freeze priority 2
>> 2014/10/08 21:00:29.566278 [23001]: Freeze priority 3
>> 2014/10/08 21:00:29.569087 [23001]: server/ctdb_recover.c:988
>> startrecovery eventscript has been invoked
>>
>>
>>
>>
>> On Wed, Oct 8, 2014 at 7:27 PM, Min Wai Chan <dcmwai at gmail.com> wrote:
>>
>>> Dear Amitay,
>>>
>>> CTDB version 2.5.2
>>>
>>> Yes, OCFS2 and DRBD are both working in good order.
>>>
>>> 2014/10/08 13:01:13.511256 [31697]: CTDB starting on node
>>> 2014/10/08 13:01:13.513377 [31700]: Starting CTDBD (Version 2.5.2) as
>>> PID: 31700
>>> 2014/10/08 13:01:13.513531 [31700]: Set scheduler to SCHED_FIFO
>>> 2014/10/08 13:01:13.513686 [31700]: server/ctdb_logging.c:600 Created
>>> PIPE FD:9 for logging
>>> 2014/10/08 13:01:13.513742 [31700]: Added timed event
>>> "ctdb_statistics_update": 0x6993e0
>>> 2014/10/08 13:01:13.513763 [31700]: Set runstate to INIT (1)
>>> 2014/10/08 13:01:13.513817 [31700]: server/eventscript.c:761 Starting
>>> eventscript init
>>> 2014/10/08 13:01:13.513965 [31700]: Event script
>>> /etc/ctdb/events.d/20.multipathd is not executable. Ignoring this event
>>> script
>>> 2014/10/08 13:01:13.513991 [31700]: Event script
>>> /etc/ctdb/events.d/31.clamd is not executable. Ignoring this event script
>>> 2014/10/08 13:01:13.514010 [31700]: Event script
>>> /etc/ctdb/events.d/40.fs_use is not executable. Ignoring this event script
>>> 2014/10/08 13:01:13.514053 [31700]: Event script
>>> /etc/ctdb/events.d/99.timeout is not executable. Ignoring this event script
>>> 2014/10/08 13:01:13.514319 [31700]: Added timed event
>>> "ctdb_event_script_timeout": 0x69cdf0
>>> 2014/10/08 13:01:13.552309 [31700]: SIGCHLD from 31702 process:31702
>>> 2014/10/08 13:01:13.569618 [31700]: SIGCHLD from 31719 process:31719
>>> 2014/10/08 13:01:13.588531 [31700]: SIGCHLD from 31724 process:31724
>>> 2014/10/08 13:01:13.603821 [31700]: SIGCHLD from 31728 process:31728
>>> 2014/10/08 13:01:13.619178 [31700]: SIGCHLD from 31732 process:31732
>>> 2014/10/08 13:01:13.634478 [31700]: SIGCHLD from 31735 process:31735
>>> 2014/10/08 13:01:13.635561 [31700]: SIGCHLD from 31739 process:31739
>>> 2014/10/08 13:01:13.636677 [31700]: SIGCHLD from 31741 process:31741
>>> 2014/10/08 13:01:13.637742 [31700]: SIGCHLD from 31743 process:31743
>>> 2014/10/08 13:01:13.653892 [31700]: SIGCHLD from 31745 process:31745
>>> 2014/10/08 13:01:13.670283 [31700]: SIGCHLD from 31749 process:31749
>>> 2014/10/08 13:01:13.688242 [31700]: SIGCHLD from 31752 process:31752
>>> 2014/10/08 13:01:13.706890 [31700]: SIGCHLD from 31756 process:31756
>>> 2014/10/08 13:01:13.723482 [31700]: SIGCHLD from 31760 process:31760
>>> 2014/10/08 13:01:13.741600 [31700]: SIGCHLD from 31763 process:31763
>>> 2014/10/08 13:01:13.758698 [31700]: SIGCHLD from 31767 process:31767
>>> 2014/10/08 13:01:13.774786 [31700]: SIGCHLD from 31772 process:31772
>>> 2014/10/08 13:01:13.789702 [31700]: SIGCHLD from 31775 process:31775
>>> 2014/10/08 13:01:13.790646 [31700]: server/eventscript.c:388 Eventscript
>>> init  finished with state 0
>>> 2014/10/08 13:01:13.790693 [31700]: Destroying timer event 0x69cdf0
>>> "ctdb_event_script_timeout"
>>> 2014/10/08 13:01:13.790839 [31700]: tcp/tcp_connect.c:382 Failed to
>>> bind() to socket. Cannot assign requested address(99)
>>> 2014/10/08 13:01:13.790883 [31700]: ctdb chose network address
>>> 192.168.20.34:4379 pnn 1
>>> 2014/10/08 13:01:13.790929 [31700]: This node is configured to start in
>>> DISABLED state
>>> 2014/10/08 13:01:13.830210 [31700]: We are already hosting public
>>> address '192.168.11.29'. setting PNN to ourself:1
>>> 2014/10/08 13:01:13.870200 [31700]: We are already hosting public
>>> address '192.168.11.33'. setting PNN to ourself:1
>>> 2014/10/08 13:01:13.920192 [31700]: We are already hosting public
>>> address '192.168.11.30'. setting PNN to ourself:1
>>> 2014/10/08 13:01:13.950184 [31700]: We are already hosting public
>>> address '192.168.11.34'. setting PNN to ourself:1
>>> 2014/10/08 13:01:13.980205 [31700]: We are already hosting public
>>> address '192.168.11.31'. setting PNN to ourself:1
>>> 2014/10/08 13:01:14.010194 [31700]: We are already hosting public
>>> address '192.168.11.35'. setting PNN to ourself:1
>>> 2014/10/08 13:01:14.010244 [31700]: Added timed event
>>> "ctdb_check_interfaces_event": 0x69e010
>>> 2014/10/08 13:01:14.010515 [31700]: Freeze priority 1
>>> 2014/10/08 13:01:14.010858 [31700]: Added timed event
>>> "ctdb_lock_timeout_handler": 0x69dc60
>>> 2014/10/08 13:01:14.010907 [31700]: SIGCHLD from 31778 process:31778
>>> 2014/10/08 13:01:14.011978 [31700]: Destroying timer event 0x69dc60
>>> "ctdb_lock_timeout_handler"
>>> 2014/10/08 13:01:14.012008 [31700]: Freeze priority 2
>>> 2014/10/08 13:01:14.012177 [31700]: Added timed event
>>> "ctdb_lock_timeout_handler": 0x69d8c0
>>> 2014/10/08 13:01:14.013043 [31700]: Destroying timer event 0x69d8c0
>>> "ctdb_lock_timeout_handler"
>>> 2014/10/08 13:01:14.013070 [31700]: Freeze priority 3
>>> 2014/10/08 13:01:14.013232 [31700]: Added timed event
>>> "ctdb_lock_timeout_handler": 0x69d8c0
>>> 2014/10/08 13:01:14.014098 [31700]: Destroying timer event 0x69d8c0
>>> "ctdb_lock_timeout_handler"
>>> 2014/10/08 13:01:14.014187 [31700]: server/ctdb_takeover.c:3240 Released
>>> 0 public IPs
>>> 2014/10/08 13:01:14.014209 [31700]: Added timed event
>>> "ctdb_tcp_node_connect": 0x69f380
>>> 2014/10/08 13:01:14.014224 [31700]: Set runstate to SETUP (2)
>>> 2014/10/08 13:01:14.014241 [31700]: server/eventscript.c:761 Starting
>>> eventscript setup
>>> 2014/10/08 13:01:14.014346 [31700]: Event script
>>> /etc/ctdb/events.d/20.multipathd is not executable. Ignoring this event
>>> script
>>> 2014/10/08 13:01:14.014369 [31700]: Event script
>>> /etc/ctdb/events.d/31.clamd is not executable. Ignoring this event script
>>> 2014/10/08 13:01:14.014386 [31700]: Event script
>>> /etc/ctdb/events.d/40.fs_use is not executable. Ignoring this event script
>>> 2014/10/08 13:01:14.014427 [31700]: Event script
>>> /etc/ctdb/events.d/99.timeout is not executable. Ignoring this event script
>>> 2014/10/08 13:01:14.014645 [31700]: Added timed event
>>> "ctdb_event_script_timeout": 0x69d740
>>> 2014/10/08 13:01:14.028764 [31700]: Running timer event 0x69f380
>>> "ctdb_tcp_node_connect"
>>> 2014/10/08 13:01:14.028809 [31700]: tcp/tcp_connect.c:170 Created TCP
>>> SOCKET FD:17
>>> 2014/10/08 13:01:14.028901 [31700]: Added timed event
>>> "ctdb_tcp_node_connect": 0x69fbf0
>>> 2014/10/08 13:01:14.028923 [31700]: Ending timer event 0x69f380
>>> "ctdb_tcp_node_connect"
>>> 2014/10/08 13:01:14.029330 [31700]: Destroying timer event 0x69fbf0
>>> "ctdb_tcp_node_connect"
>>> 2014/10/08 13:01:14.029377 [31700]: Added timed event
>>> "ctdb_tcp_node_connect": 0x69fb30
>>> 2014/10/08 13:01:14.037513 [31700]: SIGCHLD from 31783 process:31783
>>> 2014/10/08 13:01:14.053131 [31700]: SIGCHLD from 31791 process:31791
>>> 2014/10/08 13:01:14.070466 [31700]: SIGCHLD from 31794 process:31794
>>> 2014/10/08 13:01:14.085949 [31700]: SIGCHLD from 31797 process:31797
>>> 2014/10/08 13:01:14.101723 [31700]: SIGCHLD from 31800 process:31800
>>> 2014/10/08 13:01:14.117004 [31700]: SIGCHLD from 31803 process:31803
>>> 2014/10/08 13:01:14.118080 [31700]: SIGCHLD from 31806 process:31806
>>> 2014/10/08 13:01:14.119206 [31700]: SIGCHLD from 31808 process:31808
>>> 2014/10/08 13:01:14.120244 [31700]: SIGCHLD from 31810 process:31810
>>> 2014/10/08 13:01:14.136879 [31700]: SIGCHLD from 31812 process:31812
>>> 2014/10/08 13:01:14.154106 [31700]: SIGCHLD from 31815 process:31815
>>> 2014/10/08 13:01:14.172807 [31700]: SIGCHLD from 31818 process:31818
>>> 2014/10/08 13:01:14.191619 [31700]: SIGCHLD from 31822 process:31822
>>> 2014/10/08 13:01:14.207333 [31700]: SIGCHLD from 31826 process:31826
>>> 2014/10/08 13:01:14.225846 [31700]: SIGCHLD from 31829 process:31829
>>> 2014/10/08 13:01:14.243423 [31700]: SIGCHLD from 31833 process:31833
>>> 2014/10/08 13:01:14.259668 [31700]: SIGCHLD from 31837 process:31837
>>> 2014/10/08 13:01:14.274982 [31700]: SIGCHLD from 31840 process:31840
>>> 2014/10/08 13:01:14.275929 [31700]: server/eventscript.c:388 Eventscript
>>> setup  finished with state 0
>>> 2014/10/08 13:01:14.275981 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a62a0
>>> 2014/10/08 13:01:14.276331 [31700]: Added timed event "ctdb_check_recd":
>>> 0x6a6190
>>> 2014/10/08 13:01:14.276383 [31700]: Added timed event
>>> "ctdb_check_for_dead_nodes": 0x6a63d0
>>> 2014/10/08 13:01:14.276399 [31700]: Keepalive monitoring has been started
>>> 2014/10/08 13:01:14.276413 [31700]: Added timed event
>>> "ctdb_update_tcp_tickles": 0x6a6570
>>> 2014/10/08 13:01:14.276429 [31700]: Added timed event
>>> "ctdb_recd_ping_timeout": 0x6a6710
>>> 2014/10/08 13:01:14.276443 [31700]: Added timed event "ctdb_time_tick":
>>> 0x6a68b0
>>> 2014/10/08 13:01:14.276457 [31700]: Set runstate to FIRST_RECOVERY (3)
>>> 2014/10/08 13:01:14.276471 [31700]: Added timed event
>>> "ctdb_wait_until_recovered": 0x6a6a60
>>> 2014/10/08 13:01:14.276513 [31700]: Destroying timer event 0x69d740
>>> "ctdb_event_script_timeout"
>>> 2014/10/08 13:01:14.276548 [31700]: SIGCHLD from 31843 process:31843
>>> 2014/10/08 13:01:14.276569 [31700]: Running timer event 0x6a62a0
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.276623 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483448 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 13:01:14.276482 [31845]: EPOLL_CTL_DEL ignoring ENOENT for
>>> fd[7]
>>> 2014/10/08 13:01:14.276651 [31700]: Ending timer event 0x6a62a0
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.276981 [recoverd:31845]: server/ctdb_recoverd.c:4281
>>> Created PIPE FD:17 to recovery daemon
>>> 2014/10/08 13:01:14.277003 [31700]: server/ctdb_daemon.c:927 Created
>>> SOCKET FD:16 to connected child
>>> 2014/10/08 13:01:14.277038 [recoverd:31845]: monitor_cluster starting
>>> 2014/10/08 13:01:14.277139 [31700]: server/ctdb_daemon.c:889 client
>>> request 2147483449 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 13:01:14.277173 [31700]: Added timed event
>>> "queue_next_trigger": 0x69f860
>>> 2014/10/08 13:01:14.277190 [31700]: Added timed event
>>> "ctdb_control_timeout": 0x6a4cc0
>>> 2014/10/08 13:01:14.277206 [31700]: Running timer event 0x69f860
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277220 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483450 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 13:01:14.277317 [31700]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17870283321406128128
>>> 2014/10/08 13:01:14.277342 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a0420
>>> 2014/10/08 13:01:14.277357 [31700]: Ending timer event 0x69f860
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277372 [31700]: Running timer event 0x6a0420
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277386 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483450 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 13:01:14.277431 [31700]: Destroying timer event 0x6a4cc0
>>> "ctdb_control_timeout"
>>> 2014/10/08 13:01:14.277449 [31700]: Ending timer event 0x6a0420
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277626 [31700]: server/ctdb_daemon.c:889 client
>>> request 2147483450 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 13:01:14.277654 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a4bf0
>>> 2014/10/08 13:01:14.277670 [31700]: Added timed event
>>> "ctdb_control_timeout": 0x6a02e0
>>> 2014/10/08 13:01:14.277685 [31700]: Running timer event 0x6a4bf0
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277699 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483451 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 13:01:14.277723 [31700]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17870564796382838784
>>> 2014/10/08 13:01:14.277739 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a0650
>>> 2014/10/08 13:01:14.277754 [31700]: Ending timer event 0x6a4bf0
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277769 [31700]: Running timer event 0x6a0650
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277783 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483451 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 13:01:14.277818 [31700]: Destroying timer event 0x6a02e0
>>> "ctdb_control_timeout"
>>> 2014/10/08 13:01:14.277836 [31700]: Ending timer event 0x6a0650
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277892 [31700]: server/ctdb_daemon.c:889 client
>>> request 2147483451 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 13:01:14.277917 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a0630
>>> 2014/10/08 13:01:14.277933 [31700]: Added timed event
>>> "ctdb_control_timeout": 0x6a07d0
>>> 2014/10/08 13:01:14.277948 [31700]: Running timer event 0x6a0630
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.277962 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483452 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 13:01:14.277981 [31700]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17870846271359549440
>>> 2014/10/08 13:01:14.277997 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a0b60
>>> 2014/10/08 13:01:14.278012 [31700]: Ending timer event 0x6a0630
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278026 [31700]: Running timer event 0x6a0b60
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278040 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483452 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 13:01:14.278073 [31700]: Destroying timer event 0x6a07d0
>>> "ctdb_control_timeout"
>>> 2014/10/08 13:01:14.278090 [31700]: Ending timer event 0x6a0b60
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278128 [31700]: server/ctdb_daemon.c:889 client
>>> request 2147483452 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 13:01:14.278149 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a0b40
>>> 2014/10/08 13:01:14.278164 [31700]: Added timed event
>>> "ctdb_control_timeout": 0x6a0ce0
>>> 2014/10/08 13:01:14.278179 [31700]: Running timer event 0x6a0b40
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278192 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483453 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 13:01:14.278210 [31700]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17365880163140632576
>>> 2014/10/08 13:01:14.278226 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a1070
>>> 2014/10/08 13:01:14.278250 [31700]: Ending timer event 0x6a0b40
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278267 [31700]: Running timer event 0x6a1070
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278281 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483453 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 13:01:14.278313 [31700]: Destroying timer event 0x6a0ce0
>>> "ctdb_control_timeout"
>>> 2014/10/08 13:01:14.278330 [31700]: Ending timer event 0x6a1070
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278386 [31700]: server/ctdb_daemon.c:889 client
>>> request 2147483453 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 13:01:14.278410 [31700]: Added timed event
>>> "queue_next_trigger": 0x6a1050
>>> 2014/10/08 13:01:14.278425 [31700]: Added timed event
>>> "ctdb_control_timeout": 0x6a11f0
>>> 2014/10/08 13:01:14.278440 [31700]: Running timer event 0x6a1050
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278453 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483454 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 13:01:14.278473 [31700]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17582052945254416384
>>> 2014/10/08 13:01:14.278488 [31700]: Added timed event
>>> "queue_next_trigger": 0x6b5d70
>>> 2014/10/08 13:01:14.278502 [31700]: Ending timer event 0x6a1050
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278516 [31700]: Running timer event 0x6b5d70
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278530 [31700]: server/ctdb_server.c:292 ctdb
>>> request 2147483454 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 13:01:14.278571 [31700]: Destroying timer event 0x6a11f0
>>> "ctdb_control_timeout"
>>> 2014/10/08 13:01:14.278589 [31700]: Ending timer event 0x6b5d70
>>> "queue_next_trigger"
>>> 2014/10/08 13:01:14.278661 [31700]: server/ctdb_daemon.c:889 client
>>> request 2147483454 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 13:01:14.278688 [31700]: Added timed event
>>> "queue_next_trigger": 0x6b5d50
>>> 2014/10/08 13:01:14.278703 [31700]: Added timed event
>>> "ctdb_control_timeout": 0x6b5ef0
>>> 2014/10/08 13:01:14.278718 [31700]: Running timer event 0x6b5d50
>>> "queue_next_trigger"
>>>
>>>
>>> It seems to be repetitive now...
>>>
>>> TESTING2
>>> 2014/10/08 19:19:18.744454 [25308]: CTDB starting on node
>>> 2014/10/08 19:19:18.746866 [25312]: Starting CTDBD (Version 2.5.2) as
>>> PID: 25312
>>> 2014/10/08 19:19:18.747016 [25312]: Set scheduler to SCHED_FIFO
>>> 2014/10/08 19:19:18.747169 [25312]: server/ctdb_logging.c:600 Created
>>> PIPE FD:9 for logging
>>> 2014/10/08 19:19:18.747223 [25312]: Added timed event
>>> "ctdb_statistics_update": 0x6994c0
>>> 2014/10/08 19:19:18.747242 [25312]: Set runstate to INIT (1)
>>> 2014/10/08 19:19:18.747274 [25312]: server/eventscript.c:761 Starting
>>> eventscript init
>>> 2014/10/08 19:19:18.747417 [25312]: Event script
>>> /etc/ctdb/events.d/20.multipathd is not executable. Ignoring this event
>>> script
>>> 2014/10/08 19:19:18.747445 [25312]: Event script
>>> /etc/ctdb/events.d/31.clamd is not executable. Ignoring this event script
>>> 2014/10/08 19:19:18.747462 [25312]: Event script
>>> /etc/ctdb/events.d/40.fs_use is not executable. Ignoring this event script
>>> 2014/10/08 19:19:18.747508 [25312]: Event script
>>> /etc/ctdb/events.d/99.timeout is not executable. Ignoring this event script
>>> 2014/10/08 19:19:18.748236 [25312]: Added timed event
>>> "ctdb_event_script_timeout": 0x69ced0
>>> 2014/10/08 19:19:18.790175 [25312]: SIGCHLD from 25313 process:25313
>>> 2014/10/08 19:19:18.807276 [25312]: SIGCHLD from 25330 process:25330
>>> 2014/10/08 19:19:18.826138 [25312]: SIGCHLD from 25334 process:25334
>>> 2014/10/08 19:19:18.841407 [25312]: SIGCHLD from 25338 process:25338
>>> 2014/10/08 19:19:18.856628 [25312]: SIGCHLD from 25341 process:25341
>>> 2014/10/08 19:19:18.871994 [25312]: SIGCHLD from 25344 process:25344
>>> 2014/10/08 19:19:18.873033 [25312]: SIGCHLD from 25347 process:25347
>>> 2014/10/08 19:19:18.874079 [25312]: SIGCHLD from 25349 process:25349
>>> 2014/10/08 19:19:18.875133 [25312]: SIGCHLD from 25351 process:25351
>>> 2014/10/08 19:19:18.891210 [25312]: SIGCHLD from 25353 process:25353
>>> 2014/10/08 19:19:18.907600 [25312]: SIGCHLD from 25356 process:25356
>>> 2014/10/08 19:19:18.926338 [25312]: SIGCHLD from 25360 process:25360
>>> 2014/10/08 19:19:18.945064 [25312]: SIGCHLD from 25365 process:25365
>>> 2014/10/08 19:19:18.960714 [25312]: SIGCHLD from 25369 process:25369
>>> 2014/10/08 19:19:18.978912 [25312]: SIGCHLD from 25372 process:25372
>>> 2014/10/08 19:19:18.996178 [25312]: SIGCHLD from 25376 process:25376
>>> 2014/10/08 19:19:19.012236 [25312]: SIGCHLD from 25380 process:25380
>>> 2014/10/08 19:19:19.027202 [25312]: SIGCHLD from 25384 process:25384
>>> 2014/10/08 19:19:19.028094 [25312]: server/eventscript.c:388 Eventscript
>>> init  finished with state 0
>>> 2014/10/08 19:19:19.028140 [25312]: Destroying timer event 0x69ced0
>>> "ctdb_event_script_timeout"
>>> 2014/10/08 19:19:19.028282 [25312]: tcp/tcp_connect.c:382 Failed to
>>> bind() to socket. Cannot assign requested address(99)
>>> 2014/10/08 19:19:19.028326 [25312]: ctdb chose network address
>>> 192.168.20.34:4379 pnn 1
>>> 2014/10/08 19:19:19.028372 [25312]: This node is configured to start in
>>> DISABLED state
>>> 2014/10/08 19:19:19.230199 [25312]: Added timed event
>>> "ctdb_check_interfaces_event": 0x69e0f0
>>> 2014/10/08 19:19:19.230492 [25312]: Freeze priority 1
>>> 2014/10/08 19:19:19.230717 [25312]: Added timed event
>>> "ctdb_lock_timeout_handler": 0x69dd40
>>> 2014/10/08 19:19:19.230765 [25312]: SIGCHLD from 25387 process:25387
>>> 2014/10/08 19:19:19.231665 [25312]: Destroying timer event 0x69dd40
>>> "ctdb_lock_timeout_handler"
>>> 2014/10/08 19:19:19.231714 [25312]: Freeze priority 2
>>> 2014/10/08 19:19:19.231893 [25312]: Added timed event
>>> "ctdb_lock_timeout_handler": 0x69d9a0
>>> 2014/10/08 19:19:19.232775 [25312]: Destroying timer event 0x69d9a0
>>> "ctdb_lock_timeout_handler"
>>> 2014/10/08 19:19:19.232806 [25312]: Freeze priority 3
>>> 2014/10/08 19:19:19.232980 [25312]: Added timed event
>>> "ctdb_lock_timeout_handler": 0x69d9a0
>>> 2014/10/08 19:19:19.233849 [25312]: Destroying timer event 0x69d9a0
>>> "ctdb_lock_timeout_handler"
>>> 2014/10/08 19:19:19.233930 [25312]: server/ctdb_takeover.c:248 public
>>> address '192.168.11.35' now unassigned (old iface '__none__' refs[0])
>>> 2014/10/08 19:19:19.233962 [25312]: server/ctdb_takeover.c:248 public
>>> address '192.168.11.31' now unassigned (old iface '__none__' refs[0])
>>> 2014/10/08 19:19:19.233988 [25312]: server/ctdb_takeover.c:248 public
>>> address '192.168.11.34' now unassigned (old iface '__none__' refs[0])
>>> 2014/10/08 19:19:19.234049 [25312]: server/ctdb_takeover.c:248 public
>>> address '192.168.11.30' now unassigned (old iface '__none__' refs[0])
>>> 2014/10/08 19:19:19.234077 [25312]: server/ctdb_takeover.c:248 public
>>> address '192.168.11.33' now unassigned (old iface '__none__' refs[0])
>>> 2014/10/08 19:19:19.234100 [25312]: server/ctdb_takeover.c:248 public
>>> address '192.168.11.29' now unassigned (old iface '__none__' refs[0])
>>> 2014/10/08 19:19:19.234117 [25312]: server/ctdb_takeover.c:3240 Released
>>> 0 public IPs
>>> 2014/10/08 19:19:19.234135 [25312]: Added timed event
>>> "ctdb_tcp_node_connect": 0x69f380
>>> 2014/10/08 19:19:19.234149 [25312]: Set runstate to SETUP (2)
>>> 2014/10/08 19:19:19.234167 [25312]: server/eventscript.c:761 Starting
>>> eventscript setup
>>> 2014/10/08 19:19:19.234277 [25312]: Event script
>>> /etc/ctdb/events.d/20.multipathd is not executable. Ignoring this event
>>> script
>>> 2014/10/08 19:19:19.234299 [25312]: Event script
>>> /etc/ctdb/events.d/31.clamd is not executable. Ignoring this event script
>>> 2014/10/08 19:19:19.234317 [25312]: Event script
>>> /etc/ctdb/events.d/40.fs_use is not executable. Ignoring this event script
>>> 2014/10/08 19:19:19.234361 [25312]: Event script
>>> /etc/ctdb/events.d/99.timeout is not executable. Ignoring this event script
>>> 2014/10/08 19:19:19.234527 [25312]: Added timed event
>>> "ctdb_event_script_timeout": 0x69d820
>>> 2014/10/08 19:19:19.248044 [25312]: Running timer event 0x69f380
>>> "ctdb_tcp_node_connect"
>>> 2014/10/08 19:19:19.248098 [25312]: tcp/tcp_connect.c:170 Created TCP
>>> SOCKET FD:17
>>> 2014/10/08 19:19:19.248205 [25312]: Added timed event
>>> "ctdb_tcp_node_connect": 0x69fd80
>>> 2014/10/08 19:19:19.248228 [25312]: Ending timer event 0x69f380
>>> "ctdb_tcp_node_connect"
>>> 2014/10/08 19:19:19.249847 [25312]: Destroying timer event 0x69fd80
>>> "ctdb_tcp_node_connect"
>>> 2014/10/08 19:19:19.249894 [25312]: Added timed event
>>> "ctdb_tcp_node_connect": 0x69fcc0
>>> 2014/10/08 19:19:19.256288 [25312]: SIGCHLD from 25392 process:25392
>>> 2014/10/08 19:19:19.271657 [25312]: SIGCHLD from 25400 process:25400
>>> 2014/10/08 19:19:19.288722 [25312]: SIGCHLD from 25403 process:25403
>>> 2014/10/08 19:19:19.303868 [25312]: SIGCHLD from 25406 process:25406
>>> 2014/10/08 19:19:19.319100 [25312]: SIGCHLD from 25409 process:25409
>>> 2014/10/08 19:19:19.334558 [25312]: SIGCHLD from 25412 process:25412
>>> 2014/10/08 19:19:19.335639 [25312]: SIGCHLD from 25416 process:25416
>>> 2014/10/08 19:19:19.336738 [25312]: SIGCHLD from 25418 process:25418
>>> 2014/10/08 19:19:19.337758 [25312]: SIGCHLD from 25420 process:25420
>>> 2014/10/08 19:19:19.354279 [25312]: SIGCHLD from 25422 process:25422
>>> 2014/10/08 19:19:19.371641 [25312]: SIGCHLD from 25425 process:25425
>>> 2014/10/08 19:19:19.389568 [25312]: SIGCHLD from 25428 process:25428
>>> 2014/10/08 19:19:19.408114 [25312]: SIGCHLD from 25433 process:25433
>>> 2014/10/08 19:19:19.423552 [25312]: SIGCHLD from 25437 process:25437
>>> 2014/10/08 19:19:19.441688 [25312]: SIGCHLD from 25441 process:25441
>>> 2014/10/08 19:19:19.458973 [25312]: SIGCHLD from 25445 process:25445
>>> 2014/10/08 19:19:19.475113 [25312]: SIGCHLD from 25449 process:25449
>>> 2014/10/08 19:19:19.490259 [25312]: SIGCHLD from 25452 process:25452
>>> 2014/10/08 19:19:19.491162 [25312]: server/eventscript.c:388 Eventscript
>>> setup  finished with state 0
>>> 2014/10/08 19:19:19.491241 [25312]: Added timed event
>>> "queue_next_trigger": 0x6a64a0
>>> 2014/10/08 19:19:19.491513 [25312]: Added timed event "ctdb_check_recd":
>>> 0x6a61c0
>>> 2014/10/08 19:19:19.491556 [25312]: Added timed event
>>> "ctdb_check_for_dead_nodes": 0x6a65d0
>>> 2014/10/08 19:19:19.491572 [25312]: Keepalive monitoring has been started
>>> 2014/10/08 19:19:19.491587 [25312]: Added timed event
>>> "ctdb_update_tcp_tickles": 0x6a6700
>>> 2014/10/08 19:19:19.491601 [25312]: Added timed event
>>> "ctdb_recd_ping_timeout": 0x6a68a0
>>> 2014/10/08 19:19:19.491615 [25312]: Added timed event "ctdb_time_tick":
>>> 0x6a6a40
>>> 2014/10/08 19:19:19.491629 [25312]: Set runstate to FIRST_RECOVERY (3)
>>> 2014/10/08 19:19:19.491644 [25312]: Added timed event
>>> "ctdb_wait_until_recovered": 0x6a6b70
>>> 2014/10/08 19:19:19.491621 [25458]: EPOLL_CTL_DEL ignoring ENOENT for
>>> fd[7]
>>> 2014/10/08 19:19:19.491705 [25312]: Destroying timer event 0x69d820
>>> "ctdb_event_script_timeout"
>>> 2014/10/08 19:19:19.491902 [25312]: SIGCHLD from 25456 process:25456
>>> 2014/10/08 19:19:19.491934 [25312]: Running timer event 0x6a64a0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.491950 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483448 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.491969 [25312]: Ending timer event 0x6a64a0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492011 [25312]: server/ctdb_daemon.c:927 Created
>>> SOCKET FD:16 to connected child
>>> 2014/10/08 19:19:19.492031 [recoverd:25458]: server/ctdb_recoverd.c:4281
>>> Created PIPE FD:17 to recovery daemon
>>> 2014/10/08 19:19:19.492085 [recoverd:25458]: monitor_cluster starting
>>> 2014/10/08 19:19:19.492192 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483449 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.492227 [25312]: Added timed event
>>> "queue_next_trigger": 0x6a4ac0
>>> 2014/10/08 19:19:19.492242 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6a01e0
>>> 2014/10/08 19:19:19.492257 [25312]: Running timer event 0x6a4ac0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492271 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483450 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.492381 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17870283321406128128
>>> 2014/10/08 19:19:19.492408 [25312]: Added timed event
>>> "queue_next_trigger": 0x6a0470
>>> 2014/10/08 19:19:19.492423 [25312]: Ending timer event 0x6a4ac0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492438 [25312]: Running timer event 0x6a0470
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492452 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483450 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.492488 [25312]: Destroying timer event 0x6a01e0
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.492506 [25312]: Ending timer event 0x6a0470
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492644 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483450 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.492672 [25312]: Added timed event
>>> "queue_next_trigger": 0x6a16a0
>>> 2014/10/08 19:19:19.492700 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6a0180
>>> 2014/10/08 19:19:19.492719 [25312]: Running timer event 0x6a16a0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492733 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483451 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.492756 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17870564796382838784
>>> 2014/10/08 19:19:19.492773 [25312]: Added timed event
>>> "queue_next_trigger": 0x6a06d0
>>> 2014/10/08 19:19:19.492787 [25312]: Ending timer event 0x6a16a0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492801 [25312]: Running timer event 0x6a06d0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492815 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483451 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.492850 [25312]: Destroying timer event 0x6a0180
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.492867 [25312]: Ending timer event 0x6a06d0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492891 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483451 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.492910 [25312]: Added timed event
>>> "queue_next_trigger": 0x6a0760
>>> 2014/10/08 19:19:19.492923 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6a0900
>>> 2014/10/08 19:19:19.492937 [25312]: Running timer event 0x6a0760
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.492951 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483452 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.492969 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17870846271359549440
>>> 2014/10/08 19:19:19.492984 [25312]: Added timed event
>>> "queue_next_trigger": 0x69f5f0
>>> 2014/10/08 19:19:19.492998 [25312]: Ending timer event 0x6a0760
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493012 [25312]: Running timer event 0x69f5f0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493025 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483452 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.493075 [25312]: Destroying timer event 0x6a0900
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.493094 [25312]: Ending timer event 0x69f5f0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493170 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483452 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.493197 [25312]: Added timed event
>>> "queue_next_trigger": 0x69f5d0
>>> 2014/10/08 19:19:19.493211 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x69f770
>>> 2014/10/08 19:19:19.493226 [25312]: Running timer event 0x69f5d0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493239 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483453 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.493258 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17365880163140632576
>>> 2014/10/08 19:19:19.493273 [25312]: Added timed event
>>> "queue_next_trigger": 0x69fa70
>>> 2014/10/08 19:19:19.493287 [25312]: Ending timer event 0x69f5d0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493301 [25312]: Running timer event 0x69fa70
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493315 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483453 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.493346 [25312]: Destroying timer event 0x69f770
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.493363 [25312]: Ending timer event 0x69fa70
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493385 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483453 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.493403 [25312]: Added timed event
>>> "queue_next_trigger": 0x69fa50
>>> 2014/10/08 19:19:19.493416 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x69fbf0
>>> 2014/10/08 19:19:19.493431 [25312]: Running timer event 0x69fa50
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493445 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483454 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.493462 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17582052945254416384
>>> 2014/10/08 19:19:19.493477 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b5f80
>>> 2014/10/08 19:19:19.493491 [25312]: Ending timer event 0x69fa50
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493504 [25312]: Running timer event 0x6b5f80
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493518 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483454 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.493549 [25312]: Destroying timer event 0x69fbf0
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.493565 [25312]: Ending timer event 0x6b5f80
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493586 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483454 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.493604 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b5f60
>>> 2014/10/08 19:19:19.493618 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6b6100
>>> 2014/10/08 19:19:19.493632 [25312]: Running timer event 0x6b5f60
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493645 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483455 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.493663 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17942340915444056064
>>> 2014/10/08 19:19:19.493678 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b6490
>>> 2014/10/08 19:19:19.493692 [25312]: Ending timer event 0x6b5f60
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493705 [25312]: Running timer event 0x6b6490
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493719 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483455 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.493750 [25312]: Destroying timer event 0x6b6100
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.493767 [25312]: Ending timer event 0x6b6490
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493837 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483455 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.493864 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b6470
>>> 2014/10/08 19:19:19.493878 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6b6610
>>> 2014/10/08 19:19:19.493893 [25312]: Running timer event 0x6b6470
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493917 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483456 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.493936 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=17798225727368200192
>>> 2014/10/08 19:19:19.493951 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b69a0
>>> 2014/10/08 19:19:19.493965 [25312]: Ending timer event 0x6b6470
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493979 [25312]: Running timer event 0x6b69a0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.493992 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483456 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.494023 [25312]: Destroying timer event 0x6b6610
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.494040 [25312]: Ending timer event 0x6b69a0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.494062 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483456 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.494080 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b6980
>>> 2014/10/08 19:19:19.494093 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6b6b20
>>> 2014/10/08 19:19:19.494107 [25312]: Running timer event 0x6b6980
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.494120 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483457 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.494137 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=18014398509481984000
>>> 2014/10/08 19:19:19.494152 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b6eb0
>>> 2014/10/08 19:19:19.494166 [25312]: Ending timer event 0x6b6980
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.494179 [25312]: Running timer event 0x6b6eb0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.494192 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483457 of type 8 length 48 from node 1 to 1
>>> 2014/10/08 19:19:19.494230 [25312]: Destroying timer event 0x6b6b20
>>> "ctdb_control_timeout"
>>> 2014/10/08 19:19:19.494246 [25312]: Ending timer event 0x6b6eb0
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.494268 [25312]: server/ctdb_daemon.c:889 client
>>> request 2147483457 of type 7 length 64 from node 1 to 4026531841
>>> 2014/10/08 19:19:19.494286 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b6e90
>>> 2014/10/08 19:19:19.494299 [25312]: Added timed event
>>> "ctdb_control_timeout": 0x6b7030
>>> 2014/10/08 19:19:19.494313 [25312]: Running timer event 0x6b6e90
>>> "queue_next_trigger"
>>> 2014/10/08 19:19:19.494327 [25312]: server/ctdb_server.c:292 ctdb
>>> request 2147483458 of type 7 length 64 from node 1 to 1
>>> 2014/10/08 19:19:19.494344 [25312]: server/ctdb_daemon.c:172 Registered
>>> message handler for srvid=18086456103519911936
>>> 2014/10/08 19:19:19.494359 [25312]: Added timed event
>>> "queue_next_trigger": 0x6b73c0
>>>
>>>
>>> On Wed, Oct 8, 2014 at 11:52 AM, Amitay Isaacs <amitay at gmail.com> wrote:
>>>
>>>>
>>>> On Tue, Oct 7, 2014 at 11:27 PM, Min Wai Chan <dcmwai at gmail.com> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I have a CTDB issue and I'm not sure where to start...
>>>>> I followed this guide by Steve, which is nice.
>>>>> The difference in my setup is that I don't have DRBD running...
>>>>>
>>>>
>>>> What version of CTDB are you running?
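>>>> (As a quick check, on most installations the running version can be
>>>> printed with the ctdb command-line tool, though the output layout may
>>>> differ between releases:
>>>>
>>>>     ctdb version
>>>>
>>>> It can also be read from the "Starting CTDBD (Version ...)" line that
>>>> ctdbd writes to the log at startup.)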
>>>>
>>>>
>>>>>
>>>>> Also, I have 4 x GbE interfaces, all connected to a switch, each with a
>>>>> different IP on the same subnet.
>>>>> This is supposed to load-balance the traffic since, under Samba DNS, they
>>>>> all have the same name.
>>>>>
>>>>> I'm only planning to use 3 of them for CTDB.
>>>>>
>>>>>
>>>>> My 2 VMs have direct access to a shared VMDK which is mounted as raw
>>>>> OCFS2, so it should not be a disk issue.
>>>>> Below are some of my configuration files and logs.
>>>>>
>>>>>
>>>> Is OCFS2 configured properly and working correctly?
>>>>
>>>>
>>>>> Please advise if you know where I should look...
>>>>>
>>>>> */etc/conf.d/net*
>>>>> config_enp2s0="192.168.11.32/24"
>>>>> config_enp2s2="null"
>>>>> config_enp2s3="192.168.12.34"
>>>>> config_enp2s4="null"
>>>>>
>>>>> dns_domain="kl01.amtb-m.org.my"
>>>>> dns_servers="192.168.11.20 192.168.11.24"
>>>>> dns_search="kl01.amtb-m.org.my"
>>>>>
>>>>> routes_enp2s0="default via 192.168.11.10"
>>>>> routes_enp2s2="default via 192.168.11.10"
>>>>> routes_enp2s3="default via 192.168.11.10"
>>>>> routes_enp2s4="default via 192.168.11.10"
>>>>>
>>>>> */etc/conf.d/ctdb*
>>>>>
>>>>> CTDB_RECOVERY_LOCK=/amtb/.ctdb.lock
>>>>> CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
>>>>> CTDB_MANAGES_SAMBA=yes
>>>>> CTDB_SAMBA_SKIP_SHARE_CHECK=yes
>>>>> CTDB_NFS_SKIP_SHARE_CHECK=yes
>>>>> CTDB_MANAGES_WINBIND=yes
>>>>> CTDB_MANAGES_VSFTPD=no
>>>>> CTDB_MANAGES_ISCSI=no
>>>>> CTDB_MANAGES_NFS=no
>>>>> CTDB_MANAGES_HTTPD=no
>>>>> CTDB_SYSLOG=yes
>>>>> CTDB_DEBUGLEVEL=NOTICE
>>>>> CTDB_INIT_STYLE=
>>>>> CTDB_SERVICE_SMB=samba
>>>>> CTDB_SERVICE_WINBIND=winbind
>>>>> CTDB_NODES=/etc/ctdb/nodes
>>>>> CTDB_NOTIFY_SCRIPT=/etc/ctdb/notify.sh
>>>>> CTDB_DBDIR=/var/lib/ctdb
>>>>> CTDB_DBDIR_PERSISTENT=/var/lib/ctdb/persistent
>>>>> CTDB_EVENT_SCRIPT_DIR=/etc/ctdb/events.d
>>>>> CTDB_SOCKET=/var/lib/ctdb/ctdb.socket
>>>>> CTDB_TRANSPORT="tcp"
>>>>> CTDB_MONITOR_FREE_MEMORY=100
>>>>> CTDB_START_AS_DISABLED="yes"
>>>>> CTDB_CAPABILITY_RECMASTER=yes
>>>>> CTDB_CAPABILITY_LMASTER=yes
>>>>> NATGW_PUBLIC_IP=
>>>>> NATGW_PUBLIC_IFACE=
>>>>> NATGW_DEFAULT_GATEWAY=
>>>>> NATGW_PRIVATE_IFACE=
>>>>> NATGW_PRIVATE_NETWORK=
>>>>> NATGW_NODES=/etc/ctdb/natgw_nodes
>>>>> CTDB_LOGFILE=/var/log/messages
>>>>> CTDB_DEBUGLEVEL=2
>>>>> CTDB_OPTIONS=
>>>>>
>>>>> */etc/ctdb/nodes*
>>>>> 192.168.12.30
>>>>> 192.168.12.34
>>>>>
>>>>> */etc/ctdb/public_addresses*
>>>>>
>>>>> 192.168.11.29/24        enp2s2
>>>>> 192.168.11.33/24        enp2s2
>>>>> 192.168.11.30/24        enp2s3
>>>>> 192.168.11.34/24        enp2s3
>>>>> 192.168.11.31/24        enp2s4
>>>>> 192.168.11.35/24        enp2s4
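>>>>>
>>>>> (Once both nodes are up and healthy, the assignment of these public
>>>>> addresses to nodes can be checked with "ctdb ip", and overall node state
>>>>> with "ctdb status"; both are standard ctdb tool commands, though the
>>>>> output layout varies between versions.)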
>>>>>
>>>>>
>>>>> And my log
>>>>>
>>>>> 2014/08/27 03:00:13.846218 [set_recmode: 8939]: ctdb_recovery_lock: Got
>>>>> recovery lock on '/amtb/.ctdb.lock'
>>>>> 2014/08/27 03:00:13.846322 [set_recmode: 8939]: ERROR: recovery lock
>>>>> file
>>>>> /amtb/.ctdb.lock not locked when recovering!
>>>>> 2014/08/27 03:00:14.563975 [ 4411]: server/ctdb_monitor.c:293 in
>>>>> recovery.
>>>>> Wait one more second
>>>>> 2014/08/27 03:00:14.583565 [ 4411]: Freeze priority 1
>>>>> 2014/08/27 03:00:14.585431 [ 4411]: Freeze priority 2
>>>>> 2014/08/27 03:00:14.587089 [ 4411]: Freeze priority 3
>>>>> 2014/08/27 03:00:14.589524 [ 4411]: server/ctdb_recover.c:988
>>>>> startrecovery
>>>>> eventscript has been invoked
>>>>> 2014/08/27 03:00:14.845167 [ 4411]: server/ctdb_recover.c:612 Recovery
>>>>> mode
>>>>> set to NORMAL
>>>>> 2014/08/27 03:00:14.845236 [ 4411]: Thawing priority 1
>>>>> 2014/08/27 03:00:14.845254 [ 4411]: Release freeze handler for prio 1
>>>>> 2014/08/27 03:00:14.845295 [ 4411]: Thawing priority 2
>>>>> 2014/08/27 03:00:14.845311 [ 4411]: Release freeze handler for prio 2
>>>>> 2014/08/27 03:00:14.845342 [ 4411]: Thawing priority 3
>>>>> 2014/08/27 03:00:14.845357 [ 4411]: Release freeze handler for prio 3
>>>>> 2014/08/27 03:00:14.846202 [set_recmode: 9003]: ctdb_recovery_lock: Got
>>>>> recovery lock on '/amtb/.ctdb.lock'
>>>>> 2014/08/27 03:00:14.846325 [set_recmode: 9003]: ERROR: recovery lock
>>>>> file
>>>>> /amtb/.ctdb.lock not locked when recovering!
>>>>> 2014/08/27 03:00:15.564794 [ 4411]: server/ctdb_monitor.c:293 in
>>>>> recovery.
>>>>> Wait one more second
>>>>> 2014/08/27 03:00:15.584563 [ 4411]: Freeze priority 1
>>>>> 2014/08/27 03:00:15.586399 [ 4411]: Freeze priority 2
>>>>> 2014/08/27 03:00:15.587946 [ 4411]: Freeze priority 3
>>>>> 2014/08/27 03:00:15.589802 [ 4411]: server/ctdb_recover.c:988
>>>>> startrecovery
>>>>> eventscript has been invoked
>>>>> 2014/08/27 03:00:15.849306 [ 4411]: server/ctdb_recover.c:612 Recovery
>>>>> mode
>>>>> set to NORMAL
>>>>> 2014/08/27 03:00:15.849395 [ 4411]: Thawing priority 1
>>>>> 2014/08/27 03:00:15.849416 [ 4411]: Release freeze handler for prio 1
>>>>> 2014/08/27 03:00:15.849460 [ 4411]: Thawing priority 2
>>>>> 2014/08/27 03:00:15.849477 [ 4411]: Release freeze handler for prio 2
>>>>> 2014/08/27 03:00:15.849510 [ 4411]: Thawing priority 3
>>>>> 2014/08/27 03:00:15.849525 [ 4411]: Release freeze handler for prio 3
>>>>> 2014/08/27 03:00:15.850398 [set_recmode: 9073]: ctdb_recovery_lock: Got
>>>>> recovery lock on '/amtb/.ctdb.lock'
>>>>> 2014/08/27 03:00:15.850519 [set_recmode: 9073]: ERROR: recovery lock
>>>>> file
>>>>> /amtb/.ctdb.lock not locked when recovering!
>>>>> 2014/08/27 03:00:16.565054 [ 4411]: server/ctdb_monitor.c:293 in
>>>>> recovery.
>>>>> Wait one more second
>>>>> 2014/08/27 03:00:16.585483 [ 4411]: Freeze priority 1
>>>>> 2014/08/27 03:00:16.587275 [ 4411]: Freeze priority 2
>>>>> 2014/08/27 03:00:16.588665 [ 4411]: Freeze priority 3
>>>>> 2014/08/27 03:00:16.590517 [ 4411]: server/ctdb_recover.c:988
>>>>> startrecovery
>>>>> eventscript has been invoked
>>>>>
>>>>
>>>> This is not a complete log. Do you have the logs from the beginning?
>>>> Also, what about the logs from the other node? Without complete logs from
>>>> both nodes, it's difficult to figure out what's going wrong.
>>>>
>>>> From the error it appears that the cluster file system is not
>>>> configured correctly and the recovery lock file can be locked from both
>>>> nodes simultaneously.
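>>>>
>>>> A common way to verify this is the ping_pong fcntl lock coherence test
>>>> from the ctdb utilities (a minimal sketch, assuming a ping_pong binary is
>>>> available, /amtb is the shared OCFS2 mount, and the test file name is
>>>> arbitrary; run the same command on both nodes at the same time, with the
>>>> numeric argument set to the number of nodes plus one):
>>>>
>>>>     ping_pong /amtb/ping_pong.dat 3
>>>>
>>>> If the locking rate reported on the first node does not change when the
>>>> test is started on the second node, the file system is not providing
>>>> coherent fcntl locking across nodes, and the recovery lock cannot work.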
>>>>
>>>> Amitay.
>>>>
>>>
>>>
>>
>

