[Samba] Failed to start CTDB first time after install

Chuck Reinke reinke at me.com
Tue Apr 9 13:23:43 MDT 2013


Hi,

I am setting up a two-node Samba cluster with CTDB in AWS, with the nodes in two different subnets.  All IP ports are open between the two subnets.  I am initially forming the cluster with one node and will add the second node once CTDB is running.  I am not using public_addresses for CTDB because AWS does not support VIPs.  I am using 64-bit Amazon Linux with two NICs defined: eth0 as the primary NIC and eth1 as the private-IP NIC.  With clustering off and no CTDB, Samba works.  I need to get this running for a project.  The only errors reported are in /var/log/log.ctdb.  Please help.

CTDB Configs:

Edited /etc/sysconfig/ctdb to change the following from the defaults:
	CTDB_RECOVERY_LOCK="/samba/samba_lock"
	CTDB_NODES=/etc/ctdb/nodes
	CTDB_DEBUGLEVEL=3


Edited /etc/ctdb/nodes to add the internal (private) IP address of eth1.
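
As a quick sanity check of the nodes file, here is a minimal Python sketch. It assumes the file holds one bare IP address per line; the file contents shown are a hypothetical single-node example, and 10.22.1.20 is taken from the "ctdb chose network address" line in the log. The helper name parse_nodes is made up for illustration.

```python
import ipaddress

def parse_nodes(text):
    """Return the addresses in a nodes file, assuming one bare IP per non-empty line."""
    nodes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # ip_address raises ValueError for anything that is not a plain IPv4/IPv6 address
        nodes.append(ipaddress.ip_address(line))
    return nodes

# Hypothetical contents of /etc/ctdb/nodes for the initial one-node cluster.
sample = "10.22.1.20\n"
nodes = parse_nodes(sample)

# Address that ctdbd reports binding to in the log.
local = ipaddress.ip_address("10.22.1.20")
print("entries:", [str(n) for n in nodes])
print("local address listed:", local in nodes)
```

A stray hostname, port suffix, or netmask on a line would raise ValueError here, which is the kind of mistake this check is meant to catch.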



The complete /var/log/log.ctdb:

2013/04/09 16:09:59.881679 [30574]: CTDB starting on node
2013/04/09 16:09:59.886133 [30575]: Starting CTDBD as pid : 30575
2013/04/09 16:09:59.886305 [30575]: Set scheduler to SCHED_FIFO
2013/04/09 16:09:59.886637 [30575]: ctdb chose network address 10.22.1.20:4379 pnn 0
2013/04/09 16:09:59.887035 [30575]: server/eventscript.c:800 Starting eventscript init 
2013/04/09 16:09:59.969022 [30575]: 10.interface: No public addresses file found. Nothing to do for 10.interfaces
2013/04/09 16:10:00.246654 [30575]: server/eventscript.c:486 Eventscript init  finished with state 0
2013/04/09 16:10:00.248978 [30575]: Keepalive monitoring has been started
2013/04/09 16:10:00.249024 [30575]: Monitoring has been started
2013/04/09 16:10:00.249057 [30575]: server/eventscript.c:800 Starting eventscript setup 
2013/04/09 16:10:00.249415 [recoverd:30648]: monitor_cluster starting
2013/04/09 16:10:00.251621 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17870283321406128128
2013/04/09 16:10:00.251760 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17870564796382838784
2013/04/09 16:10:00.251858 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17870846271359549440
2013/04/09 16:10:00.251952 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17365880163140632576
2013/04/09 16:10:00.252050 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17582052945254416384
2013/04/09 16:10:00.252150 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17942340915444056064
2013/04/09 16:10:00.252243 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17798225727368200192
2013/04/09 16:10:00.252332 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18014398509481984000
2013/04/09 16:10:00.252422 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18086456103519911936
2013/04/09 16:10:00.252511 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18087019053473333248
2013/04/09 16:10:00.252600 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18158513697557839872
2013/04/09 16:10:00.252688 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=17654110539292344320
2013/04/09 16:10:00.252776 [30575]: server/ctdb_daemon.c:182 Registered message handler for srvid=18086737578496622592
2013/04/09 16:10:00.253577 [recoverd:30648]: server/ctdb_recoverd.c:3415 Initial recovery master set - forcing election
2013/04/09 16:10:00.253609 [recoverd:30648]: server/ctdb_recoverd.c:2521 Force an election
2013/04/09 16:10:00.253673 [30575]: Freeze priority 1
2013/04/09 16:10:00.253783 [30575]: Freeze priority 2
2013/04/09 16:10:00.253901 [30575]: Freeze priority 3
2013/04/09 16:10:00.254181 [recoverd:30648]: server/ctdb_recoverd.c:2005 Send election request to all active nodes
2013/04/09 16:10:01.249677 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:02.249961 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:03.250141 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:03.257560 [recoverd:30648]: server/ctdb_recoverd.c:1055 Election timed out
2013/04/09 16:10:03.258563 [recoverd:30648]: The interfaces status has changed on local node 0 - force takeover run
2013/04/09 16:10:03.258805 [recoverd:30648]: Trigger takeoverrun
2013/04/09 16:10:03.259041 [recoverd:30648]: server/ctdb_recoverd.c:2702 Node:0 was in recovery mode. Restart recovery process
2013/04/09 16:10:03.259071 [recoverd:30648]: server/ctdb_recoverd.c:1555 Starting do_recovery
2013/04/09 16:10:03.259085 [recoverd:30648]: Taking out recovery lock from recovery daemon
2013/04/09 16:10:03.259108 [recoverd:30648]: Take the recovery lock
2013/04/09 16:10:03.267903 [recoverd:30648]: Recovery lock taken successfully
2013/04/09 16:10:03.267933 [recoverd:30648]: ctdb_recovery_lock: Got recovery lock on '/mnt/prod-assets/samba/samba_lock'
2013/04/09 16:10:03.268052 [recoverd:30648]: Recovery lock taken successfully by recovery daemon
2013/04/09 16:10:03.268071 [recoverd:30648]: server/ctdb_recoverd.c:1592 Recovery initiated due to problem with node 0
2013/04/09 16:10:03.268190 [recoverd:30648]: server/ctdb_recoverd.c:1617 Recovery - created remote databases
2013/04/09 16:10:03.268211 [recoverd:30648]: server/ctdb_recoverd.c:1624 Recovery - updated db priority for all databases
2013/04/09 16:10:03.268351 [30575]: Freeze priority 1
2013/04/09 16:10:03.268455 [30575]: Freeze priority 2
2013/04/09 16:10:03.268552 [30575]: Freeze priority 3
2013/04/09 16:10:03.268723 [30575]: server/ctdb_recover.c:1035 startrecovery eventscript has been invoked
2013/04/09 16:10:03.268744 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.268763 [30575]: server/eventscript.c:800 Starting eventscript startrecovery 
2013/04/09 16:10:03.617562 [30575]: server/eventscript.c:486 Eventscript startrecovery  finished with state 0
2013/04/09 16:10:03.618061 [30575]: Control modflags on node 0 - Unchanged - flags 0x2
2013/04/09 16:10:03.618127 [recoverd:30648]: server/ctdb_recoverd.c:1661 Recovery - updated flags
2013/04/09 16:10:03.618311 [recoverd:30648]: server/ctdb_recoverd.c:1705 started transactions on all nodes
2013/04/09 16:10:03.618333 [recoverd:30648]: server/ctdb_recoverd.c:1718 Recovery - starting database commits
2013/04/09 16:10:03.618389 [30575]: server/ctdb_freeze.c:408 healthy_nodes[0]
2013/04/09 16:10:03.618450 [recoverd:30648]: server/ctdb_recoverd.c:1730 Recovery - committed databases
2013/04/09 16:10:03.618621 [recoverd:30648]: server/ctdb_recoverd.c:1780 Recovery - updated vnnmap
2013/04/09 16:10:03.618717 [recoverd:30648]: server/ctdb_recoverd.c:1789 Recovery - updated recmaster
2013/04/09 16:10:03.618916 [30575]: Control modflags on node 0 - Unchanged - flags 0x2
2013/04/09 16:10:03.618973 [recoverd:30648]: server/ctdb_recoverd.c:1806 Recovery - updated flags
2013/04/09 16:10:03.619034 [30575]: server/ctdb_recover.c:665 Recovery mode set to NORMAL
2013/04/09 16:10:03.619053 [30575]: Thawing priority 1
2013/04/09 16:10:03.619066 [30575]: Release freeze handler for prio 1
2013/04/09 16:10:03.619110 [30575]: Thawing priority 2
2013/04/09 16:10:03.619126 [30575]: Release freeze handler for prio 2
2013/04/09 16:10:03.619150 [30575]: Thawing priority 3
2013/04/09 16:10:03.619164 [30575]: Release freeze handler for prio 3
2013/04/09 16:10:03.622723 [recoverd:30648]: server/ctdb_recoverd.c:1815 Recovery - disabled recovery mode
2013/04/09 16:10:03.623218 [recoverd:30648]: Disabling ip check for 9 seconds
2013/04/09 16:10:03.623228 [30575]: Running eventscripts with arguments ipreallocated
2013/04/09 16:10:03.623260 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.623283 [30575]: server/eventscript.c:800 Starting eventscript ipreallocated 
2013/04/09 16:10:03.971720 [30575]: server/eventscript.c:486 Eventscript ipreallocated  finished with state 0
2013/04/09 16:10:03.971788 [30575]: Monitoring has been enabled
2013/04/09 16:10:03.972044 [30575]: Recovery has finished
2013/04/09 16:10:03.972067 [30575]: Monitoring has been disabled
2013/04/09 16:10:03.972083 [30575]: server/eventscript.c:800 Starting eventscript recovered 
2013/04/09 16:10:04.250561 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:04.250613 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:04.322804 [30575]: server/eventscript.c:486 Eventscript recovered  finished with state 0
2013/04/09 16:10:04.322870 [30575]: Monitoring has been enabled
2013/04/09 16:10:04.322983 [recoverd:30648]: server/ctdb_recoverd.c:1841 Recovery - finished the recovered event
2013/04/09 16:10:04.323022 [recoverd:30648]: server/ctdb_recoverd.c:1847 Recovery complete
2013/04/09 16:10:04.323038 [recoverd:30648]: Resetting ban count to 0 for all nodes
2013/04/09 16:10:04.323057 [recoverd:30648]: Just finished a recovery. New recoveries will now be supressed for the rerecovery timeout (10 seconds)
2013/04/09 16:10:05.251440 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:05.251473 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:06.251582 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:06.251634 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:07.251744 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:07.251775 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:08.251886 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:08.251925 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
2013/04/09 16:10:09.252062 [30575]: CTDB_WAIT_UNTIL_RECOVERED
2013/04/09 16:10:09.252117 [30575]: server/ctdb_monitor.c:261 wait for pending recoveries to end. Wait one more second.
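
To make the pattern in the log easier to see, here is a minimal Python sketch that finds the "Recovery complete" line and counts how many CTDB_WAIT_UNTIL_RECOVERED messages follow it. The regular expression and the function names are assumptions based only on the line format shown above, not on any CTDB tooling, and the sample is a four-line excerpt of the log.

```python
import re
from datetime import datetime

# Line layout assumed from the log above: "<date> <time> [<process>]: <message>"
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) \[([^\]]+)\]: (.*)$"
)

def parse_line(line):
    """Split one log line into (timestamp, process, message), or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
    return ts, m.group(2), m.group(3)

def summarize(lines):
    """Return (time of last 'Recovery complete', number of wait messages logged after it)."""
    recovery_done = None
    waits_after = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _proc, msg = parsed
        if msg.endswith("Recovery complete"):
            recovery_done = ts
            waits_after = 0
        elif msg == "CTDB_WAIT_UNTIL_RECOVERED" and recovery_done is not None:
            waits_after += 1
    return recovery_done, waits_after

# Excerpt copied from the log above.
sample = [
    "2013/04/09 16:10:04.250561 [30575]: CTDB_WAIT_UNTIL_RECOVERED",
    "2013/04/09 16:10:04.323022 [recoverd:30648]: server/ctdb_recoverd.c:1847 Recovery complete",
    "2013/04/09 16:10:05.251440 [30575]: CTDB_WAIT_UNTIL_RECOVERED",
    "2013/04/09 16:10:06.251582 [30575]: CTDB_WAIT_UNTIL_RECOVERED",
]
done, waits = summarize(sample)
print("recovery complete at:", done)
print("wait messages after recovery:", waits)
```

Run against the full /var/log/log.ctdb, the same function would show whether the main daemon keeps logging CTDB_WAIT_UNTIL_RECOVERED after the recovery daemon has reported completion.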


Thanks for any help.
Chuck


reinke at mac.com