ctdb-2.3-3.1 on SLES11 SP2: Timed out waiting for initialisation

Rainer Krienke krienke at uni-koblenz.de
Tue Jul 16 00:49:30 MDT 2013


Hallo,

I have a running ctdb cluster with 6 nodes based on a OCFS2 Cluster FS.
Until now I am still running ctdb 1.2.52 which works without problems.

Yesterday I tried to upgrade ctdb to version 2.3 on one node. However
when I start ctdb on this node using the same settings as for version
1.2  ctdb startes and everything looks fine buth then after exactly 10
sec it says

Timed out waiting for initialisation - check logs - killing CTDB

However I cannot find anything really helpful in there. So I looked what
timeout value is set to 10 sec on one of the other nodes:

sambanode1:~ # ctdb listvars|grep -i timeout
ControlTimeout      = 60
TraverseTimeout     = 20
RecoverTimeout      = 20
ElectionTimeout     = 3
TakeoverTimeout     = 9
EventScriptTimeout  = 30
EventScriptTimeoutCount = 1
EventScriptUnhealthyOnTimeout = 0
RerecoveryTimeout   = 10
RecdPingTimeout     = 60

and then increased the variables in /etc/sysconfig/ctdb on the node
running ctdb 2.3:

CTDB_SET_RerecoveryTimeout=15
CTDB_SET_TakeoverTimeout=14

However the behaviour did not change. After 10sec I get the error. Below
is the contents of log.ctdb when starting the ctdb v2.3 on the node in
question.

Any ideas?

Thanks
Rainer

log.ctdb
---------------
2013/07/16 08:29:00.206512 [19648]: CTDB starting on node
2013/07/16 08:29:00.211208 [19649]: Starting CTDBD (Version 2.3) as PID:
19649
2013/07/16 08:29:00.211471 [19649]: Created PID file /var/run/ctdb/ctdbd.pid
2013/07/16 08:29:00.211549 [19649]: Set scheduler to SCHED_FIFO
2013/07/16 08:29:00.211784 [19649]: Set runstate to INIT (1)
2013/07/16 08:29:01.396760 [19649]: Vacuuming is disabled for persistent
database secrets.tdb
2013/07/16 08:29:01.396825 [19649]: Attached to database
'/var/lib/ctdb/persistent/secrets.tdb.5' with flags 0x400
2013/07/16 08:29:01.396857 [19649]: Ignoring persistent database
'idmap2.tdb.0'
2013/07/16 08:29:01.423476 [19649]: Vacuuming is disabled for persistent
database idmap2.tdb
2013/07/16 08:29:01.423520 [19649]: Attached to database
'/var/lib/ctdb/persistent/idmap2.tdb.5' with flags 0x400
2013/07/16 08:29:01.453077 [19649]: Vacuuming is disabled for persistent
database account_policy.tdb
2013/07/16 08:29:01.453140 [19649]: Attached to database
'/var/lib/ctdb/persistent/account_policy.tdb.5' with flags 0x400
2013/07/16 08:29:01.453164 [19649]: Ignoring persistent database
'secrets.tdb.0'
2013/07/16 08:29:01.453177 [19649]: Ignoring persistent database
'registry.tdb.0'
2013/07/16 08:29:01.453188 [19649]: Ignoring persistent database
'share_info.tdb.0'
2013/07/16 08:29:01.477533 [19649]: Vacuuming is disabled for persistent
database group_mapping.tdb
2013/07/16 08:29:01.477641 [19649]: Attached to database
'/var/lib/ctdb/persistent/group_mapping.tdb.5' with flags 0x400
2013/07/16 08:29:01.477677 [19649]: Ignoring persistent database
'account_policy.tdb.0'
2013/07/16 08:29:01.477689 [19649]: Ignoring persistent database
'group_mapping.tdb.0'
2013/07/16 08:29:01.502193 [19649]: Vacuuming is disabled for persistent
database registry.tdb
2013/07/16 08:29:01.502248 [19649]: Attached to database
'/var/lib/ctdb/persistent/registry.tdb.5' with flags 0x400
2013/07/16 08:29:01.526454 [19649]: Vacuuming is disabled for persistent
database share_info.tdb
2013/07/16 08:29:01.526510 [19649]: Attached to database
'/var/lib/ctdb/persistent/share_info.tdb.5' with flags 0x400
2013/07/16 08:29:01.550530 [19649]: Vacuuming is disabled for persistent
database passdb.tdb
2013/07/16 08:29:01.550587 [19649]: Attached to database
'/var/lib/ctdb/persistent/passdb.tdb.5' with flags 0x400
2013/07/16 08:29:01.550613 [19649]: Ignoring persistent database
'passdb.tdb.0'
2013/07/16 08:29:01.550664 [19649]: Freeze priority 1
2013/07/16 08:29:01.555316 [19649]: Freeze priority 2
2013/07/16 08:29:01.558846 [19649]: Freeze priority 3
2013/07/16 08:29:01.562498 [19649]: server/ctdb_takeover.c:3239 Released
0 public IPs
2013/07/16 08:29:01.562535 [19649]: Set runstate to SETUP (2)
2013/07/16 08:29:01.607320 [19649]: 00.ctdb: Set RerecoveryTimeout to 15
2013/07/16 08:29:01.620550 [19649]: 00.ctdb: Set TakeoverTimeout to 14
2013/07/16 08:29:02.035237 [19649]: Set runstate to FIRST_RECOVERY (3)
2013/07/16 08:29:02.036510 [19649]: Keepalive monitoring has been started
2013/07/16 08:29:02.036559 [19649]: Monitoring has been started
2013/07/16 08:29:02.037576 [recoverd:19925]: monitor_cluster starting
2013/07/16 08:29:02.039491 [recoverd:19925]: server/ctdb_recoverd.c:3483
Initial recovery master set - forcing election
2013/07/16 08:29:02.039571 [19649]: Freeze priority 1
2013/07/16 08:29:02.039659 [19649]: Freeze priority 2
2013/07/16 08:29:02.039739 [19649]: Freeze priority 3
2013/07/16 08:29:02.040051 [19649]: This node (5) is now the recovery master
2013/07/16 08:29:02.541475 [19649]: This node (5) is no longer the
recovery master
2013/07/16 08:29:03.036956 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:03.811746 [19649]: server/ctdb_recover.c:1035 Unknown
db 0x7a19d84d
2013/07/16 08:29:04.037953 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:04.197096 [19649]: server/ctdb_recover.c:1035 Unknown
db 0x7a19d84d
2013/07/16 08:29:05.038701 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:06.039045 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:07.039505 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:07.363554 [19649]: server/ctdb_recover.c:1035 Unknown
db 0x7a19d84d
2013/07/16 08:29:07.550620 [recoverd:19925]: server/ctdb_recoverd.c:1061
Election timed out
2013/07/16 08:29:07.552259 [recoverd:19925]: The interfaces status has
changed on local node 5 - force takeover run
2013/07/16 08:29:07.552596 [recoverd:19925]: Trigger takeoverrun
2013/07/16 08:29:07.637809 [19649]: server/eventscript.c:983 Aborted
running eventscript "ipreallocated" while in RECOVERY mode
2013/07/16 08:29:08.007195 [19649]: server/ctdb_recover.c:1035 Unknown
db 0x7a19d84d
2013/07/16 08:29:08.040415 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:08.180484 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.184761 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/notify_onelevel.tdb.5' with flags 0x441
2013/07/16 08:29:08.195427 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.199655 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/notify.tdb.5' with flags 0x441
2013/07/16 08:29:08.201655 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.204742 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/printer_list.tdb.5' with flags 0x441
2013/07/16 08:29:08.206495 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.209989 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/locking.tdb.5' with flags 0x441
2013/07/16 08:29:08.211756 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.214723 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/brlock.tdb.5' with flags 0x441
2013/07/16 08:29:08.216429 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.219679 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/connections.tdb.5' with flags 0x441
2013/07/16 08:29:08.221475 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.224502 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/sessionid.tdb.5' with flags 0x441
2013/07/16 08:29:08.226357 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.229274 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/g_lock.tdb.5' with flags 0x441
2013/07/16 08:29:08.231192 [19649]: Could not find idr:2147483449
2013/07/16 08:29:08.234141 [19649]: Attached to database
'/var/lib/ctdb/tmpfs-local/serverid.tdb.5' with flags 0x441
2013/07/16 08:29:08.254779 [19649]: Freeze priority 1
2013/07/16 08:29:08.260410 [19649]: Freeze priority 2
2013/07/16 08:29:08.264312 [19649]: Freeze priority 3
2013/07/16 08:29:08.269375 [19649]: server/ctdb_recover.c:989
startrecovery eventscript has been invoked
2013/07/16 08:29:08.722733 [recoverd:19925]: Node 0 has changed flags -
now 0x0  was 0x2
2013/07/16 08:29:08.723519 [recoverd:19925]: Node 1 has changed flags -
now 0x0  was 0x2
2013/07/16 08:29:08.724481 [recoverd:19925]: Node 2 has changed flags -
now 0x0  was 0x2
2013/07/16 08:29:08.725122 [recoverd:19925]: Node 3 has changed flags -
now 0x0  was 0x2
2013/07/16 08:29:08.725844 [recoverd:19925]: Node 4 has changed flags -
now 0x0  was 0x2
2013/07/16 08:29:09.041042 [19649]: CTDB_WAIT_UNTIL_RECOVERED
2013/07/16 08:29:10.041605 [19649]: CTDB_WAIT_UNTIL_RECOVERED
-- 
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, http://userpages.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html,Fax: +49261287
1001312

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4697 bytes
Desc: S/MIME Kryptografische Unterschrift
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130716/0cefd7eb/attachment.bin>


More information about the samba-technical mailing list