[Samba] smbstatus hang with CTDB 2.5.4 and Samba 4.1.13
John P Janosik
jpjanosi at us.ibm.com
Wed Oct 29 11:48:09 MDT 2014
Can anyone help with some pointers to debug a problem with Samba and CTDB
with smbstatus traversing the connections tdb? I've got a new two node
cluster with Samba and CTDB on AIX. If I run smbstatus when the server
has much user activity it hangs and the node it was run on gets banned. I
see the following in the ctdb log:
2014/10/29 11:12:45.374580 [3932342]: server/ctdb_traverse.c:644 Traverse
cancelled by client disconnect for database:0x6b06a26d
2014/10/29 11:12:45.429592 [3932342]: Recovery daemon ping timeout. Count
: 0
2014/10/29 11:12:45.430655 [3932342]: Handling event took 195 seconds!
2014/10/29 11:12:45.452636 [3932342]: pnn 1 Invalid reqid 220668 in
ctdb_reply_control
2014/10/29 11:12:48.462334 [recoverd:6488266]: server/ctdb_recoverd.c:3990
Remote node:0 has different flags for node 1. It has 0x02 vs our 0x00
2014/10/29 11:12:48.462448 [recoverd:6488266]: Use flags 0x00 from local
recmaster node for cluster update of node 1 flags
2014/10/29 11:12:48.483362 [3932342]: Freeze priority 1
2014/10/29 11:12:58.574548 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:13:08.569593 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:13:18.604124 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:13:28.638279 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:13:38.653236 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:13:48.454994 [3932342]: Recovery daemon ping timeout. Count
: 0
2014/10/29 11:13:48.484659 [recoverd:6488266]: ctdb_control error:
'ctdb_control timed out'
2014/10/29 11:13:48.484723 [recoverd:6488266]: ctdb_control error:
'ctdb_control timed out'
2014/10/29 11:13:48.484782 [recoverd:6488266]: Async operation failed with
ret=-1 res=-1 opcode=33
2014/10/29 11:13:48.484837 [recoverd:6488266]: Failed to freeze node 1
during recovery. Set it as ban culprit for 2 credits
2014/10/29 11:13:48.484890 [recoverd:6488266]: Async wait failed -
fail_count=1
2014/10/29 11:13:48.485047 [recoverd:6488266]: server/ctdb_recoverd.c:410
Unable to freeze nodes. Recovery failed.
2014/10/29 11:13:48.485095 [recoverd:6488266]: server/ctdb_recoverd.c:1881
Unable to set recovery mode to active on cluster
2014/10/29 11:13:48.503273 [3932342]: Freeze priority 1
2014/10/29 11:13:48.688292 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:13:58.704454 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:14:08.739136 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:14:18.753675 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:14:28.788835 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:14:38.824317 [3932342]:
/usr/smb_cluster/etc/ctdb/debug_locks.sh[23]: flock: not found
2014/10/29 11:14:48.486128 [3932342]: Recovery daemon ping timeout. Count
: 0
2014/10/29 11:14:48.504242 [recoverd:6488266]: ctdb_control error:
'ctdb_control timed out'
2014/10/29 11:14:48.504298 [recoverd:6488266]: ctdb_control error:
'ctdb_control timed out'
2014/10/29 11:14:48.504357 [recoverd:6488266]: Async operation failed with
ret=-1 res=-1 opcode=33
2014/10/29 11:14:48.504411 [recoverd:6488266]: Failed to freeze node 1
during recovery. Set it as ban culprit for 2 credits
2014/10/29 11:14:48.504463 [recoverd:6488266]: Async wait failed -
fail_count=1
2014/10/29 11:14:48.504507 [recoverd:6488266]: server/ctdb_recoverd.c:410
Unable to freeze nodes. Recovery failed.
2014/10/29 11:14:48.504552 [recoverd:6488266]: server/ctdb_recoverd.c:1881
Unable to set recovery mode to active on cluster
2014/10/29 11:14:48.506682 [3932342]: Banning this node for 300 seconds
The tdb mentioned in the first log message is:
ctdb getdbmap | grep 0x6b06a26d
dbid:0x6b06a26d name:smbXsrv_session_global.tdb
path:/usr/smb_cluster/var/lib/ctdb/smbXsrv_session_global.tdb.1
I was going to start by trying to update
/usr/smb_cluster/etc/ctdb/debug_locks.sh since I don't think flock is
available on AIX. I'm hoping the ctdb log will then contain some more
help debugging the problem. Any other ideas on how to debug this?
Thanks,
John Janosik
jpjanosi at us.ibm.com
More information about the samba
mailing list