ctdb client hangs starting a transaction on a persistent database after another node's crash

Mon Feb 4 03:42:35 MST 2013

Hello,

I have encountered exactly the same problem as in
https://lists.samba.org/archive/samba-technical/2012-November/088685.html and
tried to debug it.

My cluster consists of 2 nodes. node1 writes records to a persistent
database several times (using ctdb pstore). I then kill the ctdbd daemon
on this node and try to write another record to this database on the
other node (node2), which results in a hang.

Below is the call stack of the ctdb client during the hang.

#0  __epoll_wait_nocancel ()
#1  epoll_event_loop  at lib/tevent/tevent_standard.c:282
#2  std_event_loop_once (location="client/ctdb_client.c:335") at lib/tevent/tevent_standard.c:567
#3  _tevent_loop_once (location="client/ctdb_client.c:335") at lib/tevent/tevent.c:506
#4  ctdb_call_recv  at client/ctdb_client.c:335
#5  ctdb_call at client/ctdb_client.c:501
#6  ctdb_client_force_migration at client/ctdb_client.c:597
#7  ctdb_fetch_lock at client/ctdb_client.c:705
#8  ctdb_transaction_fetch_start at client/ctdb_client.c:3704
#9  ctdb_transaction_start at client/ctdb_client.c:3797
#10 control_pstore at tools/ctdb.c:3907
#11 main

The client is starting a transaction. It fetches the __transaction_lock__
record from the local TDB and sees that the DMASTER in its header is
different from the current node number. As a result, the client code sends
a request to the daemon to make us DMASTER (ctdb_client_force_migration).
The daemon receives this request and sees that the DMASTER for this record
is not the current node, so it queues the request to be sent to node1 (the
node whose ctdbd was killed) when it becomes available. The problem is that
the client doesn't return until the other node is up.
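
To make the sequence above concrete, here is a minimal C sketch of the
decision as I understand it. This is not the actual ctdb code: the struct,
the enum and both function names are hypothetical, and I assume pnn 0 for
node1 and pnn 1 for node2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a record header; only DMASTER is modelled. */
struct record_header {
    uint32_t dmaster; /* pnn of the node recorded as DMASTER */
};

enum fetch_outcome {
    FETCH_LOCAL,   /* we are already DMASTER: the lock is taken locally */
    FETCH_MIGRATE, /* DMASTER is another, connected node: migrate */
    FETCH_BLOCKED  /* DMASTER is disconnected: the request stays queued */
};

/* Client-side check: is the record's DMASTER someone other than us? */
static bool client_needs_migration(const struct record_header *hdr,
                                   uint32_t local_pnn)
{
    assert(hdr != NULL);
    return hdr->dmaster != local_pnn;
}

/* Models what happens to the fetch-lock request for one record. */
static enum fetch_outcome fetch_lock_outcome(const struct record_header *hdr,
                                             uint32_t local_pnn,
                                             const bool *connected,
                                             uint32_t num_nodes)
{
    if (!client_needs_migration(hdr, local_pnn)) {
        return FETCH_LOCAL;
    }
    /* An out-of-range DMASTER is treated as unreachable in this sketch. */
    if (hdr->dmaster >= num_nodes || !connected[hdr->dmaster]) {
        return FETCH_BLOCKED;
    }
    return FETCH_MIGRATE;
}
```

With dmaster = 0 (node1, down) and local_pnn = 1 (node2), this model gives
FETCH_BLOCKED, which corresponds to the client waiting in frame #4 of the
call stack.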

Whenever a node starts a transaction on a persistent database, the DMASTER
of the transaction lock record is set to this node's pnn.

It looks rather strange to me that we are trying to migrate a persistent
database record from another node. As far as I understand, each healthy
node in the cluster has a full up-to-date copy of each persistent database.

I have several questions:
- Do I understand correctly that DMASTER should be completely ignored for
persistent databases?
- Are there any cases when checking a DMASTER for a persistent database
record could be valid?
- Is there any fix/workaround for this?

Thanks,
Maxim