Some questions about ctdb availability when some node is crashed

Fri Nov 16 09:02:00 MST 2012

Dear Martin,

Thanks for your reply!

In my tests, the IPs hosted by the failed node were moved to the other
nodes correctly.
But none of the records in test.tdb can be fetched on any node, i.e.
{node-0, node-1, node-2, node-3}.
The fetch seems to hang in ctdb_fetch_lock:force_migration
when node-0, node-2, or node-3 tries to fetch the record from test.tdb.
Is there any mechanism to revoke __transaction_lock__?
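
To tell a real hang apart from an outright failure on the surviving nodes,
something like the following could be used. It is only a rough sketch:
probe_fetch is a made-up helper name, it relies on the GNU coreutils
"timeout" command (exit status 124 on timeout), and it assumes the record
is read with "ctdb pfetch", which may differ from the actual test setup.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report whether
# it completed, failed, or was killed because it did not return in time.
# Usage: probe_fetch SECONDS COMMAND [ARGS...]
probe_fetch() {
    limit=$1
    shift
    # GNU timeout exits with status 124 when the time limit is reached.
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "hung"
    elif [ "$rc" -eq 0 ]; then
        echo "ok"
    else
        echo "failed($rc)"
    fi
    return 0
}

# Example, to be run on a surviving node (assumed command, adjust as needed):
#   probe_fetch 10 ctdb pfetch test.tdb /tmp/test
```

A result of "hung" on node-0, node-2 and node-3 would match the behaviour
described above, while "failed(N)" would point to an error returned by the
command rather than a lock that is never released.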

Thanks again. I really appreciate receiving any input from smart people
like you guys.

az

2012/11/16 Martin Schwenke <martin at meltin.net>

> On Sun, 11 Nov 2012 15:23:02 +0800, XW Huang <xwhuang123 at gmail.com>
> wrote:
>
> > A record (/tmp/test) is stored in a persistent tdb (test.tdb) by this
> > command.
> >
> > $sudo ctdb pstore test.tdb /tmp/test /tmp/test
> >
> > This record is fetched again and again on node-1.
> > Then, node-1's network interface was turned down to simulate a crash
> > by this command.
> >
> > $sudo ifconfig eth0 down
> > [...]
>
> > [...]
> > However, the records in test.tdb can not be fetched anymore.
> > But "ctdb getdbstatus" says test.tdb is healthy
> > and "ctdb catdb" can dump its content:
> > [...]
>
> It sounds like you are still trying to fetch the record on node 1.
> That won't work.  CTDB isn't available on that node.
>
> If you are providing a CIFS service using Samba/CTDB then when a node
> fails, the IPs hosted by the failed node will be moved to other nodes
> so that they can continue to provide the CIFS service. Then any data
> that Samba needs from CTDB can be fetched on those other nodes.
>
> Does that help?  :-)
>
> peace & happiness,
> martin
>