[PATCH] Avoid CTDB daemon deadlock while reading db sequence number (bug 13021)

Mon Sep 11 15:14:18 UTC 2017

On Mon, Sep 11, 2017 at 8:13 PM, Volker Lendecke <Volker.Lendecke at sernet.de>
wrote:

> On Mon, Sep 11, 2017 at 01:59:02PM +1000, Amitay Isaacs via
> samba-technical wrote:
> > Once the recovery starts and databases are frozen, then all the record
> > access is postponed till the recovery is complete except reading the
> > database sequence number.  Database access for reading sequence number
> > is done via a control which does not check if the databases are frozen
> > or not.
>
> Doesn't this depend on the lock helper process going away in time
> when asked to? Shouldn't we also do a tdb_chainlock_nonblock in
> the parent to avoid any problems with races?
>
>
The CTDB daemon uses tdb_chainlock_nonblock() when it is trying to
migrate records while processing ctdb_req_call.  However, there are a
few places where ctdb fetches a record and expects to be able to get
the record lock.  One such place is reading the sequence number.
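
To illustrate the difference between the blocking and non-blocking
variants, here is a minimal sketch.  It does not use the tdb API; it
assumes plain POSIX fcntl() byte-range locks, which as far as I know is
how tdb takes chain locks when mutexes are not enabled.  The names
try_lock_byte() and contended_try_lock() are hypothetical.  A child
process stands in for a lock helper holding the record lock, and the
parent stands in for the daemon, which must not block on it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to write-lock one byte at `offset` without blocking.
 * Returns 0 if the lock was taken, 1 if another process holds it,
 * -1 on any other error. */
static int try_lock_byte(int fd, off_t offset)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = offset,
		.l_len = 1,
	};

	assert(fd >= 0);
	/* F_SETLK fails immediately on conflict; F_SETLKW would wait. */
	if (fcntl(fd, F_SETLK, &fl) == 0) {
		return 0;
	}
	if (errno == EAGAIN || errno == EACCES) {
		return 1;
	}
	return -1;
}

/* Hypothetical demo: the child holds the lock while the parent tries
 * to take it.  Returns the parent's try_lock_byte() result. */
static int contended_try_lock(void)
{
	char path[] = "/tmp/nonblock_lock_XXXXXX";
	int to_parent[2], to_child[2];
	int fd, ret, status;
	char c = 0;
	pid_t pid;

	fd = mkstemp(path);
	if (fd == -1 || pipe(to_parent) != 0 || pipe(to_child) != 0) {
		return -1;
	}
	unlink(path);

	pid = fork();
	if (pid == -1) {
		return -1;
	}
	if (pid == 0) {
		/* Child: take the lock, report it, hold it until told to go. */
		close(to_parent[0]);
		close(to_child[1]);
		if (try_lock_byte(fd, 0) != 0) {
			_exit(1);
		}
		if (write(to_parent[1], &c, 1) != 1) {
			_exit(1);
		}
		if (read(to_child[0], &c, 1) < 0) {
			_exit(1);
		}
		_exit(0);	/* exiting releases the lock */
	}

	/* Parent: wait until the child holds the lock, then try it. */
	close(to_parent[1]);
	close(to_child[0]);
	if (read(to_parent[0], &c, 1) != 1) {
		ret = -1;
	} else {
		ret = try_lock_byte(fd, 0);
	}
	close(to_child[1]);	/* EOF tells the child to exit */
	waitpid(pid, &status, 0);
	close(to_parent[0]);
	close(fd);
	return ret;
}
```

With the non-blocking call the parent gets an immediate "busy" result
and can postpone the request, instead of stalling until the other
process releases the lock.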

In principle I agree that any record locks in the ctdb daemon should
use the non-blocking version.  However, I would not like to make a
sweeping change to all record locks, since we don't have sufficient
tests for the ctdb daemon.

I will keep this in mind when splitting the database daemon.

Amitay.