TDB lock contention during "startup" event caused winbind crash
Martin Schwenke
martin at meltin.net
Mon Jun 6 02:03:21 UTC 2016
Hi Kenny,
On Tue, 31 May 2016 11:38:19 -0700, Kenny Dinh <kdinh at peaxy.net> wrote:
> This issue occurred one time when the system was under load. I have not
> seen it again. There were many other issues with the system before it got
> to this point. The file system was slow to respond to the recovery lock
> requests, as you have noticed.
>
> I will definitely let you know if I encounter this issue again on the 4.4.x
> branch. It is highly unlikely though.
>
> Thank you for the pointer to the parallel recovery helper code.
>
> As for your question on what caused the cluster to go into recovery at
> "2015/11/18 08:13:52.078635": my setup has 3 CTDB nodes. CTDB services on
> all 3 nodes were restarted at around 8:01. CTDB processes on all 3 CTDB
> nodes were stuck at the "startup" stage. At 08:13:52, node_1 was not able
> to receive "keep_alive" messages from node_0 and declared node_0 dead,
> which put the cluster into recovery. Again, there were all kinds of other
> issues with the cluster at that time.
>
> I don't think we should spend more time on this issue. Attached are the
> logs from 3 ctdb nodes, if you are curious.

I took a very quick look, but there are no obvious, identifiable
problems apart from the ones I have already mentioned.

I hope things work better with 4.4.x. :-)

peace & happiness,
martin