CTDB asymmetric (non-)recovery
nicolas at ecarnot.net
Thu Jun 7 04:22:49 MDT 2012
On 07/06/2012 08:32, Martin Schwenke wrote:
> On Wed, 06 Jun 2012 23:02:57 +0200, Nicolas Ecarnot
> <nicolas at ecarnot.net> wrote:
>> I have to add : when being in the infinite cycle of death (node0 unable
>> to recover), stopping ctdb on node1 leads to node0 recovering well.
> Yikes, so it really is stuck in recovery. I'm not sure how to debug
> that... :-(
>> Martin : Does your question suggest the issue lies in the scripting part?
> No, I was just being lazy and making sure that the issue isn't in the
> scripting part... that's where many similar looking issues are...
> peace & happiness,
I increased the log level to 9 (damn, this IS verbose), and I tried to
extract the relevant part of the loop on the failing node (though so far
nothing proves that the unhealthy node _is_ the faulty one).
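For anyone wanting to cut a level-9 log down to the looping part, here is a minimal sketch of a keyword filter. The keyword list is an assumption (terms that seem likely to recur in a recovery loop), not something taken from the ctdb sources, and the log path is whatever your setup uses.

```python
import re
from typing import Iterable, List, Sequence

# Assumed keywords: adjust them to whatever actually recurs in your own log.
KEYWORDS = ("recovery", "interfaces status", "election", "ban")


def relevant_lines(lines: Iterable[str],
                   keywords: Sequence[str] = KEYWORDS) -> List[str]:
    """Return the lines that contain any keyword (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords),
                         re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Example use (the path is hypothetical):
#   with open("/path/to/ctdb.log", errors="replace") as fh:
#       print("\n".join(relevant_lines(fh)))
```

Running this on the logs of both nodes and comparing timestamps may help to see which node starts each cycle.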
The log file is here : http://pastebin.com/YEwrkmPx
Here are some points I have to add, because I must make this cluster
work: on each node, I'm using bonding on two interfaces, but I'm using this
same bond0 interface for both public and private (intra-cluster) communication.
I know this is sub-optimal, but (obviously) I have no other choice.
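To record the state of the bond while the loop is happening, a small sketch like the following can parse the status file the Linux bonding driver exposes under /proc/net/bonding/. It assumes the usual "Key: value" layout of that file; the function name and the returned dictionary shape are my own choices.

```python
from typing import Dict, Optional


def parse_bonding(text: str) -> Dict[str, object]:
    """Extract the bonding mode, the active slave and the per-slave
    MII status from the contents of /proc/net/bonding/<bond>."""
    slaves: Dict[str, Optional[str]] = {}
    info: Dict[str, object] = {"mode": None, "active_slave": None,
                               "slaves": slaves}
    current = None  # name of the slave section being read, if any
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active_slave"] = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # Only per-slave status; the bond-level line comes earlier.
            slaves[current] = value
    return info


# Example use on a node:
#   with open("/proc/net/bonding/bond0") as fh:
#       print(parse_bonding(fh.read()))
```

Logging this output periodically on each node would show whether a slave link actually flaps when the node gets stuck.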
Continuing my tests, I saw today that this non-recovery problem is not
asymmetric: I managed to get the same issue on node0.
I'm talking about the network because I'm leaning towards a network-related
issue, as I'm seeing strange things:
When a node gets stuck in the loop, it displays
"The interfaces status has changed on local node X..."
From time to time (rare, but it happens), in the looping situation, 'ping'
keeps working but ssh no longer does (though I have a working
passwordless ssh setup).
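To tell "ICMP works but TCP does not" apart from an ssh-level problem, a minimal sketch of a TCP connect probe can be run from the other node. The default port 22 and the 3-second timeout are assumptions; a successful connect only shows that the port accepts connections, not that sshd can authenticate.

```python
import socket


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established
    within the timeout, False on refusal, timeout or any other OS error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use (the host name is hypothetical):
#   print(tcp_port_open("node1"))
```

If ping succeeds while this returns False during the loop, that would point at TCP traffic being disturbed rather than at the ssh configuration.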
Well, I'd be glad if a developer could explain to me what ctdb does to
interfaces: what actions, and what monitoring?
Tests I could do:
- change bonding mode
- depending on your answer: change timing/wait values (for this,
I'm a bit lost because there are numerous values I could play with)
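To get an overview of the values that can be tuned, a sketch like the one below could snapshot the output of `ctdb listvars` before and after a change. It assumes the command prints one "Name = value" pair per line; the parser simply skips anything that does not fit that shape, and both function names are my own.

```python
import subprocess
from typing import Dict


def parse_vars(text: str) -> Dict[str, str]:
    """Parse 'Name = value' lines into a dict, ignoring other lines."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip():
            result[name.strip()] = value.strip()
    return result


def list_tunables() -> Dict[str, str]:
    """Run 'ctdb listvars' on the local node and parse its output.
    Assumes ctdb is in PATH and the daemon is running."""
    out = subprocess.run(["ctdb", "listvars"], capture_output=True,
                         text=True, check=True).stdout
    return parse_vars(out)


# Example use on a node:
#   for name, value in sorted(list_tunables().items()):
#       print(name, value)
```

Keeping one snapshot per node makes it easier to verify that both nodes run with the same timings before changing any of them.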
Tests that would be nearly impossible for me:
- use dedicated interfaces for private network