CTDB - all nodes "unhealthy"

Fri Mar 23 05:10:30 MDT 2012

On 23/03/12 10:16, ronnie sahlberg wrote:
> OK, so it doesn't even start up properly; it fails either in the startup
> event or in the initial recovery.
>
> Shut down completely, delete the /var/log/log.ctdb files so we get
> a clean trace, and restart ctdb on all nodes.
> Then post the log.ctdb file after it has been running for about 3 minutes.
>

I'm such a n00b at this!

I did as you suggested: cleared down the logs and restarted ctdb. When
I looked a few minutes later, I found "ctdb_control error: 'managed to
lock reclock file from inside daemon'" in the ctdb log, which I hadn't
seen before. When I looked that up, I found a list post that says (in part):

"Ok, you have a problem with the posix fcntl byte range lock support on your
file system"
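The quoted diagnosis is about POSIX byte-range (fcntl) locks. One rough way to check that behaviour on a given directory is to hold an exclusive lock in one process and confirm that a second process is refused. The sketch below is an illustration in Python, not what CTDB itself runs; the default file name is hypothetical (point it at a scratch file in the same directory as your reclock file), and it only exercises locking between processes on one node, not the cross-node coherence a cluster filesystem must also provide.

```python
import fcntl
import os
import sys
import tempfile


def fcntl_locks_conflict(path: str) -> bool:
    """Return True if a second process is refused a lock this process holds.

    The parent takes an exclusive, non-blocking fcntl (POSIX) lock on byte 0
    of `path`; a forked child then tries to lock the same byte. With working
    byte-range locking the child's attempt must fail.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # fcntl.lockf() wraps fcntl(F_SETLK): lock 1 byte starting at offset 0.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks are owned per process and are not inherited
            # across fork(), so this is a genuine competing locker.
            code = 2  # 2 = unexpected error (e.g. could not open the file)
            try:
                child_fd = os.open(path, os.O_RDWR)
                try:
                    fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0)
                    code = 1  # got the lock: locking is NOT working
                except OSError:
                    code = 0  # refused (EACCES/EAGAIN): locking works
            finally:
                os._exit(code)  # never fall back into the parent's code path
        _, status = os.waitpid(pid, 0)
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    finally:
        os.close(fd)  # closing the descriptor also releases the parent's lock


if __name__ == "__main__":
    # Hypothetical default; pass a scratch file on the filesystem under test.
    target = sys.argv[1] if len(sys.argv) > 1 else os.path.join(
        tempfile.gettempdir(), "fcntl-locktest")
    print("fcntl locking OK" if fcntl_locks_conflict(target)
          else "fcntl locking BROKEN")
```

Run it with a path on the filesystem you care about; "BROKEN" on a cluster mount would point at the same kind of problem the quoted post describes.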

GFS2 is *supposed* to be POSIX(ish) compliant, isn't it? So I checked my
GFS2 mounts... and found I didn't have any. :-( D'oh!
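Checking for GFS2 mounts is easy to script. This sketch assumes Linux /proc/mounts format and simply returns the entries whose filesystem type is gfs2; the device and mount point names in the example data are made up.

```python
import os
from typing import List, Tuple


def gfs2_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (device, mount point) pairs whose filesystem type is gfs2.

    Expects /proc/mounts format: one mount per line, whitespace-separated
    fields "device mountpoint fstype options dump pass". Spaces inside
    paths appear escaped as \\040 in that file and are left as-is here.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "gfs2":
            found.append((fields[0], fields[1]))
    return found


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        mounts = gfs2_mounts(f.read())
    if not mounts:
        print("No GFS2 filesystems mounted")
    for device, mountpoint in mounts:
        print(f"{device} on {mountpoint}")
```

An empty result on a node that is supposed to hold the reclock file on GFS2 is exactly the situation described here.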

My GFS2 CLVM volumes had become "unavailable", and I needed to reactivate
them with lvchange.

So, guess what?

[root@ctdb-samba01 ~]# ctdb status
Number of nodes:4
pnn:0 172.16.6.180     OK (THIS NODE)
pnn:1 172.16.6.181     OK
pnn:2 172.16.6.182     OK
pnn:3 172.16.6.183     OK
Generation:181354777
Size:4
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
hash:3 lmaster:3
Recovery mode:NORMAL (0)
Recovery master:0

Wahey!

Edging slowly up the ctdb learning curve...
-Andy

-- 
Andy D'Arcy Jewell

SysMicro Limited
Linux Support