Setting up CTDB on OCFS2 and VMs ...

Tue Dec 16 11:22:02 MST 2014

On 16/12/14 17:38, Stefan Kania wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi Rowland,
>
>
> On 16.12.14 at 15:27, Rowland Penny wrote:
>> On 16/12/14 13:12, Stefan Kania wrote:
>> Hi Rowland,
>>
>> If these addresses are meant to be the IPs for clients accessing the
>> cluster, you must put these IPs in your public_addresses file.
>>
>>> OK, they are in the public_addresses file
>> And in your DNS, the hostname for your cluster should point to both
>> addresses.
>>
>>> DOH! CNAME.
>> Remember that you have to install "ethtool" on all nodes!
>>
>>> What do you mean, remember? I haven't seen anywhere that you must
>>> install ethtool. It is installed anyway :-D
> I had an error message on the node without ethtool, and the nodes were
> unhealthy. After I installed ethtool, it worked for me and the
> error message was gone.
>> So when you start the cluster, the system will pick an IP address out
>> of the file. If there is no public_addresses file, your system will
>> not get any IP. If there is a public_addresses file but no ethtool,
>> the node can't set one of the IPs. If one node of your cluster fails,
>> the second node will take over the IP address from the failed host.
>> BUT REMEMBER: you won't see the IPs with "ifconfig"; you MUST use
>> "ip a l".
>>
>>> That is something else that I haven't seen anywhere! :-)
> Read this:
> http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands
>
> What I think is that CTDB assigns the virtual IPs with "ip" and
> configures the NIC with ethtool. So if you set an IP with the "ip"
> command, that address is not shown by "ifconfig".
>
> Did you get rid of the IP error message?
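Before starting ctdbd it can help to sanity-check the public_addresses file. Below is a minimal Python sketch that assumes the common one-entry-per-line layout "ADDRESS/PREFIX INTERFACE" (interfaces possibly comma-separated). The helper name parse_public_addresses and the sample entries are hypothetical and do not come from this thread. The sketch only validates the text with Python's ipaddress module. It does not talk to ctdbd or reflect its own parser.

```python
import ipaddress

def parse_public_addresses(text):
    """Hypothetical helper: parse 'ADDRESS/PREFIX IFACE[,IFACE...]' lines.

    Blank lines are skipped. Returns a list of (IPv4Interface, [ifaces]).
    Raises ValueError on a malformed line or address.
    """
    entries = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 'ADDRESS/PREFIX IFACE', got {raw!r}")
        addr = ipaddress.ip_interface(fields[0])  # raises ValueError if malformed
        entries.append((addr, fields[1].split(",")))
    return entries

# Hypothetical file contents (assumed format, one public address per line).
sample = """
192.168.0.8/24 eth0
192.168.0.9/24 eth0
"""

for addr, ifaces in parse_public_addresses(sample):
    # Show the prefix each address will carry, since it decides netmask and broadcast.
    print(addr, "network", addr.network, "on", ",".join(ifaces))
```

The prefix length printed per entry matters: it is the mask that ends up on the interface when the address is taken over.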

It would seem so, the last time it appeared in log.ctdb was here:

2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully

A short while later there is this:

2014/12/16 14:52:34.242911 [recoverd:13666]: Takeover run starting
2014/12/16 14:52:34.243356 [13513]: Takeover of IP 192.168.0.9/8 on interface eth0
2014/12/16 14:52:34.261916 [13513]: Takeover of IP 192.168.0.8/8 on interface eth0
2014/12/16 14:52:34.490010 [recoverd:13666]: Takeover run completed successfully

The IP addresses never appear in the log again.
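Following what happened to each public IP across takeover runs is easier with a small filter over log.ctdb. Here is a minimal Python sketch that picks out the two messages quoted above and groups them by address. The regexes are assumptions based only on these excerpts, with one log entry per line as written to the file. They are not a documented CTDB log format.

```python
import re
from collections import defaultdict

# Patterns copied from the two message types seen in the excerpts above.
FAILED = re.compile(r"Failed to find node to cover ip (\S+)")
TAKEOVER = re.compile(r"Takeover of IP (\S+) on interface (\S+)")

def ip_events(lines):
    """Return {ip: [(timestamp, event), ...]} in log order."""
    events = defaultdict(list)
    for line in lines:
        stamp = " ".join(line.split()[:2])  # date and time fields
        m = FAILED.search(line)
        if m:
            events[m.group(1)].append((stamp, "no node to cover"))
            continue
        m = TAKEOVER.search(line)
        if m:
            ip = m.group(1).split("/")[0]  # drop the prefix length
            events[ip].append((stamp, f"taken over on {m.group(2)}"))
    return dict(events)

log = """\
2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully
2014/12/16 14:52:34.242911 [recoverd:13666]: Takeover run starting
2014/12/16 14:52:34.243356 [13513]: Takeover of IP 192.168.0.9/8 on interface eth0
2014/12/16 14:52:34.261916 [13513]: Takeover of IP 192.168.0.8/8 on interface eth0
2014/12/16 14:52:34.490010 [recoverd:13666]: Takeover run completed successfully
""".splitlines()

for ip, history in ip_events(log).items():
    for stamp, what in history:
        print(ip, stamp, what)
```

If the last event for an address is a takeover and nothing follows, that is consistent with the addresses simply staying put after 14:52.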

>
> I think that's your main problem.

I don't think so; tailing the log shows this:

root@cluster1:~# tail /var/log/ctdb/log.ctdb
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster

This appears to be happening over and over again.
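To confirm which messages repeat, and how often, you can strip the timestamp and [process] tag and count identical messages. Below is a minimal Python sketch of that. The prefix regex is an assumption based on the line format in the excerpt above. The function name tally_messages is hypothetical.

```python
import re
from collections import Counter

# Assumed prefix: "<date> <time> [<process tag>]: ", as in the excerpt above.
PREFIX = re.compile(r"^\S+ \S+ \[[^\]]*\]: ")

def tally_messages(lines):
    """Count log messages after removing the timestamp and [process] prefix."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[PREFIX.sub("", line)] += 1
    return counts

excerpt = """\
2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster
"""

for message, n in tally_messages(excerpt.splitlines()).most_common():
    print(f"{n:3d}  {message}")
```

Run over the whole log.ctdb rather than a tail, the same tally shows whether the reclock error accompanies every failed recovery.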

ctdb status shows this:

Number of nodes:3 (including 1 deleted nodes)
pnn:1 192.168.1.10     OK (THIS NODE)
pnn:2 192.168.1.11     UNHEALTHY
Generation:1226492970
Size:2
hash:0 lmaster:1
hash:1 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1
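When checking many nodes, pulling the per-node state out of "ctdb status" text is simple. Below is a minimal Python sketch that assumes the "pnn:N ADDRESS STATE [(THIS NODE)]" layout shown in the output above. Other CTDB versions may format this differently. The function name node_states is hypothetical.

```python
import re

# Assumed layout of the per-node lines, e.g. "pnn:2 192.168.1.11     UNHEALTHY".
NODE = re.compile(r"^pnn:(\d+)\s+(\S+)\s+(\S+)(.*)$")

def node_states(status_text):
    """Map pnn -> {'address', 'state', 'this_node'} from 'ctdb status' text."""
    nodes = {}
    for line in status_text.splitlines():
        m = NODE.match(line.strip())
        if m:
            pnn, addr, state, rest = m.groups()
            nodes[int(pnn)] = {
                "address": addr,
                "state": state,
                "this_node": "THIS NODE" in rest,
            }
    return nodes

status = """\
Number of nodes:3 (including 1 deleted nodes)
pnn:1 192.168.1.10     OK (THIS NODE)
pnn:2 192.168.1.11     UNHEALTHY
Generation:1226492970
Size:2
"""

for pnn, info in node_states(status).items():
    if info["state"] != "OK":
        print(f"pnn {pnn} ({info['address']}) is {info['state']}")
```

The deleted node produces no "pnn:" line here, which is why only two nodes are reported.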

"ip a l" shows this:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN group default
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 127.0.0.1/8 scope host lo
     inet6 ::1/128 scope host
        valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     link/ether 08:00:27:d6:92:30 brd ff:ff:ff:ff:ff:ff
     inet 192.168.0.6/24 brd 192.168.0.255 scope global eth0
     inet 192.168.0.8/8 brd 192.255.255.255 scope global eth0
     inet 192.168.0.9/8 brd 192.255.255.255 scope global secondary eth0
     inet6 fe80::a00:27ff:fed6:9230/64 scope link
        valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
     link/ether 08:00:27:03:79:17 brd ff:ff:ff:ff:ff:ff
     inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1
     inet6 fe80::a00:27ff:fe03:7917/64 scope link
        valid_lft forever preferred_lft forever
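In this output the taken-over addresses 192.168.0.8 and 192.168.0.9 carry a /8 prefix (broadcast 192.255.255.255), while eth0's own address is /24. The thread does not establish whether /8 was intended. Below is a minimal Python sketch that lists every IPv4 address per interface together with the network and broadcast implied by its prefix, so such differences stand out. It assumes the "ip a l" text layout shown above. The function name ipv4_by_interface is hypothetical.

```python
import ipaddress
import re

# Interface header lines look like "2: eth0: <...>"; inet lines like "    inet 192.168.0.6/24 ...".
IFACE = re.compile(r"^\d+:\s+([^:@]+)[:@]")
INET = re.compile(r"^\s*inet\s+(\S+)")  # "inet6" lines do not match

def ipv4_by_interface(ip_output):
    """Map interface name -> list of IPv4Interface objects from 'ip a l' text."""
    result = {}
    current = None
    for line in ip_output.splitlines():
        m = IFACE.match(line)
        if m:
            current = m.group(1)
            result[current] = []
            continue
        m = INET.match(line)
        if m and current is not None:
            result[current].append(ipaddress.ip_interface(m.group(1)))
    return result

sample = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN group default
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.0.6/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.8/8 brd 192.255.255.255 scope global eth0
    inet 192.168.0.9/8 brd 192.255.255.255 scope global secondary eth0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1
"""

for iface, addrs in ipv4_by_interface(sample).items():
    for a in addrs:
        print(iface, a, "network", a.network, "broadcast", a.network.broadcast_address)
```

The prefix that ctdbd applies is the one written in public_addresses, so that file is the place to compare against.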

Rowland

>
> Stefan
>
>>> Rowland
>> Stefan
>>
>>
>> On 16.12.14 at 10:30, Rowland Penny wrote:
>>>>> On 16/12/14 07:53, Stefan Kania wrote:
>>>>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>>>>>>
>>>>>> Hi Rowland,
>>>>>>
>>>>>> did you see that you have some problems with IPs on node 1?
>>>>>>
>>>>>> 2014/12/15 16:32:28.300370 [recoverd: 2497]: Failed to find node to cover ip 192.168.0.9
>>>>>> 2014/12/15 16:32:28.300412 [recoverd: 2497]: Failed to find node to cover ip 192.168.0.8
>>>>>>
>>>>>> I also had some problems with IPs and name resolution at the
>>>>>> beginning. After I solved that problem, everything was fine.
>>>>>>
>>>>>> Stefan
>>>>>>
>>>>>>
>>>>>>
>>>>> I did wonder about those lines; I do not have 192.168.0.8 and
>>>>> 192.168.0.9, but Ronnie posted this:
>>>>>
>>>>> No, you should not/need not create them on the system. Ctdbd
>>>>> will create and assign these addresses automatically and
>>>>> dynamically while the cluster is running.
>>>>>
>>>>> So, do I need to create them, and if so, where? This is one
>>>>> of those areas of CTDB that doesn't seem to be documented at
>>>>> all.
>>>>>
>>>>> Rowland
>>>>>
>> --
>> Stefan Kania
>> Landweg 13
>> 25693 St. Michaelisdonn
>>
>>
>> Signing every e-mail helps reduce spam. Sign your
>> e-mail. More information at http://www.gnupg.org
>>
>> My key is available at
>>
>> hkp://subkeys.pgp.net
>>
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.16 (Darwin)
>
> iEUEARECAAYFAlSQbhcACgkQ2JOGcNAHDTYthwCWJWPKLQRHCKGPKfTIcD6M/NSy
> UACghab8M/tgslaBgc6Ynk0D0jshjJA=
> =66WA
> -----END PGP SIGNATURE-----