Setting up CTDB on OCFS2 and VMs ...

Rowland Penny repenny241155 at gmail.com
Tue Dec 16 13:02:54 MST 2014


On 16/12/14 19:05, ronnie sahlberg wrote:
> On Tue, Dec 16, 2014 at 10:22 AM, Rowland Penny <repenny241155 at gmail.com> wrote:
>> On 16/12/14 17:38, Stefan Kania wrote:
>>>
>>> Hi Rowland,
>>>
>>>
>>> On 16/12/14 at 15:27, Rowland Penny wrote:
>>>> On 16/12/14 13:12, Stefan Kania wrote:
>>>>
>>>> Hi Rowland,
>>>>
>>>> If these addresses are meant to be the IPs your clients use to access
>>>> the cluster, you must put these IPs in your public_addresses file
>>>>
>>>>> OK, they are in the public_addresses file
>>>> and in your DNS the hostname for your cluster should point to both
>>>> addresses.
>>>>
>>>>> DOH! CNAME.
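>>>>
>>>> For the cluster name that means round-robin A records rather than a
>>>> CNAME. A rough sketch of the zone entries (the host name 'cluster' is
>>>> only an example; the addresses are your two public IPs):
>>>>
>>>> cluster    IN  A    192.168.0.8
>>>> cluster    IN  A    192.168.0.9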
>>>> Remember that you have to install "ethtool" on all nodes!
>>>>
>>>>> What do you mean, remember? I haven't seen it stated anywhere that
>>>>> you must install ethtool. It is installed anyway :-D
>>> I had an error message on the node without ethtool, and the nodes were
>>> unhealthy. After I installed ethtool it worked for me and the
>>> error message was gone.
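>>>
>>> I believe (an assumption from reading the event scripts, not something
>>> the docs spell out) that CTDB's 10.interface event script calls
>>> ethtool to check the link state of the public NIC, so without ethtool
>>> the monitor event fails and the node gets flagged unhealthy. A sketch
>>> of running that check by hand (CTDB_BASE is an assumption about how
>>> the scripts locate their configuration):
>>>
>>> CTDB_BASE=/etc/ctdb /etc/ctdb/events.d/10.interface monitor; echo $?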
>>>> So if you start the cluster, the system will pick an IP address out
>>>> of the file. If there is no public_addresses file, your system will
>>>> not get any IP. If there is no ethtool but a public_addresses file,
>>>> the node can't set any of the IPs. If one node of your cluster fails,
>>>> the second node will take over the IP address from the failed host.
>>>> BUT REMEMBER: you won't see the IPs with "ifconfig", you MUST use
>>>> "ip a l".
>>>>
>>>>> That is something else that I haven't seen anywhere! :-)
>>> Read this:
>>>
>>> http://unix.stackexchange.com/questions/93412/difference-between-ifconfig-and-ip-commands
>>>
>>> What I think is that CTDB is assigning the virtual IP with "ip" and
>>> configuring the NIC with ethtool. So if you set an IP with the "ip"
>>> command, that address is not shown by "ifconfig".
>>>
>>> Did you get rid of the IP error message?
>>
>> It would seem so, the last time it appeared in log.ctdb was here:
>>
>> 2014/12/16 14:48:41.784044 [recoverd:13666]: Takeover run starting
>> 2014/12/16 14:48:41.784284 [recoverd:13666]: Failed to find node to cover ip 192.168.0.9
>> 2014/12/16 14:48:41.784305 [recoverd:13666]: Failed to find node to cover ip 192.168.0.8
>> 2014/12/16 14:48:41.850344 [recoverd:13666]: Takeover run completed successfully
>>
> If this only happens during startup I would not worry about it.
> It may be that none of the nodes are ready to accept IP addresses yet
> and this is then just a benign but annoying message.
>
>
>> A short while later there is this:
>>
>> 2014/12/16 14:52:34.242911 [recoverd:13666]: Takeover run starting
>> 2014/12/16 14:52:34.243356 [13513]: Takeover of IP 192.168.0.9/8 on interface eth0
>> 2014/12/16 14:52:34.261916 [13513]: Takeover of IP 192.168.0.8/8 on interface eth0
> This looks wrong.
> I suspect you want these to be /24 netmasks, not /8.
> See also below in the 'ip addr show' output, where the mask is /24 for
> the static address.
>
> I.e. you should probably change your public_addresses file and set the
> netmask to 24.
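>
> Something like this in /etc/ctdb/public_addresses on both nodes (that
> path is the usual default; the interface name is taken from your 'ip a
> l' output):
>
> 192.168.0.8/24 eth0
> 192.168.0.9/24 eth0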

So I changed the netmask to 24 and now neither of the nodes comes up;
change back to 8 and node 1 becomes OK; back to 24 and both are
UNHEALTHY again.

Now, Steve said that this wasn't rocket science; he was wrong, it is!!

I cannot spend any more time on this; it might work on Red Hat, but I
cannot make it work on Debian. I must be missing something, but for the
life of me I cannot find what it is.

Rowland

>
>
>> 2014/12/16 14:52:34.490010 [recoverd:13666]: Takeover run completed successfully
>>
>> The IP addresses never appear again.
>>
>>> I think that's your main problem.
>>
>> I don't think so; tailing the log shows this:
>>
>> root at cluster1:~# tail /var/log/ctdb/log.ctdb
>> 2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
>> 2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
>> 2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
>> 2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
>> 2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
>> 2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error: 'managed to lock reclock file from inside daemon'
>> 2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with ret=-1 res=-1 opcode=16
>> 2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed - fail_count=1
>> 2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412 Unable to set recovery mode. Recovery failed.
>> 2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996 Unable to set recovery mode to normal on cluster
>>
>> This appears to be happening over and over again.
>>
>> ctdb status shows this:
>>
>> Number of nodes:3 (including 1 deleted nodes)
>> pnn:1 192.168.1.10     OK (THIS NODE)
>> pnn:2 192.168.1.11     UNHEALTHY
> You can run 'ctdb scriptstatus'   on node 1  and it should give you
> more detail about why the node is unhealthy.
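>
> The 'managed to lock reclock file from inside daemon' errors above are
> also worth chasing: they usually mean fcntl byte-range locking is not
> working correctly on the filesystem holding the recovery lock. A sketch
> of a test with the ping_pong tool from the ctdb sources, run on both
> nodes at the same time with nodes+1 as the count (the /cluster path is
> only an example mountpoint for your OCFS2 filesystem):
>
> ping_pong /cluster/ping_pong.dat 3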
>
>
>
>> Generation:1226492970
>> Size:2
>> hash:0 lmaster:1
>> hash:1 lmaster:2
>> Recovery mode:NORMAL (0)
>> Recovery master:1
>>
>> 'ip a l' shows this:
>>
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN group
>> default
>>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>      inet 127.0.0.1/8 scope host lo
>>      inet6 ::1/128 scope host
>>         valid_lft forever preferred_lft forever
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
>> UP group default qlen 1000
>>      link/ether 08:00:27:d6:92:30 brd ff:ff:ff:ff:ff:ff
>>      inet 192.168.0.6/24 brd 192.168.0.255 scope global eth0
>>      inet 192.168.0.8/8 brd 192.255.255.255 scope global eth0
>>      inet 192.168.0.9/8 brd 192.255.255.255 scope global secondary eth0
>>      inet6 fe80::a00:27ff:fed6:9230/64 scope link
>>         valid_lft forever preferred_lft forever
>> 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state
>> UP group default qlen 1000
>>      link/ether 08:00:27:03:79:17 brd ff:ff:ff:ff:ff:ff
>>      inet 192.168.1.10/24 brd 192.168.1.255 scope global eth1
>>      inet6 fe80::a00:27ff:fe03:7917/64 scope link
>>         valid_lft forever preferred_lft forever
>>
>> Rowland
>>
>>
>>> Stefan
>>>
>>>>> Rowland
>>>> Stefan
>>>>
>>>>
>>>> On 16/12/14 at 10:30, Rowland Penny wrote:
>>>>>>> On 16/12/14 07:53, Stefan Kania wrote:
>>>>>>>>
>>>>>>>> Hi Rowland,
>>>>>>>>
>>>>>>>> did you see that you have some problems with IPs on node 1?
>>>>>>>>
>>>>>>>> 2014/12/15 16:32:28.300370 [recoverd: 2497]: Failed to find node to cover ip 192.168.0.9
>>>>>>>> 2014/12/15 16:32:28.300412 [recoverd: 2497]: Failed to find node to cover ip 192.168.0.8
>>>>>>>>
>>>>>>>> I also had some problems with IP and name resolution at the
>>>>>>>> beginning. After I solved that problem everything was fine.
>>>>>>>>
>>>>>>>> Stefan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> I did wonder about those lines; I do not have 192.168.0.8 or
>>>>>>> 192.168.0.9 configured, but Ronnie posted this:
>>>>>>>
>>>>>>> No, you should not/need not create them on the system. Ctdbd
>>>>>>> will create and assign these addresses automatically and
>>>>>>> dynamically while the cluster is running.
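>>>>>>>
>>>>>>> Once the cluster is healthy you can verify the assignments; for
>>>>>>> example:
>>>>>>>
>>>>>>> ctdb ip             # shows which node hosts each public address
>>>>>>> ip addr show eth0   # takeover addresses show up here, not in ifconfig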
>>>>>>>
>>>>>>> So, do I need to create them and if so, where? This is one of
>>>>>>> those areas of CTDB that doesn't seem to be documented at all.
>>>>>>>
>>>>>>> Rowland
>>>>>>>
>>


