Setting up CTDB on OCFS2 and VMs ...

Rowland Penny repenny241155 at gmail.com
Tue Dec 16 15:08:08 MST 2014


On 16/12/14 21:19, Ralph Böhme wrote:
> On Tue, Dec 16, 2014 at 09:12:12PM +0000, Rowland Penny wrote:
>> On 16/12/14 20:59, Martin Schwenke wrote:
>>> On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
>>> <repenny241155 at gmail.com> wrote:
>>>
>>>> I don't think so, tailing the log shows this:
>>>>
>>>> root@cluster1:~# tail /var/log/ctdb/log.ctdb
>>>> 2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
>>>> 2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
>>>> 2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
>>>> 2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
>>>> 2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error:
>>>> 'managed to lock reclock file from inside daemon'
>>>> 2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error:
>>>> 'managed to lock reclock file from inside daemon'
>>>> 2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with
>>>> ret=-1 res=-1 opcode=16
>>>> 2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed -
>>>> fail_count=1
>>>> 2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412
>>>> Unable to set recovery mode. Recovery failed.
>>>> 2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996
>>>> Unable to set recovery mode to normal on cluster
>>>>
>>>> This appears to be happening over and over again.
>>> That is the indicator that you have a lock coherency problem.  Please
>>> see the stuff I made bold in:
>>>
>>>    https://wiki.samba.org/index.php/Ping_pong
>>>
>>> Yes, this is hard and it tripped me up when I rushed through the
>>> ping-pong test...  and there was nothing in bold there to draw my
>>> attention to that detail. As Michael Adam has mentioned, some cluster
>>> filesystems will look like they fail this test when they actually pass,
>>> so it is difficult to have a test that works everywhere...
>>>
>>> I'll try to update that message to make this clearer and send users
>>> back to the ping-pong test.
>>>
>>> peace & happiness,
>>> martin
>> I ran the ping_pong test this morning, following the wiki page and
>> as far as I could see it passed all tests.
>>
>> I have come to the conclusion that you need to be a CTDB dev to set
>> CTDB up; only they seem to have ALL the information required.
>>
>> I absolutely give up, I cannot make it work, god knows I have tried,
>> but I just cannot make it work with the information available. I can
>> find bits here and bits there, but there still seems to be something
>> missing, or is it just me? Debian 7.7, Pacemaker, Corosync and OCFS2
>> work OK, it is just when you try to add CTDB.
> Can you share the bits from Debian to ocfs2? I'll set this up in the
> next day or so and see if I can get ctdb to behave.
>
> Cheerio!
> -Ralph
>
OK, I based this on what Richard posted:

1. Create two VirtualBox VMs with enough memory and disk for your Linux
distro. I used Debian 7.7 with 512MB RAM and an 8GB disk. You will also
need an extra interface on each VM for the clustering private network;
I set these to the 'Internal Network' type.

2. Because you will need a shared disk, create one:

vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size 10240 
--variant Fixed --format VDI # Creates a 10GB fixed-size disk

vboxmanage modifyhd ~/VirtualBox\ VMs/SharedHD1.vdi --type shareable

Also, use the GUI to add the shared disk to both VMs.
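
If you prefer the command line to the GUI, something like this should
attach the shared disk (the VM name and the "SATA" controller name are
just examples; use whatever your VMs actually have):

vboxmanage storageattach cluster1 --storagectl "SATA" --port 1 \
  --device 0 --type hdd --mtype shareable \
  --medium ~/VirtualBox\ VMs/SharedHD1.vdi
# repeat for the second VM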

3. Install the OS on each of the VMs.

4. Install clustering packages:

apt-get install openais corosync pacemaker ocfs2-tools-pacemaker dlm-pcmk

5. Configure corosync

nano /etc/corosync/corosync.conf

Make sure that bindnetaddr is defined and set to the network address of
your private (cluster) interface. I set it to 192.168.1.0
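
For reference, the relevant interface stanza in corosync.conf ends up
looking something like this (mcastaddr and mcastport left at the Debian
defaults; only bindnetaddr needs changing):

totem {
        version: 2
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
}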

# Copy the file to the other node.
scp /etc/corosync/corosync.conf root@192.168.0.7:/etc/corosync/

# ON BOTH NODES
service pacemaker stop  # Stop these in case they were running
service corosync stop # Same here

nano /etc/default/corosync
Change:
START=no

To:

START=yes

# now start the cluster
service corosync start

# Also start it on the other node(s).

# Now check the status:
root@cluster1:~# crm_mon -1
============
Last updated: Mon Dec 15 10:46:20 2014
Last change: Mon Dec 15 10:44:18 2014 via crmd on cluster1
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ cluster1 cluster2 ]

If you do not see all of the nodes online, you will have to debug that
before continuing.
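
If a node is missing, a reasonable first check is the ring status on
each node, plus the corosync messages in syslog:

corosync-cfgtool -s
grep -i corosync /var/log/syslog | tail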

6. Configure the OCFS2 (O2CB) cluster

dpkg-reconfigure ocfs2-tools
Configuring ocfs2-tools
# Would you like to start an OCFS2 cluster (O2CB) at boot time?
#
# <Yes>
#
# Name of the cluster to start at boot time:
#
# ctdbdemo

# Create the ocfs2 cluster conf file
o2cb_ctl -C -n ctdbdemo -t cluster
o2cb_ctl -C -n cluster1 -t node -a number=1 -a ip_address=192.168.1.10 
-a ip_port=7777 -a cluster=ctdbdemo
o2cb_ctl -C -n cluster2 -t node -a number=2 -a ip_address=192.168.1.11 
-a ip_port=7777 -a cluster=ctdbdemo
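
Those o2cb_ctl commands should leave you with an /etc/ocfs2/cluster.conf
along these lines (shown here in case you would rather create it by
hand):

cluster:
        node_count = 2
        name = ctdbdemo

node:
        ip_port = 7777
        ip_address = 192.168.1.10
        number = 1
        name = cluster1
        cluster = ctdbdemo

node:
        ip_port = 7777
        ip_address = 192.168.1.11
        number = 2
        name = cluster2
        cluster = ctdbdemo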

# ON BOTH NODES
service corosync stop

# Copy files to the other node.
scp /etc/default/o2cb  root@192.168.0.7:/etc/default/
scp /etc/ocfs2/cluster.conf  root@192.168.0.7:/etc/ocfs2/

service o2cb start
service corosync start

crm configure property stonith-enabled=false

7. Create the shared file system on one node:

mkfs.ocfs2 -L CTDBdemocommon -T datafiles -N 4 /dev/sdb

8. Mount it on both nodes and ensure that you can create files/dirs on
one node and see them on the other.

mkdir /cluster
mount -t ocfs2 /dev/sdb /cluster
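
A quick sanity check, assuming the mount succeeded on both nodes:

touch /cluster/testfile      # on cluster1
ls -l /cluster/testfile      # on cluster2 - should show the file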

As far as I can tell, this gets you a working shared cluster filesystem.

For me it all goes pear-shaped when I try to add CTDB.
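
For reference, the sort of minimal CTDB configuration I am trying to
add on top of this would be something like the following (values are
examples only; the important part is that CTDB_RECOVERY_LOCK points at
a file on the OCFS2 mount):

# /etc/default/ctdb
CTDB_RECOVERY_LOCK=/cluster/.ctdb/reclock
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes

# /etc/ctdb/nodes - one private (cluster) IP per line
192.168.1.10
192.168.1.11

# /etc/ctdb/public_addresses - example public IPs on the LAN interface
192.168.0.20/24 eth0
192.168.0.21/24 eth0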

If you find that I have missed something or done something wrong, I
will not be surprised; the info is very hard to find.

Rowland


