Setting up CTDB on OCFS2 and VMs ...
Rowland Penny
repenny241155 at gmail.com
Tue Dec 16 15:08:08 MST 2014
On 16/12/14 21:19, Ralph Böhme wrote:
> On Tue, Dec 16, 2014 at 09:12:12PM +0000, Rowland Penny wrote:
>> On 16/12/14 20:59, Martin Schwenke wrote:
>>> On Tue, 16 Dec 2014 18:22:02 +0000, Rowland Penny
>>> <repenny241155 at gmail.com> wrote:
>>>
>>>> I dont think so, tailing the log shows this:
>>>>
>>>> root at cluster1:~# tail /var/log/ctdb/log.ctdb
>>>> 2014/12/16 18:11:23.866612 [13513]: Thawing priority 2
>>>> 2014/12/16 18:11:23.866634 [13513]: Release freeze handler for prio 2
>>>> 2014/12/16 18:11:23.866666 [13513]: Thawing priority 3
>>>> 2014/12/16 18:11:23.866685 [13513]: Release freeze handler for prio 3
>>>> 2014/12/16 18:11:23.873189 [recoverd:13666]: ctdb_control error:
>>>> 'managed to lock reclock file from inside daemon'
>>>> 2014/12/16 18:11:23.873235 [recoverd:13666]: ctdb_control error:
>>>> 'managed to lock reclock file from inside daemon'
>>>> 2014/12/16 18:11:23.873246 [recoverd:13666]: Async operation failed with
>>>> ret=-1 res=-1 opcode=16
>>>> 2014/12/16 18:11:23.873254 [recoverd:13666]: Async wait failed -
>>>> fail_count=1
>>>> 2014/12/16 18:11:23.873261 [recoverd:13666]: server/ctdb_recoverd.c:412
>>>> Unable to set recovery mode. Recovery failed.
>>>> 2014/12/16 18:11:23.873268 [recoverd:13666]: server/ctdb_recoverd.c:1996
>>>> Unable to set recovery mode to normal on cluster
>>>>
>>>> This appears to be happening over and over again.
>>> That is the indicator that you have a lock coherency problem. Please
>>> see the stuff I made bold in:
>>>
>>> https://wiki.samba.org/index.php/Ping_pong
>>>
>>> Yes, this is hard and it tripped me up when I rushed through the
>>> ping-pong test... and there was nothing in bold there to draw my
>>> attention to that detail. As Michael Adam has mentioned, some cluster
>>> filesystems will look like they fail this test when they actually pass,
>>> so it is difficult to have a test that works everywhere...
>>>
>>> I'll try to update that message to make this clearer and send users
>>> back to the ping-pong test.
>>>
>>> peace & happiness,
>>> martin
>> I ran the ping_pong test this morning, following the wiki page and
>> as far as I could see it passed all tests.
>>
>> I have come to the conclusion that you need to be a CTDB dev to set
>> CTDB up, only they seem to have ALL the information required.
>>
>> I absolutely give up, I cannot make it work, god knows I have tried,
>> but I just cannot make it work with the information available. I can
>> find bits here and bits there, but there still seems to be something
>> missing, or is it just me. Debian 7.7, Pacemaker, Corosync and Ocfs2
>> work OK, it is just when you try to add CTDB.
> can you share the bits from Debain to ocfs2? I'll set this up the next
> day and see if I get ctdb to behave.
>
> Cheerio!
> -Ralph
>
OK, I based this on what Richard posted:
1. Create two VirtualBox VMs with enough memory and disk for your Linux
Distro. I used Debian 7.7 with 512MB and 8GB. You will also need an
extra interface on each VM for the clustering private network. I set
them to an internal type.
2. Because you will need a shared disk, create one:
vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size 10240
--variant Fixed --format VDI # Creates a 10GB fixed sized disk
vboxmanage modifyhd ~/VirtualBox\ VMs/SharedHD1.vdi --type shareable
Also, use the GUI to add the shared disk to both VMs.
3. Install the OS on each of the VM's.
4. Install clustering packages:
apt-get install openais corosync pacemaker ocfs2-tools-pacemaker dlm-pcmk
5. Configure corosync
nano /etc/corosync/corosync.conf
Make sure that bindnetaddr is defined and points to your private
interface. I set it to 192.168.1.0
# Copy the file to the other node.
scp /etc/corosync/corosync.conf root at 192.168.0.7:/etc/corosync/
# ON BOTH NODES
service pacemaker stop # Stop these in case they were running
service corosync stop # Same here
nano /etc/default/corosync
Change:
START=no
To:
START=yes
# now start the cluster
service corosync start
# Also start it on the other node(s).
# Now check the status:
root at cluster1:~# crm_mon -1
============
Last updated: Mon Dec 15 10:46:20 2014
Last change: Mon Dec 15 10:44:18 2014 via crmd on cluster1
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============
Online: [ cluster1 cluster2 ]
If you do not see all the other nodes online, then you have to debug the
problem.
6. Configure the Oracle cluster
dpkg-reconfigure ocfs2-tools
Configuring ocfs2-tools
# Would you like to start an OCFS2 cluster (O2CB) at boot time?
#
# <Yes>
#
# Name of the cluster to start at boot time:
#
# ctdbdemo
# Create the ocfs2 cluster conf file
o2cb_ctl -C -n ctdbdemo -t cluster
o2cb_ctl -C -n cluster1 -t node -a number=1 -a ip_address=192.168.1.10
-a ip_port=7777 -a cluster=ctdbdemo
o2cb_ctl -C -n cluster2 -t node -a number=2 -a ip_address=192.168.1.11
-a ip_port=7777 -a cluster=ctdbdemo
# ON BOTH NODES
service corosync stop
# Copy files to the other node.
scp /etc/default/o2cb root at 192.168.0.7:/etc/default/
scp /etc/ocfs2/cluster.conf root at 192.168.0.7:/etc/ocfs2/
service o2cb start
service corosync start
crm configure property stonith-enabled=false
7. Create the shared shared file system on one node:
mkfs.ocfs2 -L CTDBdemocommon -T datafiles -N 4 /dev/sdb
8. Mount it on both and ensure that you can create files/dirs on one
node and see them on the other node.
mkdir /cluster
mount -t ocfs2 /dev/sdb /cluster
This gets you a shared cluster??
For me it all goes pear shaped when I try to add CTDB.
If you find that I have missed something or done something wrong, then I
will not be surprised, info is very hard to find.
Rowland
More information about the samba-technical
mailing list