Setting up CTDB on OCFS2 and VMs ...

Rowland Penny repenny241155 at gmail.com
Wed Dec 31 08:46:30 MST 2014


On 17/12/14 09:08, Rowland Penny wrote:
> On 16/12/14 23:45, Martin Schwenke wrote:
>> On Tue, 16 Dec 2014 21:12:12 +0000, Rowland Penny
>> <repenny241155 at gmail.com> wrote:
>>
>>> I ran the ping_pong test this morning, following the wiki page and as
>>> far as I could see it passed all tests.
>> When I run "ping_pong /clusterfs/test.dat 3" on 1 node of a 2 node OCFS2
>> cluster, I see a very high locking rate - in the 10000s.  When I run it
>> on another node I see the same high locking rate and I don't see the
>> rate drop on the 1st node.  That's a fail.
>
> All I can say is that it did what the page said it would.
>
>>
>> This is on a cluster where I haven't worked out the extra steps to get
>> lock coherence.
>>
>>> I have come to the conclusion that you need to be a CTDB dev to set
>>> CTDB up, only they seem to have ALL the information required.
>> Sorry, but that line is starting to grate.  I'm concerned that
>> statements like this are likely to put people off using CTDB. There are
>> many non-CTDB-devs out there running CTDB with other cluster
>> filesystems.
>
> Sorry if what I said upsets you. I have put a lot of time into trying
> to get this setup to work, but it seems to fail when I try to add
> CTDB.
>
>> When the CTDB recovery lock is configured then CTDB has a hard
>> requirement that the cluster filesystem *must* provide lock coherence.
>> So the problem you have is a lack of lock coherence in OCFS2.
>
> But it passes the ping_pong test.
>
>> I am a CTDB dev.  I haven't yet got OCFS2 working, partly due to lack
>> of time to figure out which pieces I'm missing.  I have a simple recipe
>> that gets me to a similar point to where you are at and I haven't even
>> looked at corosync.  At some time I will try to go through Richard's
>> instructions and try to distill out the part that adds lock coherence.
>>
>> I was confused by the ping pong test results so I tried to clarify the
>> documentation for that test.
>>
>> It seems like OCFS2 is stupendously difficult to set up with lock
>> coherence.  This is not CTDB's fault.  Perhaps you need to be an OCFS2
>> dev to set up CTDB with OCFS2?  ;-)
>
> You could be right :-D
>>> I absolutely give up. I cannot make it work; god knows I have tried,
>>> but I just cannot make it work with the information available. I can
>>> find bits here and bits there, but there still seems to be something
>>> missing, or is it just me? Debian 7.7, Pacemaker, Corosync and OCFS2
>>> work OK; it is just when you try to add CTDB that it all falls apart.
>> If all those other things provided lock coherence on the cluster
>> filesystem then CTDB would work.  So adding CTDB makes you notice the
>> problem but CTDB does not cause it.  :-)
>
> I can well believe what you are saying, so it might help if CTDB could 
> print something in the logs.
>
> Rowland
>
>>
>> peace & happiness,
>> martin
>
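
For reference, since the ping_pong test keeps coming up, what I ran was
roughly the following; as I understand the wiki page, the last argument
should be the number of nodes plus one (so 3 for a two node cluster) and
test.dat is just a scratch file on the cluster filesystem:

# on node 1
ping_pong /clusterfs/test.dat 3

# then, while that is still running, on node 2
ping_pong /clusterfs/test.dat 3

With coherent locking the rate reported on node 1 should drop
dramatically as soon as node 2 starts; if both nodes keep reporting rates
in the tens of thousands, as Martin describes, the filesystem is not
providing the lock coherence that CTDB needs.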

OK, I have been having another attempt at the CTDB cluster. I cannot get
both nodes healthy if I use a recovery lock file in /etc/default/ctdb, so
I have commented it out; both nodes are now showing OK. I then moved on
to trying to get Samba to join the domain, but it always fails with this
error message:

Could not initialise message context. Try running as root
Failed to join domain: Access is denied
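
For completeness, a rough sketch of the settings involved; the recovery
lock path, the CTDB_MANAGES_SAMBA line and the join command are only
examples, not exactly what is on my nodes:

# /etc/default/ctdb (excerpt) - the recovery lock is what I have had to
# comment out to get both nodes healthy; the path is only an example
#CTDB_RECOVERY_LOCK="/clusterfs/.ctdb/recovery.lock"
CTDB_NODES=/etc/ctdb/nodes
CTDB_MANAGES_SAMBA=yes

# /etc/samba/smb.conf (excerpt)
clustering = yes

# a join attempt along these lines then fails with the error above
root@cluster1:~# net ads join -U Administrator

The "Could not initialise message context" part looks more like a problem
with the messaging/lock directories than with the domain itself, which
makes me wonder whether it is related to the directory muddle below.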

I have investigated ctdb on my system and have come to the conclusion
that ctdb is a *MESS*. Don't believe me? Then consider this:

root@cluster1:~# ls /var/ctdb
iptables-ctdb.flock  persistent  state
root@cluster1:~# ls /var/lib/ctdb
iptables-ctdb.flock  persistent  state
root@cluster1:~# ls /var/lib/lib/ctdb
brlock.tdb.1  dbwrap_watchers.tdb.1  g_lock.tdb.1  iptables-ctdb.flock
locking.tdb.1  notify_index.tdb.1  persistent  printer_list.tdb.1
serverid.tdb.1  smbXsrv_open_global.tdb.1  smbXsrv_session_global.tdb.1
smbXsrv_tcon_global.tdb.1  smbXsrv_version_global.tdb.1  state
root@cluster1:~# ls /var/ctdb/persistent/
root@cluster1:~# ls /var/ctdb/state/
failcount  interface_modify_eth0.flock  service_state
root@cluster1:~# ls /var/lib/ctdb/persistent/
root@cluster1:~# ls /var/lib/ctdb/state/
failcount  interface_modify_eth0.flock  service_state
root@cluster1:~# ls /var/lib/lib/ctdb/persistent/
account_policy.tdb.1  ctdb.tdb.0  ctdb.tdb.1  group_mapping.tdb.1
passdb.tdb.1  registry.tdb.1  secrets.tdb.1  share_info.tdb.1
root@cluster1:~# ls /var/lib/lib/ctdb/state
failcount  interface_modify_eth0.flock  persistent_health.tdb.1
recdb.tdb.1  service_state

Why have very similar data in three places? Why is the config file
(which incidentally isn't even called a conf file) in a different place
from the other ctdb files in /etc?
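
As far as I can tell, the duplication comes from each component being
told (or compiled) to put its databases somewhere different. A quick way
to see which paths are actually in play, assuming stock Debian packages;
the grep pattern and the example paths are only illustrative:

# Samba's compiled-in paths (lock, state, cache and private dirs)
root@cluster1:~# smbd -b | grep -iE 'lockdir|statedir|cachedir|private_dir'

# CTDB's database directories can be pinned explicitly via the
# CTDB_DBDIR / CTDB_DBDIR_PERSISTENT variables in /etc/default/ctdb;
# the paths below are only an illustration
CTDB_DBDIR=/var/lib/ctdb
CTDB_DBDIR_PERSISTENT=/var/lib/ctdb/persistent

That would at least explain why Samba's TDBs (locking.tdb, the smbXsrv_*
databases and so on) end up under one tree while ctdb's own databases
end up under another, but it doesn't excuse the packaging defaults
disagreeing with each other.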

More to the point, why, oh why, doesn't it work?

Rowland


