Setting up CTDB on OCFS2 and VMs ...

Günter Kukkukk linux at kukkukk.com
Sun Dec 28 21:19:42 MST 2014


On 29.12.2014 at 05:10, Günter Kukkukk wrote:
> On 07.12.2014 at 14:27, Richard Sharpe wrote:
>> On Sat, Dec 6, 2014 at 4:21 PM, Michael Adam <obnox at samba.org> wrote:
>>> On 2014-12-07 at 00:48 +0100, Michael Adam wrote:
>>>>
>>>> So the important bit is that in your case ctdb
>>>> is running unprotected from split brain.
>>>> The only reference to split brain is a notification
>>>> of user steve in case drbd detects a split brain.
>>>> If I get it right (there are no details about this
>>>> in the blog post), this means that until user steve
>>>> reacts to that notification the ctdb/samba cluster
>>>> runs happily in the split brain situation and
>>>> corrupts the users' data.
>>>
>>> Ok, maybe it is not quite as bad. The config snippet
>>>
>>> net {
>>>   allow-two-primaries;
>>>   after-sb-0pri discard-zero-changes;
>>>   after-sb-1pri discard-secondary;
>>>   after-sb-2pri disconnect;
>>> }
>>>
>>> Which is explained to some extent in
>>>
>>> http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html
>>>
>>> seems to indicate that in case of split brain
>>> certain measures are potentially taken.
>>>
>>> Also read the explanations about DRBD split brain here:
>>>
>>> http://www.drbd.org/users-guide/s-split-brain-notification-and-recovery.html
>>>
>>> This states that DRBD split brain is different from
>>> cluster split brain (also called cluster partition).
>>>
>>> So I'd really like to know what happens in your
>>> setup in a split brain situation.
>>
>> Well, it turns out that drbd has this thing called dual-primary mode,
>> which turns it into shared storage for two nodes only.
>>
>> So, as long as the OCFS2 DLM is also running, there should not be any
>> split-brain events.
> 
> Hi Richard,
> 
> I also spent some time with OCFS2, but was *not* able to get the
> CTDB_RECOVERY_LOCK working properly, which is a no-go for production.
> 
> When starting the 2nd node, the log always showed:
>   ERROR: recovery lock file /mnt/ctdb_lock/ctdb_lockfile not locked when recovering!
> and both nodes started logging wildly... :-(
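> 
> For reference, the recovery lock in this setup is configured roughly like
> this (the path is the one from the log line above; the exact config file
> name depends on the distribution, e.g. /etc/sysconfig/ctdb or
> /etc/default/ctdb):
> 
>    # recovery lock file on the shared OCFS2 mount
>    CTDB_RECOVERY_LOCK=/mnt/ctdb_lock/ctdb_lockfile
> 
> CTDB takes an fcntl() byte-range lock on this file during recovery, so it
> can only work if the cluster filesystem provides coherent fcntl() locking
> across the nodes.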
> 
> I then had a very close look at the ping_pong source, and I think it can
> be used reliably to test the fcntl() locking features.
> 
> With OCFS2, I was also *not* able to get sane results from the ping_pong test.
> 
> For a 2-node cluster, even a simple
>    ping_pong shared_file 3
> run on both nodes should result in a significant drop in locks/second.
> This was not the case here.
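> 
> A minimal sketch of that test (the file name is just an example; the
> numeric argument is number of nodes + 1):
> 
>    nodeA$ ping_pong /mnt/ctdb_lock/test.dat 3
>    nodeB$ ping_pong /mnt/ctdb_lock/test.dat 3
> 
> With working cluster-wide fcntl() locking, the locks/second reported on
> nodeA should drop sharply as soon as nodeB joins, because both processes
> then contend for the same byte-range locks.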
> 
> When using
>    ping_pong -rw shared_file 3
> some things *seem* to work right, but not reliably.
> When starting the 2nd node, it *could* happen that
>   data increment = 2
> is shown correctly. But when you stop ping_pong on that node and start it again,
> the result is more or less random. Btw, the locks/second always dropped a lot,
> but that was the only reliable result.
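> 
> For comparison, a healthy run on a two-node cluster should look roughly
> like this (sketch, output abbreviated; file name is just an example):
> 
>    nodeA$ ping_pong -rw /mnt/ctdb_lock/test.dat 3
>    data increment = 2       <- equals the number of nodes once nodeB runs too
> 
> If the increment jumps around or stays at 1 after the 2nd node has joined,
> the locking is not giving coherent read/write behaviour.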
> 
> I'm not sure whether corosync, pacemaker and friends can really help here.
> That would be some heavyweight overkill...
> 
> We'll see. I'm installing GFS2 now ...
> 
> Cheers, Günter

Sorry, I forgot to add the following two URLs:

https://docs.oracle.com/cd/E37670_01/E37355/html/ol_tshoot_ocfs2.html

Sorry, this one is in German - it covers just the locks supported by OCFS2 and GFS2:
"Locking sorgt für klare Verhältnisse" (roughly: "Locking keeps things clear"):
http://www.linux-magazin.de/Online-Artikel/GFS2-und-OCFS2-zwei-Cluster-Dateisysteme-im-Linux-Kernel

Cheers, Günter

> 
>>
>> Making sure that the DLM was running was why I put so much effort into
>> getting the ocfs2-tools code running.
>>
>> The disadvantage of using DRBD is that you cannot run more than a
>> 2-node cluster.
>>
> 
> 

