Setting up CTDB on OCFS2 and VMs ...

Günter Kukkukk linux at kukkukk.com
Sun Dec 28 21:10:51 MST 2014


On 07.12.2014 at 14:27, Richard Sharpe wrote:
> On Sat, Dec 6, 2014 at 4:21 PM, Michael Adam <obnox at samba.org> wrote:
>> On 2014-12-07 at 00:48 +0100, Michael Adam wrote:
>>>
>>> So the important bit is that in your case ctdb
>>> is running unprotected from split brain.
>>> The only reference to split brain is a notification
>>> of user steve in case drbd detects a split brain.
>>> If I get it right (there are no details about this
>>> in the blog post), this means that until user steve
>>> reacts to that notification the ctdb/samba cluster
>>> runs happily in the split brain situation and
>>> corrupts the users' data.
>>
>> Ok, maybe it is not quite as bad. The config snippet
>>
>> net {
>>   allow-two-primaries;
>>   after-sb-0pri discard-zero-changes;
>>   after-sb-1pri discard-secondary;
>>   after-sb-2pri disconnect;
>> }
>>
>> which is explained to some extent in
>>
>> http://www.drbd.org/users-guide/s-configure-split-brain-behavior.html
>>
>> seems to indicate that in case of split brain
>> certain measures are potentially taken.
>>
>> Also read the explanations about DRBD split brain here:
>>
>> http://www.drbd.org/users-guide/s-split-brain-notification-and-recovery.html
>>
>> This states that DRBD split brain is different from
>> cluster split brain (also called cluster partition).
>>
>> So I'd really like to know what happens in your
>> setup in a split brain situation.
> 
> Well, it turns out that drbd has this thing called dual-master mode,
> which turns it into shared storage for two nodes only.
> 
> So, as long as the OCFS2 DLM is also running, there should not be any
> split-brain events.

Hi Richard,

I also spent some time with OCFS2 - but I was *not* able to get the
CTDB_RECOVERY_LOCK working properly, which is a no-go for production.

When starting the 2nd node, the log always showed:
  ERROR: recovery lock file /mnt/ctdb_lock/ctdb_lockfile not locked when recovering!
and both nodes started logging wildly... :-(
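
For reference, the lock is configured with the usual one-liner in the
ctdb config (the path matches the log line above; the exact config file
location, e.g. /etc/sysconfig/ctdb, depends on the distribution):

  CTDB_RECOVERY_LOCK=/mnt/ctdb_lock/ctdb_lockfile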

I then had a very close look at the ping_pong source, and I think it can
reliably be used to test the fcntl() locking features.
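
For anyone who wants to reproduce this outside of ping_pong: the core of
what it exercises is a blocking fcntl() byte-range lock. A minimal sketch
(file name and offset are just illustrative, this is not the ping_pong
source):

  /* Minimal sketch of the blocking fcntl() byte-range locking that
   * ping_pong exercises - illustrative only, not the ping_pong source. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  static void byte_lock(int fd, short type)
  {
          struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                              .l_start = 0, .l_len = 1 };   /* one byte */
          if (fcntl(fd, F_SETLKW, &fl) != 0) {   /* blocking variant */
                  perror("fcntl");
                  exit(1);
          }
  }

  int main(void)
  {
          int fd = open("shared_file", O_CREAT | O_RDWR, 0644);
          if (fd == -1) { perror("open"); return 1; }
          byte_lock(fd, F_WRLCK);   /* blocks until no other node holds it */
          /* critical section - the cluster FS must make this exclusive
           * across nodes, otherwise ctdb's recovery lock is unsafe */
          byte_lock(fd, F_UNLCK);
          close(fd);
          return 0;
  }

If two nodes run something like this against the same file and both get
the write lock at the same time, the filesystem's fcntl() support is
broken for clustered use - which would explain both the ping_pong numbers
and the recovery lock failure.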

With OCFS2, I was also *not* able to get sane results from the ping_pong test.

For a 2-node cluster, even the simple
   ping_pong shared_file 3
should, when run on both nodes, show a significant drop in locks/second,
because the byte-range lock then has to bounce between the nodes through
the cluster filesystem. This was not the case here.

When using
   ping_pong -rw shared_file 3
some things *seem* to work right - but not reliably. When starting the
2nd node, it *could* happen that
  data increment = 2
is shown correctly. But when you stop ping_pong on that node and start it
again, the results look random. Btw - the locks/second always dropped a
lot, but that was the only reliable result.
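
To make it clearer what that number means: with -rw, ping_pong reads the
shared data under the lock, reports how much it advanced since the
previous read, and writes back an incremented value. A simplified sketch
of the idea (not the actual ping_pong algorithm, which rotates over one
byte per node):

  /* Simplified sketch of the ping_pong -rw idea - not the actual
   * ping_pong source.  Each pass takes the byte lock, reads the shared
   * byte, reports how far it advanced since our previous read, then
   * writes it back incremented.  With two nodes strictly alternating,
   * a healthy cluster FS settles at "data increment = 2" on both nodes
   * (the very first print after startup is meaningless). */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  static void byte_lock(int fd, short type)
  {
          struct flock fl = { .l_type = type, .l_whence = SEEK_SET,
                              .l_start = 0, .l_len = 1 };
          fcntl(fd, F_SETLKW, &fl);          /* blocking lock/unlock */
  }

  int main(void)
  {
          unsigned char last = 0, val = 0;
          int fd = open("shared_file", O_CREAT | O_RDWR, 0644);
          if (fd == -1) { perror("open"); return 1; }
          for (;;) {
                  byte_lock(fd, F_WRLCK);
                  if (pread(fd, &val, 1, 0) != 1)
                          val = 0;           /* fresh, still-empty file */
                  printf("data increment = %u\n",
                         (unsigned char)(val - last));
                  last = val;                /* what we saw this turn */
                  val++;
                  pwrite(fd, &val, 1, 0);    /* our increment */
                  byte_lock(fd, F_UNLCK);
          }
  }

If the increment jumps around instead of settling at the node count, the
nodes are not seeing each other's writes in lock order - which matches
the random behaviour described above.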

I'm not sure whether corosync, pacemaker and friends could really help
here - and that would be heavyweight overkill anyway...

We'll see. I'm installing GFS2 now ...

Cheers, Günter

> 
> Making sure that the DLM was running was why I put so much effort into
> getting the ocfs2-tools code running.
> 
> The disadvantage of using DRBD is that you cannot run more than a
> 2-node cluster.
> 

