Setting up CTDB on OCFS2 and VMs ...

Sat Jan 3 02:37:11 MST 2015

On 03/01/15 00:36, Michael Adam wrote:
> On 2015-01-02 at 22:32 +0000, Rowland Penny wrote:
>> On 02/01/15 22:07, Martin Schwenke wrote:
>>> On Fri, 02 Jan 2015 12:57:14 +0000, Rowland Penny
>>> <repenny241155 at gmail.com> wrote:
>>>
>>>> OK, the lockfile now seems to work, at least I have a setting in
>>>> /etc/default/ctdb and both nodes are OK.
>>>>
>>>> How have I managed this: well after reading something on a google link,
>>>> I did something, changed where 'CTDB_RECOVERY_LOCK' pointed to and it
>>>> now works.
>>>>
>>>> What did I do???
>>>>
>>>> I INSTALLED AND SETUP AN NFS SERVER ON ONE OF THE NODES!!!
>>>>
>>>> Great, to use ctdb and samba (which is a way to share files), you have
>>>> to set up a separate way of sharing files.
>>> Please stop.  You're embarrassing yourself and you're spreading
>>> misinformation that people will find when they search for information
>>> about running CTDB with OCFS2.
>>>
>>> What you say is simply not true, apart from the fact that you need
>>> lock coherency between the nodes to be able to use the recovery lock.
>>> You have simply hacked a workaround that can apparently make the
>>> recovery lock work.  To make the recovery lock work properly your
>>> cluster filesystem needs lock coherency.  We have already discussed this
>>> several weeks ago:
>>>
>>>    https://lists.samba.org/archive/samba-technical/2014-December/104426.html
>>>
>>> Nothing has changed.
>> Yes, nothing has changed, the ping_pong test works just like the wiki page
>> says it should,
> So what does really happen? The following?
> - ping_pong seems to work, i.e. you run (e.g.)
>    "ping_pong file 3" on two nodes and you see
>    lock rates printed by both processes?
> - "ping_pong -rw" seems to work in that it
>    prints correct data increment values

Hi Michael, it has been some time since I last ran the ping_pong test,
but when I did run it, I followed the wiki page to the letter and got 
the results that the page said I should.

I have now run the test again, but in the way you suggested, and you are
correct: it isn't working. Both nodes are printing lock rates.
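
For anyone who wants to see the same thing without ping_pong, here is a
minimal sketch in Python of the underlying question: does a byte range
lock taken by one process stop another process from taking it? This is
not ping_pong and not part of ctdb; the helper name try_byte_lock is
made up for illustration, and it only assumes the standard fcntl module
on Linux.

```python
import errno
import fcntl
import os


def try_byte_lock(fd, offset):
    """Try to take an exclusive fcntl lock on one byte without blocking.

    Returns True if the lock was granted, False if another process
    already holds a conflicting lock on that byte.
    (Hypothetical helper for illustration, not part of ctdb.)
    """
    try:
        # lockf(fd, cmd, len, start, whence): lock 1 byte at `offset`.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, offset, os.SEEK_SET)
        return True
    except OSError as e:
        # A contended non-blocking lock fails with EAGAIN or EACCES.
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return False
        raise
```

To use it, open the same file on the cluster file system with os.open()
on two nodes, call try_byte_lock(fd, 0) on the first node and keep that
process alive, then call it on the second node. If the second call also
returns True, the lock only has a local effect, which would match both
ping_pong processes printing lock rates.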

I also see that Gunter Kukkukk has found the same webpage that I found 
that refers to NFS. Let's see how he goes on, or should he give up as
well, Martin?

Rowland

>> so if you are saying that the test is unreliable, then so be
>> it.
> No, that is negative. If there is a problem with the test
> then we did not see it before, and now we are narrowing it
> down...
>
> As Martin has indicated, a possible explanation of what is
> happening is this:
>
> Your file system setup does not provide correct fcntl byte range
> lock semantics across nodes: the lock calls seem to succeed on
> each node, but no real contention happens between nodes, i.e.
> the lock calls only have a local effect.
> And while there is no other accessor, the -rw test also seems to
> work nicely.
>
> Martin has proposed the visible dropping of the lock rate
> in the (non-rw) test when adding a second process as an
> indication of success, but as I already posted some other day
> that is not a reliable measure.
>
> I have an idea of how I could improve the test to better
> detect this kind of lack of support, but after thinking
> about it for a bit, here is the test run that you can
> do right now without the need of new tools:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Test:
>
> - run "ping_pong /clustered/file 2"
>    on one node
> - run the same command on a second node
>
> Note the number "2": This is the exact number of
> processes we intend to run on the file, and not
> as in the other test, a larger number. This is important!
>
> Result:
>
> - If both commands happily print lock rates,
>    then your file system does NOT support the necessary
>    cross-node fcntl byte range lock semantics.
>
> - If the file system supports fcntl byte range locks
>    cross node, then one process will print
>    "lock at 0 failed! - Resource deadlock avoided"
>    and the other will print
>    "lock at 1 failed! - Resource deadlock avoided"
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> That's it.
> Try it on a local file system, two processes on the
> same file. Or two processes on the same file in the
> cluster FS but running on the same node.
> You will see the deadlock avoided messages.
>
> But my guess is that when running two processes on
> two different nodes, then you'll see both printing
> positive lock rates. If this is the case we have
> proof that your FS setup is still not fit for ctdb.
>
> (And I think that Ralph has also already indicated
> why: you have not configured/set up DLM.)
>
> I think I should update the wiki page and possibly
> manpage with these hints, but let's see what they
> gain you first.
>
>>> It appears that getting lock coherency to work in OCFS2 is staggeringly
>>> difficult.  Unfortunately, apart from Richard's work, we have no recipe
>>> for setting up OCFS2 with lock coherence.  We can't tell you what
>>> is wrong with your cluster except that CTDB's lock coherence test for
>>> the recovery lock is failing.  Perhaps this is a topic that should be
>>> taken to an OCFS2 mailing list?
>> Why? as far as I can see (and from the information I can find) everything
>> works until I try to get ctdb to set the lockfile.
> Right, and this simply means that your ocfs2 configuration is not
> correct yet. And as remarked above already, I think that Ralph has
> also pointed out the fact that you have not configured or set up
> the dlm?
>
> You have to understand that we on this list are generally not
> developers of the clustered file systems. We develop the ctdb
> software that just requires and uses a tiny but important feature
> of the file system. Most of us have of course worked with one
> file system or another but that may not be ocfs2, and we may not
> even have any concrete personal experience with ocfs2 (like me).
> So we try to help you the best we can, but we can not necessarily
> tell you how to fix the file system (setup). We can tell you where to
> look, generally speaking, and maybe someone more knowledgeable of
> OCFS2 can chime in.
>
>> I then find a post that said set the lock on an NFS shared
>> directory, I do this and the two nodes are now both OK and you
>> are telling me that what I am doing is wrong and blaming it on
>> OCFS2, but will not or cannot tell me what is wrong.
> NFS in this respect is just another distributed storage
> that happens to offer posix fcntl byte range locks
> cross node at least to some extent. But if you do it this
> way, then you are completely ignoring the problems that you
> have with your OCFS2 setup. I.e. you install an unnecessary
> workaround in order to get ctdb up healthy, but will then
> still have the problems with your OCFS2 setup and this can
> hurt you when you serve files off OCFS2 with samba.
>
> I.e.: Installing NFS for the reclock does not fix your OCFS2 setup!
>
> In this respect it is the wrong fix, even if it does get you a
> healthy ctdb. And nobody is blaming it on OCFS2. But rather on
> your setup, which must still be flawed. One could possibly blame
> OCFS2 for being hard to set up right, but I'm not in the position
> to do so. Others (like Richard) have reported success so it
> must be possible.
>
>>> How about we leave it at that and stop beating up on CTDB because a
>>> particular filesystem doesn't (easily) provide a prerequisite feature?
>> I will say it again, everything seems to work ok until you set
>> 'CTDB_RECOVERY_LOCK' to be on [...] the cluster,
> Right and I repeat that this means that your setup of the cluster
> FS is still not correct.
>
>> but if you set it on a NFS share it seems to work.
> But this does not fix the setup of your cluster FS.
>
> Now back to the top:
> Could you run the "ping_pong ... 2" test?
>
> Cheers - Michael