Setting up CTDB on OCFS2 and VMs ...

Michael Adam obnox at samba.org
Fri Jan 2 17:36:10 MST 2015


On 2015-01-02 at 22:32 +0000, Rowland Penny wrote:
> On 02/01/15 22:07, Martin Schwenke wrote:
> >On Fri, 02 Jan 2015 12:57:14 +0000, Rowland Penny
> ><repenny241155 at gmail.com> wrote:
> >
> >>OK, the lockfile now seems to work, at least I have a setting in
> >>/etc/default/ctdb and both nodes are OK.
> >>
> >>How have I managed this: well after reading something on a google link,
> >>I did something, changed where 'CTDB_RECOVERY_LOCK' pointed to and it
> >>now works.
> >>
> >>What did I do???
> >>
> >>I INSTALLED AND SETUP AN NFS SERVER ON ONE OF THE NODES!!!
> >>
> >>Great, to use ctdb and samba (which is a way to share files), you have
> >>to set up a separate way of sharing files.
> >Please stop.  You're embarrassing yourself and you're spreading
> >misinformation that people will find when they search for information
> >about running CTDB with OCFS2.
> >
> >What you say is simply not true, apart from that fact that you need
> >lock coherency between the nodes to be able to use the recovery lock.
> >You have simply hacked a workaround that can apparently make the
> >recovery lock work.  To make the recovery lock work properly your
> >cluster filesystem needs lock coherency.  We have already discussed this
> >several weeks ago:
> >
> >   https://lists.samba.org/archive/samba-technical/2014-December/104426.html
> >
> >Nothing has changed.
> 
> Yes, nothing has changed, the ping_pong test works just like the wiki page
> says it should,

So what really happens? The following?
- ping_pong seems to work, i.e. you run (e.g.)
  "ping_pong file 3" on two nodes and you see
  lock rates printed by both processes?
- "ping_pong -rw" seems to work in that it
  prints correct data increment values?

> so if you are saying that the test is unreliable, then so be
> it.

No, that is not what I am saying. If there is a problem with
the test, then we simply did not see it before, and we are now
narrowing it down...

As Martin has indicated, a possible explanation of what is
happening is this:

Your file system setup does not provide correct fcntl byte range
lock semantics across nodes: the lock calls appear to succeed on
each node, but no real contention happens between the nodes,
i.e. the lock calls only have a local effect. And as long as
there is no other accessor, the -rw test also appears to work
nicely.
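
To make this concrete, here is a minimal sketch in plain C of the
kind of fcntl byte range lock that is at stake (illustrative only,
not the actual ping_pong code; point it at any file on the
cluster FS):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/* Sketch: take a blocking fcntl write lock on byte 0 of a file
 * and hold it for a while.  Illustration only, not ping_pong. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    struct flock fl = {
        .l_type   = F_WRLCK,   /* exclusive (write) lock     */
        .l_whence = SEEK_SET,
        .l_start  = 0,         /* ... on byte 0 of the file  */
        .l_len    = 1,
    };
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* With correct cross-node semantics this call blocks while
     * any process on any node holds a lock on byte 0.  If the
     * locks only have a local effect, it succeeds immediately
     * on both nodes at the same time. */
    if (fcntl(fd, F_SETLKW, &fl) != 0) {
        perror("fcntl(F_SETLKW)");
        return 1;
    }
    printf("got the lock on byte 0, holding it for 60s...\n");
    sleep(60);

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you compile and run this against the same file on two nodes
and both processes report holding the lock at the same time, the
lock calls are indeed only locally effective.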

Martin has proposed using the visible drop of the lock rate
(in the non-rw test) when a second process is added as an
indication of success, but as I already posted the other day,
that is not a reliable measure.

I have an idea of how I could improve the test to better
detect this kind of missing support, but after thinking
about it for a bit, here is a test you can run right now
without the need for new tools:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Test:

- run "ping_pong /clustered/file 2"
  on one node
- run the same command on a second node

Note the number "2": This is the exact number of
processes we intend to run on the file, and not,
as in the other test, a larger number. This is important!

Result:

- If both commands happily print lock rates,
  then your file system does NOT support the necessary
  cross-node fcntl byte range lock semantics.

- If the file system supports fcntl byte range locks
  cross node, then one process will print
  "lock at 0 failed! - Resource deadlock avoided"
  and the other will print
  "lock at 1 failed! - Resource deadlock avoided"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

That's it.
Try it on a local file system with two processes on the
same file, or with two processes on the same file in the
cluster FS but running on the same node: you will see the
"deadlock avoided" messages.
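
For the curious, here is a rough sketch in plain C of the
hold-and-wait pattern behind that "2" test (again illustrative
only, my reconstruction rather than the actual ping_pong source):
each process always holds one of the two bytes while blocking for
the other one. With coherent cross-node locking the kernel sees
the lock cycle and fails one of the F_SETLKW calls with EDEADLK,
which is exactly the "Resource deadlock avoided" message; without
cross-node coherency the calls never block against the other node
and both processes just keep spinning.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/* Sketch: rotate fcntl locks over bytes 0 and 1 of a shared
 * file, always holding one byte while waiting for the other.
 * Illustration of the "ping_pong file 2" pattern, not the
 * actual ping_pong source. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int lock_byte(int fd, off_t off)
{
    struct flock fl = {
        .l_type = F_WRLCK, .l_whence = SEEK_SET,
        .l_start = off, .l_len = 1,
    };
    return fcntl(fd, F_SETLKW, &fl);   /* blocking lock */
}

static void unlock_byte(int fd, off_t off)
{
    struct flock fl = {
        .l_type = F_UNLCK, .l_whence = SEEK_SET,
        .l_start = off, .l_len = 1,
    };
    fcntl(fd, F_SETLK, &fl);
}

int main(int argc, char **argv)
{
    int i = 0;
    int fd;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    lock_byte(fd, 0);   /* start out holding byte 0 */
    for (;;) {
        /* Hold byte i while blocking for byte (i+1) % 2.  When
         * two processes do this with coherent locking, each ends
         * up holding one byte and waiting for the other; the
         * kernel detects the cycle and one F_SETLKW fails with
         * EDEADLK ("Resource deadlock avoided").  Without
         * cross-node coherency the call never blocks against the
         * other node and both processes just spin. */
        if (lock_byte(fd, (i + 1) % 2) != 0) {
            printf("lock at %d failed - %s\n",
                   (i + 1) % 2, strerror(errno));
            return 1;
        }
        unlock_byte(fd, i);
        i = (i + 1) % 2;
    }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~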

But my guess is that when you run two processes on
two different nodes, you'll see both printing
positive lock rates. If this is the case, we have
proof that your FS setup is still not fit for ctdb.

(And I think Ralph has already indicated
why: you have not configured/set up the DLM.)

I think I should update the wiki page and possibly the
manpage with these hints, but let's see what they
gain you first.

> >It appears that getting lock coherency to work in OCFS2 is staggeringly
> >difficult.  Unfortunately, apart from Richard's work, we have no recipe
> >for setting up OCFS2 with lock coherence.  We can't tell you what
> >is wrong with your cluster except that CTDB's lock coherence test for
> >the recovery lock is failing.  Perhaps this is a topic that should be
> >taken to an OCFS2 mailing list?
> 
> Why? as far as I can see (and from the information I can find) everything
> works until I try to get ctdb to set the lockfile.

Right, and this simply means that your OCFS2 configuration is
not correct yet. And as remarked above, I think that Ralph has
also pointed out that you have not configured or set up
the DLM?

You have to understand that we on this list are generally not
developers of the clustered file systems. We develop the ctdb
software, which merely requires and uses a small but important
feature of the file system. Most of us have of course worked
with one file system or another, but that may not be OCFS2, and
we may not even have any concrete personal experience with
OCFS2 (like me). So we try to help you as best we can, but we
cannot necessarily tell you how to fix the file system (setup).
We can tell you where to look, generally speaking, and maybe
someone more knowledgeable about OCFS2 can chime in.

> I then find a post that said set the lock on an NFS shared
> directory, I do this and the two nodes are now both OK and you
> are telling me that what I am doing is wrong and blaming it on
> OCFS2, but will not or cannot tell me what is wrong.

NFS in this respect is just another distributed storage that
happens to offer POSIX fcntl byte range locks across nodes, at
least to some extent. But if you do it this way, then you are
completely ignoring the problems that you have with your OCFS2
setup. I.e. you install an unnecessary workaround in order to
get ctdb up and healthy, but you will then still have the
problems with your OCFS2 setup, and this can hurt you when you
serve files off OCFS2 with Samba.

I.e.: Installing NFS for the reclock does not fix your OCFS2 setup!

In this respect it is the wrong fix, even if it does get you a
healthy ctdb. And nobody is blaming this on OCFS2, but rather
on your setup, which must still be flawed. One could possibly
blame OCFS2 for being hard to set up right, but I am not in the
position to do so. Others (like Richard) have reported success,
so it must be possible.

> >How about we leave it at that and stop beating up on CTDB because a
> >particular filesystem doesn't (easily) provide a prerequisite feature?
> 
> I will say it again, everything seems to work ok until you set
> 'CTDB_RECOVERY_LOCK' to be on [...] the cluster,

Right, and I repeat that this means that your setup of the
cluster FS is still not correct.
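
(For reference: once the cluster FS passes the ping_pong test,
the recovery lock is simply a path to a file on that FS, set via
CTDB_RECOVERY_LOCK in /etc/default/ctdb, e.g.

    CTDB_RECOVERY_LOCK="/clustered/ctdb/.reclock"

where the path is only an example; ctdb then takes the fcntl
lock on that file itself.)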

> but if you set it on a NFS share it seems to work.

But this does not fix the setup of your cluster FS.

Now back to the top:
Could you run the "ping_pong ... 2" test?

Cheers - Michael