about active/active clustered nfs with ctdb

ronnie sahlberg ronniesahlberg at gmail.com
Thu Jan 28 11:13:02 UTC 2021


Hi,

I haven't worked on ctdb in ages, but the iSCSI scripts in ctdb are
probably not suitable for your use case.
They are aimed at the case where you want to export a LUN via a
specific target name from the ctdb cluster to external iSCSI clients,
basically giving you an active/passive failover mode for the
target/LUN pairs across nodes.
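
From memory, and as a rough sketch rather than what 70.iscsi
literally runs, the mechanics underneath are plain tgtadm calls
against tgtd, along these lines (target name and backing device
are made up):

    # Create a target, attach a LUN backed by the shared storage,
    # and allow initiators to log in.
    tgtadm --lld iscsi --op new --mode target --tid 1 \
        -T iqn.2021-01.org.example:tgt1
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
        -b /dev/vg_shared/lv_lun1
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

On failover the target is torn down on one node and recreated on
whichever node takes over the public IP, which is why it behaves as
active/passive per target/LUN pair.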

What you are trying to do is use iSCSI internally for storage, put a
file system on top of those LUNs, and then export them as NFS shares
to NFS clients.
That could be done, I guess, but it is not what I think the current
scripts do, so you might have to write a bunch of new event scripts
to do what you want.
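
If you do go down that path, a new event script would hang the
mount/export work off the "takeip" and "releaseip" events.  Very
roughly, and with made-up device and path names (untested, just to
show the idea):

    #!/bin/sh
    # Hypothetical event script, e.g. /etc/ctdb/events.d/61.nfs-per-ip
    # ctdb runs event scripts as: <script> <event> [args...]
    # For takeip/releaseip the args are: <interface> <ip> <netmask>

    case "$1" in
    takeip)
        ip="$3"
        dev="/dev/mapper/fs_for_${ip}"   # made-up IP -> filesystem mapping
        mnt="/exports/${ip}"

        mkdir -p "$mnt"
        mount "$dev" "$mnt" || exit 1
        exportfs -o rw,no_root_squash "*:${mnt}" || exit 1
        ;;
    releaseip)
        ip="$3"
        mnt="/exports/${ip}"

        exportfs -u "*:${mnt}"
        umount "$mnt"
        ;;
    esac

    exit 0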

The NFS support in the event scripts might also be problematic. When
I worked on them they were only aimed at NFSv3.
As NFSv4 is displacing v3 quite rapidly, these scripts may or may not
work for you.
They were also aimed at an active/active configuration where all the
data is shared from a common cluster backend and is available
active/active through every node.
I am not sure how well the current scripts will work with NFSv4,
since there is so much more state involved.
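
For what it is worth, part of the reason v3 failover was manageable
is that the client-visible lock state can be rebuilt by kicking NLM
lock reclaim once the IP has moved, i.e. sending NSM "reboot"
notifications from the new node, roughly (address made up):

    # Tell NFSv3 clients that held locks against this address to
    # reclaim them; -f forces notification, -v sets the advertised
    # name/address.
    sm-notify -f -v 10.1.1.31

NFSv4 keeps much more of that state on the server side, which is
where I would expect the existing scripts to fall short.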


Since you basically want each share to be handled in an
active/passive failover mode, I think pacemaker will be a much better
fit and an easier solution than trying to push an active/passive
failover model into ctdb.

Pacemaker, as you said, does need a shared resource to handle safe
failover. In ctdb this is mostly handled by the shared backend
cluster filesystem that ctdb is designed to sit on top of.
In a pacemaker solution, as you do not have a backend filesystem with
coherent locking, you will need a different mechanism to avoid
split-brain.
I am no longer familiar with current best practice for pacemaker, but
I think having a shared, highly available SCSI resource that supports
persistent reservations could be a solution, using PR to ensure that
only one node at a time is active.
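
For example, if the cluster has a shared disk that supports SCSI-3
persistent reservations, pacemaker can fence through it with the
fence_scsi agent; something like the following (device path made up,
check the current pacemaker / RHEL HA docs for the exact syntax):

    # A fenced node has its registration on the shared disk revoked,
    # so it can no longer write to it even if it is still running.
    pcs stonith create fence-shared-disk fence_scsi \
        devices=/dev/disk/by-id/wwn-0x5000c500aabbccdd \
        meta provides=unfencing
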
But, this is all very old and possibly obsolete understanding of pacemaker.


TL;DR
Still, since you want active/passive failover for your shares,
pacemaker is likely what you want, not ctdb.
The pacemaker folks will know much better than I do how to set these
systems up.

regards
ronnie s


On Thu, Jan 28, 2021 at 8:01 PM 风无名 <wuming_81 at 163.com> wrote:
>
> "In your scenario, is the filesystem on each LUN associated with a particular public IP address?"
> yes
>
> "It would be good if you could do this without modifying 10.interface. It would be better if you could do it by adding a new event script."
> Thanks.
> I am sorry, but I have another question.
> Red Hat provides another solution:
> https://www.linuxtechi.com/configure-nfs-server-clustering-pacemaker-centos-7-rhel-7/
> They use pacemaker to make an active/passive NFS cluster. Its goal is very similar to mine.
>
> If the cluster consists of just two nodes, we know that there does not exist a correct algorithm for the consensus problem. The Red Hat pacemaker solution uses a fence device (we can use a shared disk, for example an iSCSI LUN, as a fencing device), so it may be correct.
> But I have not found any docs about fence devices and ctdb, so in theory my solution may not be correct for a two-node cluster.
> I am very curious how ctdb tackles this problem, or whether the problem is simply not tackled.
>
> If any how-tos or implementation/design documentation of ctdb are available, I would be glad to read them.
> Sorry to bother you.
> Thanks for your reply.
>
> At 2021-01-28 17:25:16, "Martin Schwenke" <martin at meltin.net> wrote:
> >Hmmm.  Sorry, I might have read too quickly and misunderstood.  70.iscsi
> >is only designed to run tgtd on nodes and export LUNs from public IP
> >addresses. In your example the nodes are iSCSI clients, mounting a
> >filesystem on the LUN and exporting it via NFS.  That is very different.
> >
> >Sorry for the confusion.
> >
> >In your scenario, is the filesystem on each LUN associated with a
> >particular public IP address?
> >
> >It would be good if you could do this without modifying 10.interface.
> >It would be better if you could do it by adding a new event script.
> >
> >peace & happiness,
> >martin
> >
> >On Thu, 28 Jan 2021 09:55:29 +0800 (CST), 风无名 <wuming_81 at 163.com>
> >wrote:
> >
> >> Martin, thanks for your reply.
> >> No, I did not modify 70.iscsi. Maybe I need to gain a fuller understanding of it.
> >>
> >>
> >> After many days of reading and debugging the source code of ctdb and its shell scripts, I found the key point in the script 10.interface.
> >> My modification is:
> >> 1. Create the NFS share (mount the fs, modify /etc/exports, restart the NFS service, ...) before any public IP is added to an interface.
> >> 2. Delete the corresponding NFS share after any public IP is removed from an interface.
> >>
> >>
> >> I tested many shutdown/reboot cycles (of nodes in a ctdb cluster), and the results matched my expectations.
> >> I think I need more tests, covering more scenarios.
>
>
>
>


