about active/active clustered nfs with ctdb

ronnie sahlberg ronniesahlberg at gmail.com
Fri Jan 29 08:48:35 UTC 2021


On Fri, Jan 29, 2021 at 6:30 PM 风无名 <wuming_81 at 163.com> wrote:
>
> Everyone, I am sorry that while writing the email I mistakenly pressed a key and the email client sent a reply prematurely.
>
>
>
> There is still a problem:
>
> 1) the user mounts an nfs share using TCP
>
> 2) the user runs cp on a large file, for example a 20 GB file
>
> 3) before the cp completes, the corresponding server node fails
>
>
> Then I find that the cp is blocked, as I expected.
>
> But after the iscsi lun/server/share/public ip are transferred to another server node,
>
> the cp is still blocked and writing does not resume.
>
>
> Is it because the smnotify tool uses UDP to notify nfs?
>
> ref: ctdb/utils/smnotify/smnotify.c

No, it has nothing to do with smnotify. That tool is ONLY used to
trigger re-negotiation of byte-range locks for nfsv3.
Byte-range locks are rare in unix/linux applications and they are not
used at all by the 'cp' command.
What is likely happening, IMHO (without any data to go on), is that a
failover does happen, but either the state that knfsd needs for the
filesystem is missing, or something is wrong with your scripts and the
failover does not complete properly.
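Without any logs it is hard to say more, but a quick sanity check on
the node that is supposed to take over would be something like this
(just a sketch of the kind of checks I mean):

  ctdb status      # are all nodes OK, none banned or disabled?
  ctdb ip          # which node hosts the public IP now?
  ip addr show     # is the address actually configured on an interface?
  exportfs -v      # is the filesystem mounted and exported on this node?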


Have you tried to force the client to use nfsv3?  I have strong
suspicions that nfsv4 will absolutely not work with the ctdb
eventscripts unless Martin has rewritten them to be nfsv4 capable.
If nfsv3 also does not work, well, then I am out of ideas. Maybe your
eventscripts are not working?
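For example, something like this on the client (server name and paths
are placeholders):

  mount -t nfs -o vers=3,proto=tcp nfsserver.example.com:/export/data /mnt/data
  nfsstat -m       # confirm which nfs version the mount actually negotiated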

Still, since you are working to build an active/passive failover
solution, ctdb is the wrong tool for the job.
You really should look at pacemaker or similar for active/passive.
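The kind of active/passive nfs group that pacemaker handles well looks
roughly like this (only a sketch; the device, paths, IP address and
group name are made-up examples):

  pcs resource create nfs_fs ocf:heartbeat:Filesystem \
      device=/dev/sdb1 directory=/srv/nfs fstype=xfs --group nfsgroup
  pcs resource create nfs_daemon ocf:heartbeat:nfsserver \
      nfs_shared_infodir=/srv/nfs/nfsinfo --group nfsgroup
  pcs resource create nfs_export ocf:heartbeat:exportfs \
      clientspec="*" options=rw directory=/srv/nfs/data fsid=1 --group nfsgroup
  pcs resource create nfs_ip ocf:heartbeat:IPaddr2 \
      ip=192.168.1.100 cidr_netmask=24 --group nfsgroup

The whole group then fails over between nodes as a single unit.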



>
>
> By the way, is there any documentation about the internals of ctdb?
>
>
> thanks for any advice.
>
>
>
>
>
>
>
>
> At 2021-01-28 19:13:02, "ronnie sahlberg via samba-technical" <samba-technical at lists.samba.org> wrote:
> >Hi,
> >
> >I haven't worked on ctdb in ages, but the iscsi scripts in ctdb are
> >probably not suitable for your use case.
> >They are aimed at the case where you want to export a LUN via a
> >specific targetname from the ctdb cluster to external iscsi clients,
> >and basically give you an active/passive failover mode for the
> >target/lun pairs across nodes.
> >
> >What you are trying to do is use iscsi internally for storage, put a
> >file system on top of those luns, and then export them as NFS shares
> >to nfs clients.
> >That could be done, I guess, but it is not what I think the current
> >scripts do, so you might have to write a bunch of new eventscripts to
> >do what you want.
> >
> >The nfs support in the eventscripts might also be problematic. When I
> >worked on them they were only aimed at nfsv3.
> >As nfsv4 is displacing v3 quite rapidly, these scripts may or may not
> >work for you.
> >They were also aimed at an active/active configuration where all
> >the data is shared from a common cluster backend and is available
> >active/active through each node.
> >I am not sure how well the current scripts will work with nfsv4,
> >since there is so much more state involved.
> >
> >
> >Since you basically want each share to be handled in an active/passive
> >failover mode, I think pacemaker will be a much better fit and an
> >easier solution than trying to push an active/passive failover
> >model into ctdb.
> >
> >Pacemaker, as you said, does need a shared resource to handle safe
> >failover. In ctdb this is mostly handled by the shared backend cluster
> >filesystem that ctdb is designed to sit on top of.
> >In a pacemaker solution, as you do not have a backend filesystem with
> >coherent locking, you will need a different solution to avoid
> >split-brain.
> >I am no longer familiar at all with current best practice for
> >pacemaker, but I think a shared, highly-available SCSI resource that
> >supports Persistent Reservations could be a solution, using PR to
> >ensure that only one node at a time is active.
> >But this is all very old and possibly obsolete understanding of pacemaker.
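> >In pacemaker terms that would be SCSI-3 PR fencing, roughly along
> >these lines (an untested sketch; the node names and device path are
> >made up):
> >
> >  pcs stonith create scsi-fence fence_scsi \
> >      pcmk_host_list="node1 node2" \
> >      devices="/dev/disk/by-id/scsi-SHARED_LUN" \
> >      meta provides=unfencing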
> >
> >
> >TL;DR
> >Still, since you want active/passive failover for your shares,
> >pacemaker is likely what you want and not ctdb.
> >The pacemaker folks will know much better how you would set these
> >systems up than I do.
> >
> >regards
> >ronnie s
> >
> >
> >On Thu, Jan 28, 2021 at 8:01 PM 风无名 <wuming_81 at 163.com> wrote:
> >>
> >> "In your scenario, is the filesystem on each LUN associated with a particular public IP address?"
> >> yes
> >>
> >> "It would be good if you could do this without modifying 10.interface. It would be better if you could do it by adding a new event script."
> >> thanks.
> >> I am sorry, but I have another question.
> >> redhat provides another solution:
> >> https://www.linuxtechi.com/configure-nfs-server-clustering-pacemaker-centos-7-rhel-7/
> >> They use pacemaker to build an active/passive nfs cluster. Its goal is very similar to mine.
> >>
> >> If the cluster consists of just two nodes, we know that there is no correct algorithm for the consensus problem. Red Hat's pacemaker solution uses a fence device (we can use a shared disk, for example an iscsi lun, as the fencing device), so it may be correct.
> >> But I have not found any documentation about fence devices and ctdb, so in theory my solution may not be correct for a two-node cluster.
> >> I am very curious how ctdb tackles this problem, or whether the problem is simply not tackled.
> >>
> >> I would be glad to see any how-tos or documentation on the implementation/principles of ctdb.
> >> Sorry to bother you.
> >> Thanks for your reply.
> >>
> >> At 2021-01-28 17:25:16, "Martin Schwenke" <martin at meltin.net> wrote:
> >> >Hmmm.  Sorry, I might have read too quickly and misunderstood.  70.iscsi
> >> >is only designed to run tgtd on nodes and export LUNs from public IP
> >> >addresses. In your example the nodes are iSCSI clients, mounting a
> >> >filesystem on the LUN and exporting it via NFS.  That is very different.
> >> >
> >> >Sorry for the confusion.
> >> >
> >> >In your scenario, is the filesystem on each LUN associated with a
> >> >particular public IP address?
> >> >
> >> >It would be good if you could do this without modifying 10.interface.
> >> >It would be better if you could do it by adding a new event script.
> >> >
> >> >peace & happiness,
> >> >martin
> >> >
> >> >On Thu, 28 Jan 2021 09:55:29 +0800 (CST), 风无名 <wuming_81 at 163.com>
> >> >wrote:
> >> >
> >> >> Martin, thanks for your reply.
> >> >> No, I did not modify 70.iscsi. Maybe I need to gain a full understanding of it.
> >> >>
> >> >>
> >> >> After many days of reading/debugging the ctdb source code and its shell scripts, I found the key point is in the script 10.interface.
> >> >> My modification is (see the rough sketch after this list):
> >> >> 1. create the nfs share (mount the fs, modify /etc/exports, restart the nfs service ..) before any public ip is added to an interface
> >> >> 2. delete the corresponding nfs share after any public ip is removed from an interface
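> >> >>
> >> >> A very rough sketch of what I mean, written as a separate event
> >> >> script instead of a change to 10.interface (the script name, device
> >> >> and export paths are made up, and this is untested):
> >> >>
> >> >>   #!/bin/sh
> >> >>   # hypothetical 60.nfs-share ctdb event script (sketch only)
> >> >>   # ctdb calls event scripts as: <script> <event> [args...]
> >> >>   # for takeip/releaseip the args are: <interface> <public-ip> <maskbits>
> >> >>   case "$1" in
> >> >>   takeip)
> >> >>       ip="$3"
> >> >>       mount "/dev/mapper/lun_for_$ip" "/srv/nfs/$ip"   # mount the per-IP filesystem
> >> >>       exportfs -o rw,no_root_squash "*:/srv/nfs/$ip"   # publish the share
> >> >>       ;;
> >> >>   releaseip)
> >> >>       ip="$3"
> >> >>       exportfs -u "*:/srv/nfs/$ip"                     # withdraw the share
> >> >>       umount "/srv/nfs/$ip"
> >> >>       ;;
> >> >>   esac
> >> >>   exit 0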
> >> >>
> >> >>
> >> >> I tested many shutdown-reboot cycles (of nodes in a ctdb cluster), and the results matched my expectations.
> >> >> I think I need more tests and more test scenarios.
> >>
> >>
> >>
> >>
>
>
>
>


