about active/active clustered nfs with ctdb

风无名 wuming_81 at 163.com
Mon Feb 1 06:03:20 UTC 2021


Ronnie,
      thanks for your reply.
      I have solved the problem:
      in the event script, call the function tickle_tcp_connections, which is implemented in the file /etc/ctdb/functions.
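      Roughly, the idea looks like this (a sketch only, not my exact
      script; the event names and the argument convention of
      tickle_tcp_connections should be checked against your ctdb
      version):

          #!/bin/sh
          # hypothetical event script: after this node takes over a
          # public IP, "tickle" the recorded client TCP connections so
          # that blocked NFS-over-TCP clients reset and reconnect

          . /etc/ctdb/functions

          case "$1" in
          takeip)
              # assumption: takeip is called as
              #   takeip <interface> <ip> <netmask>
              ip="$3"
              # assumption: tickle_tcp_connections takes the public IP
              # and sends tickle ACKs for connections tracked for it
              tickle_tcp_connections "$ip"
              ;;
          esac

          exit 0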
      
At 2021-01-29 16:48:35, "ronnie sahlberg via samba-technical" <samba-technical at lists.samba.org> wrote:
>On Fri, Jan 29, 2021 at 6:30 PM 风无名 <wuming_81 at 163.com> wrote:
>>
>> everyone, I am sorry that while writing the email I mistakenly pressed some key and the email client sent a reply.
>>
>>
>>
>> there is still a problem:
>>
>> 1) the nfs user mounts an nfs share using TCP
>>
>> 2) the user runs cp on a large file, for example a 20GB file
>>
>> 3) before the cp completes, the corresponding server node fails
>>
>>
>> then I find that the cp is blocked, as I expected.
>>
>> but after the iscsi lun/server/share/public ip are transferred to another server node,
>>
>> the cp is still blocked and stops writing.
>>
>>
>> is it because the smnotify tool uses UDP to notify nfs?
>>
>> ref: ctdb/utils/smnotify/smnotify.c
>
>No, it has nothing to do with smnotify. That tool is ONLY used to
>trigger re-negotiation of byte range locks for nfsv3.
>Byte range locks are rare in unix/linux applications and they are not
>used at all with the 'cp' command.
>What is likely happening, imho (without any data to go on), is that a
>failover does happen but state needed by knfsd for the fs is missing,
>or something is wrong with your scripts and the failover does not
>complete properly.
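>
>One quick way to check whether the IP failover actually completed (a
>sketch; the exact output format varies between ctdb versions):
>
>    # node states: the failed node should show as DISCONNECTED and
>    # the surviving nodes as OK
>    ctdb status
>
>    # public IP layout: the address the client mounted should now be
>    # hosted by a surviving node
>    ctdb ip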
>
>
>Have you tried to force the client to use nfsv3? I have strong
>suspicions that nfsv4 will absolutely not work with the ctdb
>eventscripts unless Martin has rewritten them to be nfsv4 capable.
>If nfsv3 also does not work, well, then I am out of ideas. Maybe your
>eventscripts are not working?
>
>Still, since you are working to build an active/passive failover
>solution, ctdb is the wrong tool for this.
>You really should look at pacemaker or similar for active/passive.
>
>
>
>>
>>
>> by the way, is there any doc about the internals of ctdb?
>>
>>
>> thanks for any advice.
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2021-01-28 19:13:02, "ronnie sahlberg via samba-technical" <samba-technical at lists.samba.org> wrote:
>> >Hi,
>> >
>> >I haven't worked on ctdb in ages, but the iscsi scripts in ctdb are
>> >probably not suitable for your use case.
>> >They are aimed at exporting a LUN via a specific targetname from
>> >the ctdb cluster to external iscsi clients, basically an
>> >active/passive failover mode for the target/lun pairs across nodes.
>> >
>> >What you are trying to do is use iscsi internally for storage, put a
>> >file system on top of these luns and export them as NFS shares to
>> >nfs clients.
>> >That could be done, I guess, but it is not what the current scripts
>> >do, so you might have to write a bunch of new eventscripts to do
>> >what you want.
>> >
>> >The nfs support in the eventscripts might also be problematic. When I
>> >worked on them they were only aimed at nfsv3.
>> >As nfsv4 is displacing v3 quite rapidly, these scripts may or may not
>> >work for you.
>> >They were also aimed at an active/active configuration where all
>> >the data is shared from a common cluster backend and is available
>> >active/active through each node.
>> >I am not sure how well the current scripts will work with nfsv4, since
>> >there is so much more state involved.
>> >
>> >
>> >Since you basically want each share to be handled in an active/passive
>> >failover mode, I think pacemaker will be a much better fit and an
>> >easier solution than trying to push an active/passive failover
>> >model into ctdb.
>> >
>> >Pacemaker, as you said, does need a shared resource to handle safe
>> >failover. In ctdb this is mostly handled by the shared backend cluster
>> >filesystem that ctdb is designed to sit on top of.
>> >In a pacemaker solution, as you do not have a backend filesystem with
>> >coherent locking, you will need a different solution to avoid
>> >split-brain.
>> >I am no longer familiar at all with current best practice for
>> >pacemaker, but I think a shared, highly-available SCSI resource that
>> >supports Persistent Reservations could be a solution, using PR to
>> >ensure that only one node at a time is active.
>> >But this is all very old and possibly obsolete understanding of pacemaker.
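>> >
>> >As an illustration only (the device path and reservation keys here
>> >are made up; see sg_persist(8) from sg3_utils):
>> >
>> >    # each node registers its own key with the shared LUN
>> >    sg_persist --out --register --param-sark=0xaaaa /dev/sdx
>> >
>> >    # the active node takes a "Write Exclusive, Registrants Only"
>> >    # (type 5) reservation, so only registered nodes may write
>> >    sg_persist --out --reserve --param-rk=0xaaaa --prout-type=5 /dev/sdx
>> >
>> >    # fencing: a survivor (registered with key 0xbbbb) preempts the
>> >    # failed node's key, taking over the reservation
>> >    sg_persist --out --preempt --param-rk=0xbbbb --param-sark=0xaaaa \
>> >        --prout-type=5 /dev/sdx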
>> >
>> >
>> >TL;DR
>> >Still, since you want active/passive failover for your shares,
>> >pacemaker is likely what you want, not ctdb.
>> >The pacemaker folks will know much better how you would set these
>> >systems up than I do.
>> >
>> >regards
>> >ronnie s
>> >
>> >
>> >On Thu, Jan 28, 2021 at 8:01 PM 风无名 <wuming_81 at 163.com> wrote:
>> >>
>> >> "In your scenario, is the filesystem on each LUN associated with a particular public IP address?"
>> >> yes
>> >>
>> >> "It would be good if you could do this without modifying 10.interface. It would be better if you could do it by adding a new event script."
>> >> thanks.
>> >> I am sorry that I have another question.
>> >> redhat provides another solution:
>> >> https://www.linuxtechi.com/configure-nfs-server-clustering-pacemaker-centos-7-rhel-7/
>> >> they use pacemaker to make an active/passive nfs cluster. Its goal is very similar to mine.
>> >>
>> >> if the cluster consists of just two nodes, we know that no correct algorithm exists for the consensus problem. The redhat pacemaker solution uses a fence device (we can use a shared disk, for example an iscsi lun, as a fencing device), so it may be correct.
>> >> But I have not found any doc about fence devices and ctdb, so in theory my solution may not be correct for a two-node cluster.
>> >> I am very curious how ctdb tackles this problem, or whether it is tackled at all.
>> >>
>> >> I would be glad to see any how-tos or docs on the implementation/principles of ctdb.
>> >> sorry to bother.
>> >> thanks for your reply.
>> >>
>> >> At 2021-01-28 17:25:16, "Martin Schwenke" <martin at meltin.net> wrote:
>> >> >Hmmm.  Sorry, I might have read too quickly and misunderstood.  70.iscsi
>> >> >is only designed to run tgtd on nodes and export LUNs from public IP
>> >> >addresses. In your example the nodes are iSCSI clients, mounting a
>> >> >filesystem on the LUN and exporting it via NFS.  That is very different.
>> >> >
>> >> >Sorry for the confusion.
>> >> >
>> >> >In your scenario, is the filesystem on each LUN associated with a
>> >> >particular public IP address?
>> >> >
>> >> >It would be good if you could do this without modifying 10.interface.
>> >> >It would be better if you could do it by adding a new event script.
>> >> >
>> >> >peace & happiness,
>> >> >martin
>> >> >
>> >> >On Thu, 28 Jan 2021 09:55:29 +0800 (CST), 风无名 <wuming_81 at 163.com>
>> >> >wrote:
>> >> >
>> >> >> martin, thanks for your reply.
>> >> >> No, I did not modify 70.iscsi. Maybe I need to understand it fully.
>> >> >>
>> >> >>
>> >> >> after many days reading/debugging the source code of ctdb and its shell scripts, I found the key point in the script 10.interface.
>> >> >> my modification is (sketched below):
>> >> >> 1. create the nfs share (mount the fs, modify /etc/exports, restart the nfs service, ...) before any public ip is added to an interface
>> >> >> 2. delete the corresponding nfs share after any public ip is removed from an interface
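>> >> >>
>> >> >> something like this (a sketch only; the share path, export
>> >> >> options and lun device are examples, not my exact script):
>> >> >>
>> >> >>     case "$1" in
>> >> >>     takeip)
>> >> >>         # takeip <interface> <ip> <netmask>: mount the fs on
>> >> >>         # the iscsi lun tied to this public ip, then export it
>> >> >>         mount /dev/mapper/lun-share1 /export/share1
>> >> >>         exportfs -o rw,sync "*:/export/share1"
>> >> >>         ;;
>> >> >>     releaseip)
>> >> >>         # unexport and unmount before the ip moves away
>> >> >>         exportfs -u "*:/export/share1"
>> >> >>         umount /export/share1
>> >> >>         ;;
>> >> >>     esac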
>> >> >>
>> >> >>
>> >> >> I tested many shutdown-reboot cycles (of a node in the ctdb cluster), and the results matched my expectations.
>> >> >> I think I need more tests covering more scenarios.