about active/active clustered nfs with ctdb
wuming_81 at 163.com
Fri Jan 29 08:29:30 UTC 2021
everyone, I am sorry: while writing the previous email I accidentally pressed some key and the email client sent a reply prematurely.
There is still a problem:
1) the user mounts an nfs share using TCP
2) the user copies a large file, for example a 20GB file
3) before the copy completes, the corresponding server node fails
At that point I find that the cp is blocked, as I expect.
But after the iscsi lun/server/share/public ip are transferred to another server node,
the cp is still blocked and never resumes writing.
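For reference, the test looks roughly like this (the public IP, export path
and file size below are only placeholders), and after the failover I check
whether the client's TCP connection to the public IP comes back:

# on the nfs client; 192.168.1.100 stands in for one of the ctdb public IPs
mount -t nfs -o proto=tcp 192.168.1.100:/export/share1 /mnt/share1
dd if=/dev/zero of=/mnt/share1/bigfile bs=1M count=20480 &   # stand-in for the 20GB cp

# ... power off the node currently holding 192.168.1.100 ...

# is the IP reachable again, and has the client's TCP connection re-established?
ping -c 3 192.168.1.100
ss -tno dst 192.168.1.100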
Is it because the smnotify tool uses UDP to notify the nfs clients?
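To check whether that notification actually reaches the client, a capture like
the following on the client during the failover should show the UDP traffic from
the public address; the interface name and address are placeholders:

tcpdump -n -i eth0 udp and host 192.168.1.100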
By the way, is there any documentation about the internals of ctdb?
Thanks for any advice.
At 2021-01-28 19:13:02, "ronnie sahlberg via samba-technical" <samba-technical at lists.samba.org> wrote:
>I haven't worked on ctdb in ages, but the iscsi scripts in ctdb are
>probably not suitable for your use case.
>They are aimed at the case where you want to export a LUN via a
>specific target name from the ctdb cluster to external iscsi clients,
>and basically give you an active/passive failover mode for the
>target/LUN pairs across nodes.
>What you are trying to do is use iscsi internally for storage, put a
>file system on top of those LUNs, and then export them as NFS shares
>to nfs clients.
>That could be done, I guess, but it is not what I think the current
>scripts do, so you might have to write a bunch of new event scripts
>to do what you want.
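A rough sketch of the stack being described here, with placeholder portal,
IQN, device and path names -- each node logs in to the LUN as an iscsi
initiator, puts an ordinary filesystem on it and exports it over NFS:

# each ctdb node acts as an iscsi initiator (portal/IQN are placeholders)
iscsiadm -m discovery -t sendtargets -p 192.168.2.10
iscsiadm -m node -T iqn.2021-01.example:store.lun1 -p 192.168.2.10 --login

# ordinary (non-clustered) filesystem on the LUN, exported over NFS
mkfs.xfs /dev/sdb                     # the LUN as it appears on this node
mkdir -p /export/share1
mount /dev/sdb /export/share1
exportfs -o rw,sync '*:/export/share1'

Since such a filesystem can only be mounted safely on one node at a time, each
LUN/share pair is inherently active/passive, which is the point made below.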
>The nfs support in the event scripts might also be problematic. When I
>worked on them they were only aimed at nfsv3.
>As nfsv4 is displacing v3 quite rapidly, these scripts may or may not
>work for you.
>But they were also aimed at an active/active configuration where all
>the data is shared from a common cluster backend and is available
>active/active through each node.
>I am not sure how well the current scripts will work with nfsv4 since
>there is so much more state involved.
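Given that, when testing against the stock scripts it may be worth pinning the
client to v3 explicitly and confirming what was actually negotiated; the server
address and path are placeholders:

mount -t nfs -o vers=3,proto=tcp 192.168.1.100:/export/share1 /mnt/share1
nfsstat -m     # shows the mount options (including vers=) in effect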
>Since you basically want each share to be handled in an active/passive
>failover mode, I think pacemaker will be a much better fit and an
>easier solution than trying to push an active/passive failover
>model into ctdb.
>Pacemaker, as you said, does need a shared resource to handle safe
>failover. In ctdb this is mostly handled by the shared backend cluster
>filesystem that ctdb is designed to sit on top of.
>In a pacemaker solution, as you do not have a backend filesystem with
>coherent locking, you will need a different solution to avoid
>split-brain, i.e. two nodes writing to the same LUN at the same time.
>I am no longer familiar at all with current best practice for
>pacemaker, but I think having a shared, highly-available SCSI resource
>that supports Persistent Reservation could be a solution, using PR to
>ensure that only one node at a time is active.
>But, this is all very old and possibly obsolete understanding of pacemaker.
>Still, since you want active/passive failover for your shares, I think
>pacemaker is likely what you want, not ctdb.
>The pacemaker folks will know much better how you would set these
>systems up than I do.
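A rough sketch of the kind of pacemaker setup hinted at above -- SCSI
persistent-reservation fencing on the shared disk plus one active/passive
resource group per share. Node names, device IDs and addresses are
placeholders, and the details should be checked against current pacemaker
documentation:

# SCSI-3 persistent-reservation fencing on the shared (iscsi) disk
pcs stonith create scsi-fence fence_scsi \
    pcmk_host_list="node1 node2" \
    devices="/dev/disk/by-id/wwn-0xPLACEHOLDER" \
    meta provides=unfencing

# one active/passive NFS export, failing over as a single group
pcs resource create share1-fs ocf:heartbeat:Filesystem \
    device=/dev/disk/by-id/wwn-0xPLACEHOLDER-part1 \
    directory=/export/share1 fstype=xfs --group nfs-share1
pcs resource create share1-nfsd ocf:heartbeat:nfsserver \
    nfs_shared_infodir=/export/share1/nfsinfo --group nfs-share1
pcs resource create share1-export ocf:heartbeat:exportfs \
    clientspec="192.168.1.0/24" options=rw,sync \
    directory=/export/share1 fsid=1 --group nfs-share1
pcs resource create share1-ip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 --group nfs-share1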
>On Thu, Jan 28, 2021 at 8:01 PM 风无名 <wuming_81 at 163.com> wrote:
>> "In your scenario, is the filesystem on each LUN associated with a particular public IP address?"
>> "It would be good if you could do this without modifying 10.interface. It would be better if you could do it by adding a new event script."
>> I am sorry that I have another question.
>> Red Hat provides another solution:
>> they use pacemaker to build an active/passive nfs cluster. Its goal is very similar to mine.
>> If the cluster consists of just two nodes, we know there is no correct algorithm for the consensus problem using the two nodes alone. Red Hat's pacemaker solution uses a fence device (a shared disk, for example an iscsi LUN, can be used as the fencing device), so it may be correct.
>> But I have not found any documentation about fence devices and ctdb, so in theory my solution may not be correct for a two-node cluster.
>> I am very curious how ctdb tackles this problem, or whether it is not tackled at all.
>> I would be glad to see any how-to or documentation of ctdb's implementation/principles.
>> sorry to bother.
>> thanks for your reply.
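For what it is worth, ctdb's own split-brain protection is, as far as I
understand it, the recovery lock taken on the shared cluster filesystem --
only one node can hold it at a time. A minimal ctdb.conf sketch, with a
placeholder path on the clustered filesystem:

# /etc/ctdb/ctdb.conf
[cluster]
    recovery lock = /clusterfs/.ctdb/reclock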
>> At 2021-01-28 17:25:16, "Martin Schwenke" <martin at meltin.net> wrote:
>> >Hmmm. Sorry, I might have read too quickly and misunderstood. 70.iscsi
>> >is only designed to run tgtd on nodes and export LUNs from public IP
>> >addresses. In your example the nodes are iSCSI clients, mounting a
>> >filesystem on the LUN and exporting it via NFS. That is very different.
>> >Sorry for the confusion.
>> >In your scenario, is the filesystem on each LUN associated with a
>> >particular public IP address?
>> >It would be good if you could do this without modifying 10.interface.
>> >It would be better if you could do it by adding a new event script.
>> >peace & happiness,
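One way to express the association asked about above would be a small map
file that a custom event script reads: one line per public IP giving the
device and mountpoint. The file name and columns below are made up purely
for illustration:

# /etc/ctdb/nfs-share-map  (hypothetical)
# public_ip       device       mountpoint
192.168.1.100     /dev/sdb1    /export/share1
192.168.1.101     /dev/sdc1    /export/share2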
>> >On Thu, 28 Jan 2021 09:55:29 +0800 (CST), 风无名 <wuming_81 at 163.com> wrote:
>> >> Martin, thanks for your reply.
>> >> No, I did not modify 70.iscsi. Maybe I need to gain a fuller understanding of it.
>> >> After many days of reading/debugging the source code of ctdb and its shell scripts, I found that the key point is the script 10.interface.
>> >> My modification is:
>> >> 1) create the nfs share (mount the filesystem, modify /etc/exports, restart the nfs service, ...) before any public ip is added to an interface
>> >> 2) delete the corresponding nfs share after any public ip is removed from an interface
>> >> I tested many shutdown-reboot cycles (of nodes in the ctdb cluster), and the results matched my expectations.
>> >> I think I need more tests and more test scenarios.
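A minimal sketch of what a separate event script for this could look like,
driven by the takeip/releaseip events and the made-up map file sketched
earlier. The script name, numbering and paths are assumptions, not an
official CTDB script:

#!/bin/sh
# Hypothetical /etc/ctdb/events/legacy/09.nfs_share.script -- illustration only.

MAP=/etc/ctdb/nfs-share-map    # made-up map: "public_ip device mountpoint"

lookup ()
{
    # print "device mountpoint" for the given public IP
    awk -v ip="$1" '$1 == ip { print $2, $3 }' "$MAP"
}

case "$1" in
takeip)       # $2=interface $3=ip $4=netmask
    set -- $(lookup "$3")
    [ -n "$1" ] || exit 0
    mount "$1" "$2" && exportfs -o rw,sync "*:$2"
    ;;
releaseip)    # $2=interface $3=ip $4=netmask
    set -- $(lookup "$3")
    [ -n "$1" ] || exit 0
    exportfs -u "*:$2"
    umount "$2"
    ;;
esac

exit 0

The script would still have to be made executable and enabled, and whether
numbering it before or after 10.interface gives exactly the add/remove
ordering described above is something I have not verified.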