About active/active clustered NFS with CTDB

Fri Jan 22 07:55:54 UTC 2021

Hello everyone,

I want to build an NFS cluster:

1) The NFS cluster consists of three nodes (Linux servers).

2) Each node has logged in to an iSCSI LUN, i.e.:

node_1 -> lun_1

node_2 -> lun_2

node_3 -> lun_3

3) Make an XFS file system on each LUN.

4) Export each XFS file system via NFS:

node_1 -> lun_1 -> /share-1

node_2 -> lun_2 -> /share-2

node_3 -> lun_3 -> /share-3
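As a minimal sketch, assuming the layout above, the export side can be described as one `/etc/exports` entry per share. The client spec `*`, the options `rw,sync,no_subtree_check` and the fsid numbers are assumptions, not part of the setup described here. A fixed `fsid` per share is a common choice in failover setups so that NFS file handles do not depend on which node currently exports the file system; every node that may take over a share would need the identical entry.

```python
# Hypothetical layout from the list above: node -> (LUN, export path, fixed fsid).
# The fsid values are an assumption; they must be identical on every node.
LAYOUT = {
    "node_1": ("lun_1", "/share-1", 1),
    "node_2": ("lun_2", "/share-2", 2),
    "node_3": ("lun_3", "/share-3", 3),
}

# Assumed export options; "*" allows any client and should be narrowed in practice.
DEFAULT_OPTIONS = "rw,sync,no_subtree_check"


def exports_line(share: str, fsid: int, clients: str = "*",
                 options: str = DEFAULT_OPTIONS) -> str:
    """Build one /etc/exports entry carrying an explicit fsid."""
    return f"{share} {clients}({options},fsid={fsid})"


def exports_for_layout(layout: dict) -> list:
    """Return one exports entry per share, in the order of the layout."""
    return [exports_line(share, fsid) for _lun, share, fsid in layout.values()]


if __name__ == "__main__":
    for line in exports_for_layout(LAYOUT):
        print(line)
```
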

5) CTDB distributes the public IPs among the nodes.

If a node fails, CTDB reassigns its public IP to a surviving node, and the shell scripts that CTDB runs on that node mount the file system, restart the NFS service, etc.

When the failed node restarts, similar steps are executed.
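The bookkeeping that the takeover scripts depend on can be modelled roughly as follows. This is a simplified stand-in, not CTDB's own IP allocation algorithm: the public IPs (from the 192.0.2.0/24 documentation range), the one-IP-per-share pairing and the "home node" preference are all hypothetical. It shows which shares a node must mount after a failure, and which IPs move back when the failed node returns.

```python
# Hypothetical public IPs, one per share, each with a preferred "home" node.
PUBLIC_IPS = {
    "192.0.2.11": ("node_1", "/share-1"),
    "192.0.2.12": ("node_2", "/share-2"),
    "192.0.2.13": ("node_3", "/share-3"),
}


def assign_ips(public_ips, alive_nodes):
    """Map each public IP to a live node: its home node if alive, else the least-loaded survivor."""
    if not alive_nodes:
        raise ValueError("no live nodes to host public IPs")
    load = {node: 0 for node in alive_nodes}
    assignment = {}
    orphans = []
    # First pass: IPs whose home node is alive stay there.
    for ip, (home, _share) in sorted(public_ips.items()):
        if home in load:
            assignment[ip] = home
            load[home] += 1
        else:
            orphans.append(ip)
    # Second pass: orphaned IPs go to the least-loaded survivor (ties broken by name).
    for ip in orphans:
        target = min(sorted(load), key=lambda n: load[n])
        assignment[ip] = target
        load[target] += 1
    return assignment


def shares_to_mount(public_ips, assignment, node):
    """Shares a node must mount and export, given the public IPs it currently holds."""
    return sorted(public_ips[ip][1] for ip, owner in assignment.items() if owner == node)


def ip_moves(old, new):
    """List (ip, from_node, to_node) for every IP whose owner changed."""
    return sorted((ip, old.get(ip), owner) for ip, owner in new.items() if old.get(ip) != owner)
```

For example, with node_2 down, `assign_ips(PUBLIC_IPS, {"node_1", "node_3"})` places 192.0.2.12 on node_1, so node_1 must mount and export both /share-1 and /share-2. When node_2 returns, `ip_moves` reports 192.0.2.12 going from node_1 back to node_2; since XFS is not a cluster file system, the LUN should be unmounted on node_1 before node_2 mounts it.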

I have written some shell scripts to implement the above process.

My goal is that, while a LUN and its service are being moved, file I/O on the mount points of that LUN blocks for a minute or two and then succeeds once the move has completed.
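To distinguish the "blocked, then succeeded" case from the error case, a client-side probe can time each open/write/fsync/close cycle on a file under the NFS mount point. This is a hypothetical helper; the probe file path, the 1 s interval and the 5 s stall threshold are arbitrary assumptions.

```python
import errno
import os
import time


def probe_once(path: str, payload: bytes = b"probe\n") -> float:
    """Open, append, fsync and close `path`; return elapsed seconds.

    Raises OSError if any step fails (NFS may also report errors on close).
    """
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)
    finally:
        os.close(fd)
    return time.monotonic() - start


def run_probe(path, iterations, interval=1.0, stall_threshold=5.0):
    """Probe `path` repeatedly; classify each attempt as "ok", "stalled" or "error".

    "stalled" means the call blocked at least `stall_threshold` seconds but succeeded.
    Each result is (kind, elapsed_seconds_or_None, errno_name_or_None).
    """
    results = []
    for _ in range(iterations):
        try:
            elapsed = probe_once(path)
            kind = "stalled" if elapsed >= stall_threshold else "ok"
            results.append((kind, elapsed, None))
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            results.append(("error", None, name))
        time.sleep(interval)
    return results
```

Pointed at a file on a client's mount of /share-2 while node_2 fails and later restarts, a run of "stalled" entries followed by "ok" would match the goal, while "error" entries (with the errno name) correspond to the failures seen in scenario 2.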

My test results are:

Scenario 1: one node fails and another takes over its service.

File I/O on the mount point blocks almost every time.

Scenario 2: the failed node restarts.

File I/O on the mount point sometimes blocks, but sometimes the I/O calls (open, write) fail with errors.

Can I achieve my goal just by modifying/rewriting the shell scripts, or must I modify the kernel NFS server or CTDB?

Thanks for any response/advice.