Fwd: file operation is interrupted when using ctdb+nfs

Thu Jan 4 22:00:56 UTC 2018

On Thu, 4 Jan 2018 18:32:26 +0800 (CST), <zhu.shangzhong at zte.com.cn>
wrote:

>> On Thu, 4 Jan 2018 14:54:28 +0800 (CST), "zhu.shangzhong--- via
>> samba-technical" <samba-technical at lists.samba.org> wrote:

>> > I built a clustered file system using ctdb+nfs-ganesha, with
>> > cephfs as the nfs-ganesha backend.

>> > It works well except that file write operations are interrupted.
>> > The steps are as follows:

>> > 1. mount the nfs export directory on the 192.168.1.20 node at
>> >    /home/nfs; the nfs-server ip is 192.168.1.10
>> > 2. cp a big file to /home/nfs
>> > 3. while the cp operation is in progress, kill the nfs-ganesha
>> >    process on the nfs-server (node ip: 192.168.1.10)
>> > 4. The cp operation is interrupted and the error message is
>> >    "stale file handle".

>> > Any idea?

>> Questions about the CTDB setup...  :-)

>> * How many nodes are there?

>> * What CTDB public IP addresses are defined?

>> * Do the logs show CTDB failing over public IP addresses?

> There are 3 CTDB nodes and 3 nfs-ganesha servers.

> Their IP addresses are:            192.168.1.10,  192.168.1.11,  192.168.1.12.

> The CTDB public IP addresses are:  192.168.1.30,  192.168.1.31,  192.168.1.32.

> The client IP is 192.168.1.20. The NFS export directory is mounted
> on the client via public IP 192.168.1.30.

> I checked the CTDB logs, the public IP 192.168.1.30 was moved to
> another node (IP: 192.168.1.32) when the nfs-server (IP: 192.168.1.10)
> process was killed.

OK, that seems good.  :-)

* When do you see the "stale file handle" message?  Immediately when
  the NFS Ganesha server is killed or after the failover?

  If it happens immediately when the server is killed then CTDB is not
  involved and you need to understand what is happening at the NFS
  level.

* Are you able to repeat the test against a single NFS Ganesha server
  on a single node?

  This would involve killing the server, seeing what happens to the cp
  command on the client, checking if the file still exists in the
  server filesystem, and then restarting the server.

  If killing the NFS Ganesha server causes the incomplete copy of the
  file to be deleted without communicating a failure to the client
  then this could explain the "stale file handle" message.

  If this can't be made to work on a single node then adding more
  complexity with CTDB probably won't make it work either.
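
To pin down the timing asked about above, something like the following
could be run on the client in place of cp.  This is only a minimal
sketch, assuming Python 3 is available on the client; timed_copy is a
hypothetical helper name, not part of CTDB or NFS Ganesha.  It copies in
chunks, syncs each chunk, and returns the time and errno name of the
first failure so that it can be compared with the CTDB log timestamps.

```python
import errno
import os
import time


def timed_copy(src, dst, chunk_size=1 << 20):
    """Copy src to dst in chunks and report when and how the copy fails.

    Returns (bytes_written, error).  error is None on success, otherwise
    an (epoch_seconds, errno_name, message) tuple for the first OSError.
    """
    written = 0
    try:
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                chunk = fin.read(chunk_size)
                if not chunk:
                    break
                fout.write(chunk)
                fout.flush()
                # Ask the kernel to push this chunk out, so a failure is
                # reported close to the time it actually happens.
                os.fsync(fout.fileno())
                written += len(chunk)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. "ESTALE".
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        return written, (time.time(), name, exc.strerror)
    return written, None
```

For example, timed_copy("/tmp/bigfile", "/home/nfs/bigfile") (the source
path is just an illustration) returns the number of bytes written and,
on failure, the time of the error.  An errno name of ESTALE would
correspond to the "stale file handle" message.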

By the way, if you are able to reply inline instead of "top-posting"
then it is easier to respond to each part of your reply.  :-)

peace & happiness,
martin