setlease now supportable on network/cluster file systems for Linux

Steve French smfrench at gmail.com
Thu Jul 19 21:39:15 GMT 2007


> That is, can you explain what this is doing and what you plan on changing in
> your code to support it?

Explaining why this setlease call in the Linux VFS will help Samba
probably would be useful for others.

Two Samba servers that export the same data would run on a cluster
filesystem (e.g. GPFS, Lustre or perhaps GFS someday) or on a network
file system client (e.g. CIFS, NFS or AFS).  Otherwise you would have
to run a replication service like rsync or make the data read-only.
When Samba is exporting the same directory tree(s) from two servers,
you cannot support byte range locks, oplock or access flags properly
without doing more work to keep server state consistent (which is why
ctdb is important).  But even with ctdb, for two of these (posix byte
range locks and oplock/leases) you also have to be able to call into
the cluster/network filesystem client, so that you can deal properly
with other applications running on the servers themselves (not just
NFS or Apache).

For Samba clients to be able to cache data, they need to be able to
use oplock.  On the server, oplock requires an fcntl "lease".  This
allows Samba to request caching for a file, and if a local app (other
than Samba, e.g. Apache or NFS) opens the file, Samba can then be
notified to break the oplock.  Until yesterday on Linux (and Linux is
probably better off than most operating systems) setlease was a
local-only call which was not sent to the filesystem client.  The
problem with this was that to be able to turn on oplock in a Samba
cluster you:

1) risked data corruption (e.g. turn oplock on on both servers and
hope that a local application such as nfs on server 2 did not write
locally to a file that Samba server 1 had granted an oplock on)

2) had to have non-standard extensions (an API call, presumably in the
Samba VFS) to your cluster file system to deal with the missing/broken
setlease fcntl call (this is presumably what is done for Samba on
GPFS)

3) had to make these volumes read-only (so they could not be corrupted)

4) or of course you could turn off oplock on both servers and get
terrible performance

Now setlease can be hooked by the NFS client and GFS2 client and soon
the CIFS client (and eventually the GPFS and Lustre clients).  So when
an application like nfsd (the NFSv4 server, requesting "delegations",
their equivalent of oplock) or Samba (requesting oplock) runs on these
network/cluster file systems, it can request the ability to cache
files.  Until yesterday the setlease call was handled only locally in
the VFS and was not passed to the filesystem.
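
For readers who have not used leases from user space, here is a
minimal sketch of the kind of request a server process makes.  It is
not Samba's code: the function names (open_with_read_lease,
release_lease, lease_break_seen) are made up for illustration, and it
assumes Linux with glibc.  Whether the request succeeds depends on the
filesystem and on who else has the file open.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* F_SETLEASE / F_GETLEASE are Linux-specific */
#endif
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set by the handler when the kernel asks us to give the lease up. */
static volatile sig_atomic_t lease_break_pending = 0;

static void on_lease_break(int signo)
{
    (void)signo;
    lease_break_pending = 1;
}

/* Illustrative helper: has a lease break been signalled? */
int lease_break_seen(void)
{
    return lease_break_pending;
}

/*
 * Open path read-only and try to take a read lease on it.
 * Returns the open descriptor, or -1 if the open or the lease
 * request failed (e.g. the file is open for writing elsewhere,
 * or the filesystem refuses leases).
 */
int open_with_read_lease(const char *path)
{
    struct sigaction sa;

    assert(path != NULL);

    /* Lease breaks arrive as SIGIO unless F_SETSIG selects another signal. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_lease_break;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    if (fcntl(fd, F_SETLEASE, F_RDLCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Give the lease back, as a holder should once a break is signalled. */
int release_lease(int fd)
{
    lease_break_pending = 0;
    return fcntl(fd, F_SETLEASE, F_UNLCK);
}
```

A server would hand out an oplock only when open_with_read_lease()
succeeds, and would break that oplock and call release_lease() once
lease_break_seen() reports a break.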

The Linux cifs client supports oplock already, so the cifs client
could handle this by:
1) returning success/failure on setlease based on whether the file is oplocked
2) calling break_lease when oplock breaks come in (for files that have
had setlease called on them)
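
The two steps above can be modelled in a few lines of user-space C.
This is only a toy model of the decision logic, not kernel code and
not the cifs client's implementation: struct toy_file, toy_setlease
and toy_oplock_break are hypothetical names, and the counter stands in
for the kernel's lease-break notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file state; not the real cifs inode structure. */
struct toy_file {
    bool oplocked;     /* the server has granted this client an oplock */
    bool lease_held;   /* a local application holds a lease on the file */
    int  breaks_sent;  /* how many times the lease holder was notified */
};

/* Step 1: grant the lease only if the client already holds an oplock. */
int toy_setlease(struct toy_file *f)
{
    assert(f != NULL);
    if (!f->oplocked)
        return -1;              /* stand-in for an error return */
    f->lease_held = true;
    return 0;
}

/* Step 2: when the server breaks the oplock, break any local lease too. */
void toy_oplock_break(struct toy_file *f)
{
    assert(f != NULL);
    f->oplocked = false;
    if (f->lease_held) {
        f->breaks_sent++;       /* stand-in for calling break_lease */
        /* Simplification: a real holder gives the lease up itself. */
        f->lease_held = false;
    }
}
```

The point of the model is that the local lease never outlives the
oplock that backs it: a lease is refused without an oplock, and an
oplock break always reaches the lease holder.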

Eventually more could be done to re-request oplock on the fly when an
application calls setlease.

