[Nfs-ganesha-devel] [RFC PATCH 0/5] locks: implement "filp-private" (aka UNPOSIX) locks

Sat Oct 12 12:10:33 MDT 2013

> > > At LSF this year, there was a discussion about the "wishlist" for
> > > userland file servers. One of the things brought up was the goofy
> > > and problematic behavior of POSIX locks when a file is closed. Boaz
> > > started a thread on it here:
> > >
> > >     http://permalink.gmane.org/gmane.linux.file-systems/73364
> > >
> > > Userland fileservers often need to maintain more than one open file
> > > descriptor on a file. The POSIX spec says:
> > >
> > > "All locks associated with a file for a given process shall be
> > > removed  when a file descriptor for that file is closed by that
> > > process or the  process holding that file descriptor terminates."
> > >
> > > This is problematic since you can't close any file descriptor
> > > without dropping all your POSIX locks. Most userland file servers
> > > therefore end up opening the file with more access than is really
> > > necessary, and keeping fd's open for longer than is necessary to work
> around this.
> > >
> > > This patchset is a first stab at an approach to address this problem
> > > by adding two new l_type values -- F_RDLCKP and F_WRLCKP (the 'P' is
> > > short for "private" -- I'm open to changing that if you have a
> > > better mnemonic).
> > >
> > > For all intents and purposes these lock types act just like their
> > > "non-P" counterpart. The difference is that they are only implicitly
> > > released when the fd against which they were acquired is closed. As
> > > a side effect, these locks cannot be merged with "non-P" locks since
> > > they have different semantics on close.
> > >
> > > I've given this patchset some very basic smoke testing and it seems
> > > to do the right thing, but it is still pretty rough. If this looks
> > > reasonable I'll plan to do some documentation updates and will take
> > > a stab at trying to get these new lock types added to the POSIX spec
> > > (as HCH recommended).
> > >
> > > At this point, my main questions are:
> > >
> > > 1) does this look useful, particularly for fileserver implementors?
> > >
> > > 2) does this look OK API-wise? We could consider different "cmd"
values
> > >    or even different syscalls, but I figured this makes it clearer
that
> > >    "P" and "non-P" locks will still conflict with one another.
> > >
> > > Jeff Layton (5):
> > >   locks: consolidate checks for compatible filp->f_mode values in
setlk
> > >     handlers
> > >   locks: add definitions for F_RDLCKP and F_WRLCKP
> > >   locks: skip FL_FILP_PRIVATE locks on close unless we're closing the
> > >     correct filp
> > >   locks: handle merging of locks when FL_FILP_PRIVATE is set
> > >   locks: show private lock types in /proc/locks
> >
> > I haven't looked at the patches, but it would be very good to have
> > locks per "open" and not per "fd".
> >
> 
> My intent is to make it "per-filp" (aka "struct file") in the same way
that
> flock() locks are today. Note that the patchset posted so far doesn't
quite
> have the right semantics yet.
> 
> Currently, I think that we want to give these locks flock-like inheritance
and
> close semantics, but to allow them to conflict with "legacy" POSIX range
> locks.
> 
> > What happens in this example?
> >
> 
> As I said, I haven't sat down to change the implementation yet, but I'll
try to
> answer this in the way that I think we'll want to do it...
> 
> > fd1 = open("/somefile", ...);
> > fd2 = open("/somefile", ...);
> > fd3 = dup(fd1);
> >
> 
> At this point:
> 
> fd1 = filp1
> fd2 = filp2
> fd3 = filp1
> 
> ...fd1 and fd3 both hold a reference to filp1.
> 
> > lock(fd1, range1)
> > lock(fd2, range2)
> > lock(fd3, range3)
> >
> 
> I'll assume that lock() means setting a F_SETLK with F_WRLCKP
> 
> > lock(fd2, range1) // => error already locked?
> >
> 
> Right. fd1 will hold the lock on range1 so -EAGAIN.
> 
> > lock(fd3, range1) // stacked lock?
> >
> 
> Not stacked per-se, but replaced. Since fd1 == fd3, this lock() call won't
> conflict and the new lock will replace the old one. Since the range is the
same
> though, there will be no real difference in the outcome.
> 
> > close(fd1)
> >
> 
> fput(filp1), but fd3 still has a reference so the lock won't be released.
> 
> > lock(fd2, range1) // is range1 still locked by fd3 ?
> >
> 
> Yep, still locked.
> 
> > What about fd-passing, will the locks be transferred/shared with the
> > other process?
> >
> 
> Yes, I think so. Locks will be passed to the other process in the same way
that
> flock() locks are today. AIUI, when you fork() you basically
> dup() all the file descriptors of the parent so that's basically the same
as what
> happens above.
> 
> Again though, I'm still trying to settle on what the semantics should be.
None
> of this is etched in stone yet.

At a quick read, that sounds right to me, connect the locks to the kernel
struct file (filp) and we will get the desirable semantics you describe and
I think it will be easy to document the behavior.

Frank