Better interop for NFS/SMB file share mode/reservation
amir73il at gmail.com
Sat Apr 27 20:16:42 UTC 2019
[adding back samba/nfs and fsdevel]
On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <jlayton at kernel.org> wrote:
> On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <bfields at fieldses.org> wrote:
> > >
> > > > On Fri, Apr 26, 2019 at 03:50:46PM +0200, Amir Goldstein wrote:
> > > > > On Fri, Feb 8, 2019, 5:03 PM Jeff Layton <jlayton at kernel.org> wrote:
> > > > > > Share/deny open semantics are pretty similar across NFS and SMB (by
> > > > > > design, really). If you intend to solve that use-case, what you really
> > > > > > want is whole-file, shared/exclusive locks that are set atomically with
> > > > > > the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.
> > > > > >
> > > > > > Then you could have SMB and NFS servers set these flags when opening
> > > > > > files, and deal with the occasional denial at open time. Other
> > > > > > applications won't be aware of them of course, but that's probably fine
> > > > > > for most use-cases where you want this sort of protocol interop.
> > > > >
> > > > > Sorry for posting off list. Airport emails...
> > > > > I looked at implemeting O_EXLOCK and O_SHLOCK and it looks doable.
> > > > >
> > > > > I was wondering if there is an inherent reason not to allow an exclusive
> > > > > lock on a file that is open read-only.
> > > > >
> > > > > Samba seems to need it and currently flock and ofd locks won't allow it.
> > > > > Do you thing it will be ok to allow it with O_EXLOCK?
> > > >
> > > > Somebody could deny everyone access to a shared resource that everyone
> > > > needs to make progress, like /etc/passwd or a shared library.
> > > >
> > > > Have you looked at Pavel Shilovsky's O_DENY patches? He had the feature
> > > > off by default, with a mount option provided to turn it on.
> > > >
> > >
> > > O_EXLOCK is advisory. It only aquired flock or ofd lock atomically with
> > > open.
> > Whoops, got it.
> > Is that really adequate for open share locks, though?
> > I assumed that Windows apps depend on the assumption that they're
> > mandatory. So e.g. if you can get a DENY_READ open on a shared library
> > then you know you can update it without the risk of making someone else
> > crash.
> I think this is (slightly) better than doing it internally like we do
> today and would give you coherent locking between NFS and SMB. Other
> applications wouldn't see them, but for a NAS-style deployment, that's
> probably ok.
We can do a little bit better.
We can make sure that O_DENY_WRITE (named for convenience) fails
if file is currently open for write by anyone and similarly for O_DENY_READ.
But if we cannot deny future non-cooperative opens what's the point?....
> Any open by samba or nfsd would need to start setting O_SHLOCK, and deny
> mode opens would have to set O_EXLOCK. We would actually need 2 per
> inode though (one for read and one for write).
...the point is that O_DENY_NONE does not need to be implemented with
a new type of lock object (O_WR_SHLOCK) its enough that it checks there
are no relevant exclusive locks and the then inode->i_writecount and
inode->i_readcount already provide enough context to cooperate with
O_DENY_WRITE and O_DENY_READ.
I need to see if incrementing inode->i_readcount on O_RDWR opens is
possible (right now it only counts O_RDONLY opens).
> I think these should probably be in their own "namespace" too. They
> could use the same semantics as flock, but should sit on their own list
> in file_lock_context.
I would much rather that they didn't. The reason is that new open flags
are a backward compat problem. The way I want to solve it is this API:
// On new kernel this will acquire OFD F_WRLCK atomically...
fd = open(..., O_RDWR | O_EXLOCK);
// ...check if it did acquire OFD lock
fcntl(fd, F_OFD_GETLK, ...);
We'd need at least one new l_type F_EX_RDLCK and maybe also a new
semantic F_EX_RDWRLCK, although similar in conflicts to F_WRLCK it can be
acquired without FMODE_WRITE. Though I personally thing we can do without
it if the only way to acquire F_WRLCK on readonly file is via new open flag.
> That said, we could also look at a vfs-level mount option that would
> make the kernel enforce these for any opener. That could also be useful,
> and shouldn't be too hard to implement. Maybe even make it a vfsmount-
> level option (like -o ro is).
Yeh, I am humbly going to leave this struggle to someone else.
Not important enough IMO and completely independent effort to the
advisory atomic open&lock API.
> If you're denied, what error should you get back when you try to open
> it? It should be something distinct. We may even want to add new error
> codes for this.
IMO EBUSY does the job. Its distinct because open is not expected
to return EBUSY for regular files/dirs and when open is expected to
return EBUSY for blockdev its for the exact same use case (i.e.
exclusive write open is acquired by userspace tools).
More information about the samba-technical