read caching across close

Steve French smfltc at us.ibm.com
Thu Apr 29 17:04:52 GMT 2004


On Thu, 2004-04-29 at 11:38, Richard Sharpe wrote:
> On Thu, 29 Apr 2004, Peter Waechtler wrote:
> 
> > Am Mittwoch, 28. April 2004 23:03 schrieb Steve French:
> > > unless I start doing lazy close of oplocked files (which is an option),
> > >
> > > Although the cifs vfs gets good cache across close performance in the
> > > common case of:
> > >
> > > 1) open file, get oplock
> > > 2) read 1st page
> > > 3) close
> > > 4) reopen file, get oplock
> > > 5) read 1st page out of client's cache
> > >
> > 
> > You give up the oplock when you send a close over the wire.
> > IMHO you have to purge the page and read it in again, rather than
> > rely on the timestamp.
> > With a BATCH oplock you wouldn't send the close over the wire.
> > Instead you would start a timer and close the file when a program
> > does not reopen the file within that window.
> > If you close the file - you give up the oplock.
> 
> Yes, I have to agree with that. There is an opportunity for someone to get 
> at the file between your close and re-open, and given that they could set 
> the times to what they were before they messed with the file, you lose.
> 
> Regards
> -----
> Richard Sharpe, rsharpe[at]richardsharpe.com, rsharpe[at]samba.org, 
> sharpe[at]ethereal.com, http://www.richardsharpe.com

I agree that for the strictest semantics (and to reduce the risk of an
app changing the contents while resetting the time) the long oplock
approach (either form) is safest, but its practicality needs to be
tested. There are resource considerations on the server in holding an
oplock (traditional or batch) until the client sends the delayed close.
The close (releasing the oplock, and the server resources associated
with it) will eventually be sent when

1) the client kernel tells the cifs client to free the inode
2) the filesystem is unmounted (the general case of freeing all
inodes on the mount)
3) an oplock break is sent from the server
4) a timer expires: before the routine inode (or dentry) revalidate
check, the client could close files that have not been reopened within
some time window (perhaps 10 minutes is reasonable, but it could be
configurable). This does not require a distinct thread, since dentry or
inode revalidate is called often enough to be useful for this purpose.
A rough sketch of this follows the list.
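
To make option 4 concrete, here is a minimal sketch of a reaper driven
from the revalidate path. The structure, the list, and the
send_smb_close() helper are hypothetical illustrations, not the actual
cifs vfs code:

/*
 * Hypothetical sketch of option 4: close oplocked but idle file
 * handles from the revalidate path rather than a dedicated thread.
 * All names here are illustrative only.
 */
#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/types.h>

/* default window of 10 minutes; could be made configurable */
#define LAZY_CLOSE_WINDOW (10 * 60 * HZ)

struct lazy_handle {
	struct list_head list;
	unsigned long last_open;	/* jiffies at the most recent open */
	__u16 netfid;			/* server handle still holding the oplock */
};

static void send_smb_close(__u16 netfid);	/* hypothetical: sends the SMB close */

/*
 * Called from dentry or inode revalidate, which runs often enough
 * that no separate timer thread is needed.
 */
static void reap_stale_handles(struct list_head *handles)
{
	struct lazy_handle *h, *tmp;

	list_for_each_entry_safe(h, tmp, handles, list) {
		if (time_after(jiffies, h->last_open + LAZY_CLOSE_WINDOW)) {
			/* send the deferred close now, releasing the oplock
			 * and the server resources tied to it */
			send_smb_close(h->netfid);
			list_del(&h->list);
			kfree(h);
		}
	}
}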

This approach is also much stricter than NFS practice, and probably
stricter than most other network filesystems (which have much larger
windows during which the client's data can be stale), although a few
SAN and cluster filesystems, such as SANFS and Lustre, have quite
strict semantics.

By the way, on Linux there are some minor problems with invalidating
cached data for network filesystems (at least, more code is needed to
address them). The "invalidate_remote_inode" function, in at least one
case, does not free all pages for the inode (freeing them would force
them to be reread from the server the next time). The effect is that a
subsequent write to such a page, if it is smaller than a page and falls
in the middle of one, zeroes the beginning and end of the page - a
range that might not be zero on the server, but which the client has
not reread before it starts writing to the page again. A sketch of the
failure mode follows the link below. See the post at:

http://marc.theaimsgroup.com/?l=linux-fsdevel&m=108321160405756&w=2
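
To illustrate the failure mode, here is a hypothetical stand-in for a
prepare_write style hook; prepare_partial_write and
read_page_from_server are invented for the sketch and are not the
actual page cache or cifs interfaces:

/*
 * Hypothetical sketch of why zero-filling is wrong for a network
 * filesystem when a stale page survives invalidation.  The helper
 * names are invented for illustration.
 */
#include <linux/mm.h>
#include <linux/pagemap.h>

static int read_page_from_server(struct page *page);	/* hypothetical */

static int prepare_partial_write(struct page *page,
				 unsigned from, unsigned to)
{
	if (PageUptodate(page))
		return 0;	/* cached copy is still valid, write in place */

	if (from == 0 && to == PAGE_CACHE_SIZE)
		return 0;	/* whole page will be overwritten anyway */

	/*
	 * A local filesystem could simply zero [0, from) and [to, end)
	 * here, but on a network filesystem those ranges may be non-zero
	 * on the server.  The client must reread the page first.
	 */
	return read_page_from_server(page);
}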




