[linux-cifs-client] Re: set last write time = fsync ?
Steve French
smfrench at gmail.com
Fri Mar 14 16:16:41 GMT 2008
I don't worry about flushing atime (anyone crazy enough to do that
would pay a huge performance penalty).
Access is usually checked on open right ... so once a file is open
even if the file becomes read-only, the writes, even cached writes
continue.
On Fri, Mar 14, 2008 at 5:40 AM, Jeff Layton <jlayton at redhat.com> wrote:
> On Thu, 13 Mar 2008 21:32:32 -0500
>
>
> "Steve French" <smfrench at gmail.com> wrote:
>
> > On Thu, Mar 13, 2008 at 6:10 AM, Jeff Layton <jlayton at redhat.com> wrote:
> > >
> > > On Wed, 12 Mar 2008 22:54:50 -0500
> > > "Steve French" <smfrench at gmail.com> wrote:
> > >
> > > > kukks noticed that "cp -p source-file /cifs-mnt" would not set the
> > > > last write time to the previous time
> > > >
> > > > It looks like a straightforward caching sideffect - "cp -p" does
> > > >
> > > > open
> > > > write
> > > > set time stamps (utimensat)
> > > > set uid/gid (fchown32)
> > > > set acl (setxattr(system.posix_acl_access))
> > > >
> > > > but the write is cached until close (which does a flush just before
> > > > the close) - so the final write resets the time which was set
> > > > correctly earlier
> > > >
> > > Interesting -- this seems to be "mtime" week for me. I've been working
> > > on some NFS cache invalidation issues that involve out-of-order mtime
> > > updates...
> > >
> > > Actually, this problem looks like a regression caused by the changes
> > > in this patch:
> > >
> > > commit cea218054ad277d6c126890213afde07b4eb1602
> > > Author: Jeff Layton <jlayton at redhat.com>
> > > Date: Tue Nov 20 23:19:03 2007 +0000
> > >
> > > [CIFS] Fix potential data corruption when writing out cached dirty pages
> > >
> > >
> > > ...before that patch we flushed the cache on every setattr call. We now
> > > just do that on size changes, but it sounds like we need to reconsider
> > > this.
> > >
> > > Trying to follow Lustre's model sounds tricky and I'm not sure it would
> > > work anyway since the server is going to update the mtime whenever the
> > > write comes in. Consider:
> > >
> > > Write --> server sets mtime to current time
> > > Setattr --> server sets mtime for utimes() call
> > > Write --> server resets mtime to current time
> > >
> > > ...I'm not sure that the client has much control over this. If we don't
> > > want to flush the cache for a setattr, there are two options:
> > >
> > > 1) delay the setattr until all of the pending writes have been flushed
> > > (tricky and not really what we want)
> > >
> > > 2) queue up another setattr call after the last write has completed
> > > (also tricky, and possibly problematic)
> > >
> > > There could be other problems lurking here too. What happens to the
> > > pending writes when we change the ownership and mode to something where
> > > we no longer have permission to write to the file? I suppose they just
> > > error out and that's not good either.
> > >
> > > For the record, NFS seems to flush the cache too on all setattr calls.
> > >
> > > Jeff Layton <jlayton at redhat.com>
> >
> > Here is a patch to fix the cp -p problem:
> >
> > Kukks/Jeff,
> > coments? Unless there are objections I want to add this to the tree
> > quickly (I also need to add Q's memleak fix - but want to try it
> > against more servers to see if there is a better way to do that)
> >
> > diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> > index af42262..e57e5c4 100644
> > --- a/fs/cifs/inode.c
> > +++ b/fs/cifs/inode.c
> > @@ -1420,11 +1420,10 @@ int cifs_setattr(struct dentry *direntry,
> > struct iattr *attrs)
> > }
> > cifsInode = CIFS_I(direntry->d_inode);
> >
> > - /* BB check if we need to refresh inode from server now ? BB */
> > -
> > - if (attrs->ia_valid & ATTR_SIZE) {
> > + if ((attrs->ia_valid & ATTR_MTIME) || (attrs->ia_valid & ATTR_SIZE)) {
> > /*
> > - Flush data before changing file size on server. If the
> > + Flush data before changing file size or changing the last
> > + write time of the file on the server. If the
> > flush returns error, store it to report later and continue.
> > BB: This should be smarter. Why bother flushing pages that
> > will be truncated anyway? Also, should we error out here if
> > @@ -1435,7 +1434,9 @@ int cifs_setattr(struct dentry *direntry, struct
> > iattr *attrs)
> > CIFS_I(direntry->d_inode)->write_behind_rc = rc;
> > rc = 0;
> > }
> > + }
> >
> > + if (attrs->ia_valid & ATTR_SIZE) {
> > /* To avoid spurious oplock breaks from server, in the case of
> > inodes that we already have open, avoid doing path based
> > setting of file size if we can do it by handle.
> >
> >
> >
>
> That should fix Kukks' problem, but still leaves us vulnerable to the
> other problems. For instance:
>
> Changing UID/GID or mode could cause later writes to fail.
>
> What happens if we still have outstanding writes, but we've set
> ATTR_READONLY on the file on the server?
>
> What about the atime? Pending writes will clobber it.
>
> I think the only thing we can really do is flush on every setattr call.
> The only situation I can see where we don't need to flush is possibly
> where we're *only* changing the ctime, and it's probably not worth it
> to make that a special case.
>
> How about this patch instead?
>
> Signed-off-by: Jeff Layton <jlayton at redhat.com>
>
>
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index af42262..8bcbfff 100644
>
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -1420,22 +1420,20 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs)
>
> }
> cifsInode = CIFS_I(direntry->d_inode);
>
> - /* BB check if we need to refresh inode from server now ? BB */
> + /*
> + Flush data before changing attributes on server. If the
>
> + flush returns error, store it to report later and continue.
> + BB: This should be smarter. Why bother flushing pages that
> + will be truncated anyway? Also, what should we do if the
> + the flush returns error?
> + */
> + rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
> + if (rc != 0) {
> + CIFS_I(direntry->d_inode)->write_behind_rc = rc;
>
> + rc = 0;
> + }
>
> if (attrs->ia_valid & ATTR_SIZE) {
> - /*
>
> - Flush data before changing file size on server. If the
> - flush returns error, store it to report later and continue.
>
> - BB: This should be smarter. Why bother flushing pages that
> - will be truncated anyway? Also, should we error out here if
> - the flush returns error?
> - */
> - rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
> - if (rc != 0) {
> - CIFS_I(direntry->d_inode)->write_behind_rc = rc;
> - rc = 0;
> - }
>
>
> -
> /* To avoid spurious oplock breaks from server, in the case of
> inodes that we already have open, avoid doing path based
> setting of file size if we can do it by handle.
>
--
Thanks,
Steve
More information about the linux-cifs-client
mailing list