[linux-cifs-client] Re: set last write time = fsync ?

Jeff Layton jlayton at redhat.com
Fri Mar 14 10:40:17 GMT 2008


On Thu, 13 Mar 2008 21:32:32 -0500
"Steve French" <smfrench at gmail.com> wrote:

> On Thu, Mar 13, 2008 at 6:10 AM, Jeff Layton <jlayton at redhat.com> wrote:
> >
> > On Wed, 12 Mar 2008 22:54:50 -0500
> >  "Steve French" <smfrench at gmail.com> wrote:
> >
> >  > kukks noticed that "cp -p source-file /cifs-mnt" would not set the
> >  > last write time to the previous time
> >  >
> >  > It looks like a straightforward caching sideffect - "cp -p" does
> >  >
> >  > open
> >  > write
> >  > set time stamps (utimensat)
> >  > set uid/gid (fchown32)
> >  > set acl (setxattr(system.posix_acl_access))
> >  >
> >  > but the write is cached until close (which does a flush just before
> >  > the close) - so the final write resets the time which was set
> >  > correctly earlier
> >  >
> >  Interesting -- this seems to be "mtime" week for me. I've been working
> >  on some NFS cache invalidation issues that involve out-of-order mtime
> >  updates...
> >
> >  Actually, this problem looks like a regression caused by the changes
> >  in this patch:
> >
> >  commit cea218054ad277d6c126890213afde07b4eb1602
> >  Author: Jeff Layton <jlayton at redhat.com>
> >  Date:   Tue Nov 20 23:19:03 2007 +0000
> >
> >     [CIFS] Fix potential data corruption when writing out cached dirty pages
> >
> >
> >  ...before that patch we flushed the cache on every setattr call. We now
> >  just do that on size changes, but it sounds like we need to reconsider
> >  this.
> >
> >  Trying to follow Lustre's model sounds tricky and I'm not sure it would
> >  work anyway since the server is going to update the mtime whenever the
> >  write comes in. Consider:
> >
> >  Write --> server sets mtime to current time
> >  Setattr --> server sets mtime for utimes() call
> >  Write --> server resets mtime to current time
> >
> >  ...I'm not sure that the client has much control over this. If we don't
> >  want to flush the cache for a setattr, there are two options:
> >
> >  1) delay the setattr until all of the pending writes have been flushed
> >  (tricky and not really what we want)
> >
> >  2) queue up another setattr call after the last write has completed
> >  (also tricky, and possibly problematic)
> >
> >  There could be other problems lurking here too. What happens to the
> >  pending writes when we change the ownership and mode to something where
> >  we no longer have permission to write to the file? I suppose they just
> >  error out and that's not good either.
> >
> >  For the record, NFS seems to flush the cache too on all setattr calls.
> >
> >  Jeff Layton <jlayton at redhat.com>
> 
> Here is a patch to fix the cp -p problem:
> 
> Kukks/Jeff,
> coments?  Unless there are objections I want to add this to the tree
> quickly (I also need to add Q's memleak fix - but want to try it
> against more servers to see if there is a better way to do that)
> 
> diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
> index af42262..e57e5c4 100644
> --- a/fs/cifs/inode.c
> +++ b/fs/cifs/inode.c
> @@ -1420,11 +1420,10 @@ int cifs_setattr(struct dentry *direntry,
> struct iattr *attrs)
>  	}
>  	cifsInode = CIFS_I(direntry->d_inode);
> 
> -	/* BB check if we need to refresh inode from server now ? BB */
> -
> -	if (attrs->ia_valid & ATTR_SIZE) {
> +	if ((attrs->ia_valid & ATTR_MTIME) || (attrs->ia_valid & ATTR_SIZE)) {
>  		/*
> -		   Flush data before changing file size on server. If the
> +		   Flush data before changing file size or changing the last
> +		   write time of the file on the server. If the
>  		   flush returns error, store it to report later and continue.
>  		   BB: This should be smarter. Why bother flushing pages that
>  		   will be truncated anyway? Also, should we error out here if
> @@ -1435,7 +1434,9 @@ int cifs_setattr(struct dentry *direntry, struct
> iattr *attrs)
>  			CIFS_I(direntry->d_inode)->write_behind_rc = rc;
>  			rc = 0;
>  		}
> +	}
> 
> +	if (attrs->ia_valid & ATTR_SIZE) {
>  		/* To avoid spurious oplock breaks from server, in the case of
>  		   inodes that we already have open, avoid doing path based
>  		   setting of file size if we can do it by handle.
> 
> 
> 

That should fix Kukks' problem, but still leaves us vulnerable to the
other problems. For instance:

Changing UID/GID or mode could cause later writes to fail.

What happens if we still have outstanding writes, but we've set
ATTR_READONLY on the file on the server?

What about the atime? Pending writes will clobber it.

I think the only thing we can really do is flush on every setattr call.
The only situation I can see where we don't need to flush is possibly
where we're *only* changing the ctime, and it's probably not worth it
to make that a special case.

How about this patch instead?

Signed-off-by: Jeff Layton <jlayton at redhat.com>

diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index af42262..8bcbfff 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1420,22 +1420,20 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs)
 	}
 	cifsInode = CIFS_I(direntry->d_inode);
 
-	/* BB check if we need to refresh inode from server now ? BB */
+	/*
+	   Flush data before changing attributes on server. If the
+	   flush returns error, store it to report later and continue.
+	   BB: This should be smarter. Why bother flushing pages that
+	   will be truncated anyway? Also, what should we do if the
+	   the flush returns error?
+	 */
+	rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
+	if (rc != 0) {
+		CIFS_I(direntry->d_inode)->write_behind_rc = rc;
+		rc = 0;
+	}
 
 	if (attrs->ia_valid & ATTR_SIZE) {
-		/*
-		   Flush data before changing file size on server. If the
-		   flush returns error, store it to report later and continue.
-		   BB: This should be smarter. Why bother flushing pages that
-		   will be truncated anyway? Also, should we error out here if
-		   the flush returns error?
-		 */
-		rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
-		if (rc != 0) {
-			CIFS_I(direntry->d_inode)->write_behind_rc = rc;
-			rc = 0;
-		}
-
 		/* To avoid spurious oplock breaks from server, in the case of
 		   inodes that we already have open, avoid doing path based
 		   setting of file size if we can do it by handle.


More information about the linux-cifs-client mailing list