[linux-cifs-client] Re: [PATCH] do not attempt to close cifs files which are already closed due to session reconnect

Thu Nov 20 16:43:24 GMT 2008

On Thu, Nov 20, 2008 at 8:39 AM, Jeff Layton <jlayton at redhat.com> wrote:
> On Thu, 20 Nov 2008 08:04:08 -0600
> "Steve French" <smfrench at gmail.com> wrote:
>
>> On Thu, Nov 20, 2008 at 7:02 AM, Jeff Layton <jlayton at redhat.com> wrote:
>> > On Wed, 19 Nov 2008 23:24:47 -0600
>> > "Steve French" <smfrench at gmail.com> wrote:
>> >
>> >> On Wed, Nov 19, 2008 at 6:04 AM, Jeff Layton <jlayton at redhat.com> wrote:
>> >> > On Tue, 18 Nov 2008 21:46:59 -0600
>> >> > "Steve French" <smfrench at gmail.com> wrote:
>> >> >
>> >> >> In hunting down why we could get EBADF returned on close in some cases
>> >> >> after reconnect, I found out that cifs_close was checking to see if
>> >> >> the share (mounted server export) was valid (didn't need reconnect due
>> >> >> to session crash/timeout) but we weren't checking if the handle was
>> >> >> valid (i.e. the share was reconnected, but the file handle was not
>> >> >> reopened yet).  It also adds some locking around the updates/checks of
>> >> >> the cifs_file->invalidHandle flag.
>> >> >>
>> >>
>> >> >
>> >> > Do we need a lock around this check for invalidHandle? Could this race
>> >> > with mark_open_files_invalid()?
>> >> The attached patch may reduce the window of opportunity for the
>> >> race you describe.   Do you think we need another flag?  (One
>> >> to keep requests other than a write retry from using this
>> >> handle, and one to prevent reopen when the handle is about to be closed
>> >> after we have given up on write retries getting through?)
>> >>
>> >
>> >
>> > So that I make sure I understand the problem...
>> >
>> > We have a file that is getting ready to be closed (closePend is set),
>> > but the tcon has been reconnected and the filehandle is now invalid.
>> > You only want to reopen the file in order to flush data out of the
>> > cache, but only if there are actually dirty pages to be flushed.
>> I don't think we have to worry about the normal case of flushing dirty
>> pages; that happens already before we get to cifs_close (fput calls
>> flush/fsync).
>> The case I was thinking about was a write on this handle that
>> has hung, reconnected, and we are waiting for this pending write to complete.
>>
>> > If closePend is set then the userspace filehandle is already dead? No
>> > further pages can be dirtied, right?
>> They could be dirtied from other handles, and writepages picks
>> the first handle that it can, since writepages does not
>> specify which handle to use (writepages won't pick a handle
>> that is close pending, and it may be ok on retry because we look
>> for a valid handle each time we retry, so shouldn't pick this one).
>>
>
> Right, I was assuming that the inode has no other open filehandles...
>
> Even if there are any other open filehandles though, we still want to
> flush whatever dirty pages we have, correct? Or at least start
> writeback on them...
>
>> > Rather than a new flag, I suggest checking for whether there are dirty
>> > pages attached to the inode. If so, then we'll want to reopen the file
>> > and flush it before finally closing it.
>> There shouldn't be dirty pages if this is the last handle on the inode
>> being closed
>>
>
> At the time that the "release" op is called (which is cifs_close in
> this case), there may still be dirty pages, even if this is the last
> filehandle, right?
I don't see how we could have dirty pages on that inode.
filemap_fdatawrite was called (by cifs_flush) before we got to release,
and writes on different handles would not have oplock (if there are
any other handles), and we would call filemap_fdatawrite on each of
those (non-cached) writes on another handle.
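
To make that ordering concrete, here is a rough userspace model, not the
kernel code. All model_* names are made up for illustration; only the
ideas (flush runs before release, the tcon can be reconnected while the
handle is still invalid, the invalid flag is read under a lock) come from
this thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the tcon and per-open file state. */
struct model_tcon {
    bool need_reconnect;        /* share has not been reconnected yet */
};

struct model_file {
    pthread_mutex_t lock;       /* protects invalid_handle */
    bool invalid_handle;        /* server handle lost, not reopened yet */
    int dirty_pages;            /* pages still waiting for writeback */
    struct model_tcon *tcon;
};

/* Stand-in for the flush step that runs before release:
 * write out everything that is dirty and report how many pages went out. */
static int model_flush(struct model_file *f)
{
    int written = f->dirty_pages;

    assert(written >= 0);
    f->dirty_pages = 0;
    return written;
}

/* Stand-in for what a reconnect does to an open file. */
static void model_mark_invalid(struct model_file *f)
{
    pthread_mutex_lock(&f->lock);
    f->invalid_handle = true;
    pthread_mutex_unlock(&f->lock);
}

/* Release step: only send a close to the server if both the share and
 * the handle are valid. Otherwise the server no longer knows the handle
 * and a close request could only fail. */
static bool model_should_send_close(struct model_file *f)
{
    bool send;

    if (f->tcon->need_reconnect)
        return false;

    pthread_mutex_lock(&f->lock);
    send = !f->invalid_handle;
    pthread_mutex_unlock(&f->lock);
    return send;
}
```

In this model a file whose share has been reconnected but whose handle
is still marked invalid gets no close sent, which is the case the patch
is about.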

> If so then it seems reasonable to just check to see if there are any
> dirty pages, reopen the file and start writeback if so.
>
> Alternately, I suppose we could consider skipping the reopen/writeback
> if there are other open filehandles for the inode. The idea would be
> that we could assume that the pages would get flushed when the last fh
> is closed. I'm not sure if this violates any close-to-open attribute
> semantics though.
I don't think it matters much.  We only have the write pending flag
set while we are actually using the file handle for write
(find_writable_file increments it).  If a write to that handle failed
by timing out, we would use a different handle or fail.
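
A simplified sketch of that handle selection, as a userspace model rather
than the real find_writable_file. The struct and function names are
hypothetical, and the model simply skips an invalid handle where the
real code may behave differently (for example by trying a reopen).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-handle state; field names only echo the thread. */
struct model_handle {
    bool writable;          /* opened with write access */
    bool close_pend;        /* close has been requested on this handle */
    bool invalid_handle;    /* lost on reconnect, not reopened yet */
    int  wrt_pending;       /* writers currently using this handle */
};

/* Pick the first handle that is usable for write and mark it in use.
 * Returns NULL when no handle qualifies, so the caller has to fail. */
static struct model_handle *model_find_writable(struct model_handle *h,
                                                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!h[i].writable || h[i].close_pend || h[i].invalid_handle)
            continue;
        h[i].wrt_pending++;
        return &h[i];
    }
    return NULL;
}

/* Drop the in-use mark once the write has completed or been given up. */
static void model_put_writable(struct model_handle *h)
{
    assert(h->wrt_pending > 0);
    h->wrt_pending--;
}
```

Because the lookup runs again on every retry, a handle that became close
pending or invalid in the meantime is not picked a second time.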

-- 
Thanks,

Steve