[linux-cifs-client] Re: writethrough of mmapped pages

Sat Jan 31 23:15:39 GMT 2004

 > You mean that you want the filesytem to know whenever an application
 > attempts to modify a mmapped page?  After the modification, rather than
 > before?

That would be helpful.   But alternatively if a simple check for dirty 
pages for an inode
is cheap (ie would not hurt performance to be called every few seconds) 
- then when
the read and read/write caching tokens for an inode (actually file 
handle but in this case
inode is close enough) are lost (oplock broken)  I might be able to have 
a kernel thread
simply sync all of the filesystem's dirty inodes which are not cacheable.

Unfortunately - an experiment I tried yielded very strange results and 
made things much worse.
First I disabled caching in my code by setting 
/proc/fs/cifs/OplockEnabled to zero (simulating losing
the caching token) and ran the fsx file system exerciser test case which 
almost immediately failed.

The fsx test case did the following steps before failing:
1) create new file
2) truncate up to 0x13E76
3) write to range 0x17098 - 0x26857
4) read 0xC73E through 0x1B801
5) check the data just read which turned up incorrect random data 
starting at offset 0x13000

What I see in my code (from the cifs vfs perspective) is basically :
1) create/open
2) setattrs (set file size)
3) write 0x17098 - 0x26857
4) but during the read (since I did not have a caching token and 
therefore was not calling
generic_read in my read routine) I tried the experiment of calling 
filemap_fdatawrite (to
handle the case in which mmapped data might exist locallly for this file 
but not be written to the
server for the range I was about to read from the server) before issuing 
the read call.  This is
a disaster because it causes a write of two (!) dirty pages to the 
server (there should not
be any).  
4) the filemap_fdatawrite caused writepage of one page at 0x0000 
(fortunately all zero as would be expected)
but ... and this is the problem ... it then it did a writepage of  one 
page at 0x13000 (note that this range should be zero
from the truncate (truncate_up to 0x13E76) and the rest should be zero 
implicitly from the write beyond end of file.
In any case - there never really was a write to this page (at 0x13000) 
and the server copy of the data was fine before this write which writes 
junk - locally the
page cache thinks this page is dirty and the write of a bogus dirty page 
out is bad.
5) then I do the read from the server which returns the data which is 
fine from 0xC73E but includes the junk starting at offset 0x1B00 (and 
then the testcase fails)

The logic inside the read routine originally had been to just call 
generic_file_read and the write routine just called generic_file_write 
(and all single machine remote cases  worked fine that way for many 
months) but the change I made to fix the oplock problem changed the read 
wrapper routine to do:

     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
         return generic_file_read(file,read_data,read_size,poffset);
     else /* this calls the server directly to get the data and does not 
go through the pagecache */ {
         /* to get mmap data to get flushed in the case in which the 
caching token lost
        and we are not going through the pagecache I tried adding a call to
        filemap_fdatawrite here which caused the problems described in 
the note */
         return cifs_read(file,read_data,read_size,poffset);  
    }

and similarly in my write wrapper routine:

    /* check whether we can cache writes locally */
    /* if not then check whether we can use the page cache at all */
    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) 
         return generic_file_write(file,write_data, write_size,poffset);

    else if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead) {
        written = generic_file_write(file,write_data, write_size,poffset);
        /* since we can not cache writes, need to sync the data */
        if(file->f_dentry->d_inode->i_mapping == NULL)
            return -EIO;
        filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
            return written;
    } else { /* this does not go through the page cache but goes 
directly to the server */
         return cifs_write(file,write_data,write_size,poffset);
    }

It seems that I am going to always have to go through the pagecache it 
seems (remove the call to cifs_write and cifs_read and always call the 
generic_file_read and generic_file_write), perhaps the next thing to try 
is: this for read:

     if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheRead)
         return generic_file_read(file,read_data,read_size,poffset);
     else {
        if(file->f_dentry->d_inode->i_mapping)
            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);      
         invalidate_inode_pages(file->f_dentry->d_inode);
         return generic_file_read(file,read_data,read_size,poffset);
    }

and this for write:
    /* check whether we can cache writes locally */
    /* if not then check whether we can use the page cache at all */
    if(CIFS_I(file->f_dentry->d_inode)->clientCanCacheAll) 
         return generic_file_write(file,write_data, write_size,poffset);
    else {
        written = generic_file_write(file,write_data, write_size,poffset);
        /* since we can not cache writes, need to sync the data */
        if(file->f_dentry->d_inode->i_mapping)
            filemap_fdatawrite(file->f_dentry->d_inode->i_mapping);
       return written;
    }

Are there better ways to do this?