2.5 readpages doubles cifs vfs large file copy performance vs. 2.4

Steven French sfrench at us.ibm.com
Wed Jul 2 16:39:12 GMT 2003


>Out of curiosity, do you expect to see a corresponding improvement
>if you issue reads in suitable stripe sizes?  Jeremy did a write cache
>buffer for RAID disks

That is an interesting question.  I suspect that for the optimal single-client
case over a GigE transport, increasing the MTU (from 1500) and the Samba SMB
buffer size (from 16K) would be the most immediately helpful things to do, as
long as the two network sizes are matched sensibly to minimize fragmentation.
Matching the read size to the stripe size seems intuitive, but it is not clear
what the client should use to determine it.  Can the client really trust the
"bsize" that is returned to the cifs vfs today via QFSInfo at mount time (i.e.
the equivalent of a stat of the remotely mounted share)?  Can the client
expect that the server sets the negotiated buffer size properly, and can we
assume that the server optimally expects us to use a read size of
        readsize = negotiated buffer size & 0xFFE0;
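
For what it's worth, the masking above works out as shown below.  This is
just an illustration of the arithmetic, with a made-up function name, and
not code from the cifs vfs:

#include <stdio.h>

/*
 * Round the server's negotiated buffer size down to a 32-byte multiple
 * (and drop anything above 16 bits); that is all the 0xFFE0 mask does.
 * Whether servers actually expect the client to derive its read size
 * this way is exactly the question being asked above.
 */
unsigned int example_read_size(unsigned int negotiated_buf_size)
{
        return negotiated_buf_size & 0xFFE0;
}

int main(void)
{
        printf("%u\n", example_read_size(16644));   /* prints 16640 */
        return 0;
}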

So what should a cifs client use to determine:
        a) the optimal read size and write size?
        (Today, for readpages, I use as the read size the largest
        read that fits in the server's negotiated buffer size, and
        4k is obviously used for writepage; I don't support
        writepages yet.)
        b) the optimal number of simultaneous requests?
        This could be determined dynamically by the client
        (moving up or down based on the response time of the last
        read/write), or we could simply configure it statically.
        In any case the client should never send more than the
        maximum number of requests that the server returned in
        MaxMpxCount.  (I probably don't check this properly in the
        cifs vfs code today, but it has turned out to be harmless
        so far.)  A rough sketch of the windowing in (b) follows below.
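
To make (b) a bit more concrete, here is the rough shape of the windowing
I have in mind.  The names and the millisecond-based heuristic are made up
for illustration; this is not the actual fs/cifs code:

#define MIN_IN_FLIGHT 1

/*
 * Grow the number of in-flight read/write requests while responses stay
 * fast, shrink it when they slow down, and never exceed the MaxMpxCount
 * that the server returned at negotiate time.
 */
unsigned int adjust_request_window(unsigned int cur_window,
                                   unsigned int max_mpx_count,
                                   unsigned long last_rsp_ms,
                                   unsigned long target_rsp_ms)
{
        if (last_rsp_ms <= target_rsp_ms && cur_window < max_mpx_count)
                cur_window++;
        else if (last_rsp_ms > target_rsp_ms && cur_window > MIN_IN_FLIGHT)
                cur_window--;
        return cur_window;
}

A static mount option could of course just pin the window size instead of
adjusting it like this.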

Submitting multiple client read/write requests shouldn't be hard, but I have
to be able to do SMP-safe waiting on multiple events in my demultiplexing
code (SendReceive in fs/cifs/transport.c and the cifs demultiplex thread that
is created for each connection to a distinct server).  That avoids having to
do the nasty optimizations that the nfs client does in its read-ahead.
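
The waiting pattern I mean is roughly the one below.  This is a userspace
pthread analogue with made-up names, not the actual SendReceive or
demultiplex thread code; in the kernel it would be done with a wait queue
and wake_up instead:

#include <pthread.h>
#include <stdbool.h>

#define MAX_MIDS 64

/* One completion flag per outstanding multiplex id (mid) on a connection. */
static bool mid_completed[MAX_MIDS];
static pthread_mutex_t mid_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mid_done = PTHREAD_COND_INITIALIZER;

/* Called by the demultiplex thread when the response for 'mid' arrives. */
void demultiplex_complete(int mid)
{
        pthread_mutex_lock(&mid_lock);
        mid_completed[mid] = true;
        pthread_cond_broadcast(&mid_done);
        pthread_mutex_unlock(&mid_lock);
}

/* Called by the submitter after issuing several reads: block until all
 * of the listed mids have been completed by the demultiplex thread. */
void wait_for_mids(const int *mid_list, int count)
{
        pthread_mutex_lock(&mid_lock);
        for (int i = 0; i < count; i++)
                while (!mid_completed[mid_list[i]])
                        pthread_cond_wait(&mid_done, &mid_lock);
        pthread_mutex_unlock(&mid_lock);
}

The point is just that several requests can be outstanding at once and the
submitter sleeps until the demultiplex thread has marked each one complete.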

Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at-sign us dot ibm dot com
To:     Steven French/Austin/IBM at IBMUS
cc:     samba-technical at samba.org, Andrew Morton <akpm at digeo.com>
Subject:        Re: 2.5 readpages doubles cifs vfs large file copy performance vs. 2.4



Steven French wrote:
> Since the disk subsystems are not really fast enough, the files on the
> server side are cached to get results this good.  Interestingly, the Samba
> server itself (vs. nfsd) did not seem to be a big problem from the
> performance perspective; rather, the main difference for this kind of test
> is simply keeping the server from sitting idle waiting to receive a new
> read request (unlike the cifs client, the nfs client issues more than one
> read request at a time, which helps there).

Out of curiosity, do you expect to see a corresponding improvement
if you issue reads in suitable stripe sizes?  Jeremy did a write cache
buffer for RAID disks which provides a substantial improvement
in speed on a Solaris system, but said something which made me think
that it wouldn't help as much on Linux.  I just did a read cache
and saw Solaris improvements, but haven't set up a Linux testbed.
So I wonder where the bottlenecks are likely to move to after the
system can saturate a mongo ethernet (:-))

--dave
--
David Collier-Brown,           | Always do right. This will gratify
Sun Microsystems DCMO          | some people and astonish the rest.
Toronto, Ontario               |
(905) 415-2849 or x52849       | davecb at canada.sun.com




