2.5 readpages doubles cifs vfs large file copy performance vs. 2.4
David Collier-Brown -- Customer Engineering
David.Collier-Brown at sun.com
Thu Jul 3 13:42:58 GMT 2003
The data I have from qfs and ufs on Solaris says bigger is better
until you get too big, but it's fairly insensitive once you get
past too small: the usual U-shaped curve, with a very wide
"sweet spot".
If the sweet spot on Linux is large, it's going to be easy. If
it's hard to hit, then I have to start looking carefully at the
criteria.
I think, and this is speculation ahead of doing the experiment, that
a "good" size is something like the least common multiple of
  - the number returned by QFSInfo, and
  - the number specified by a sysadmin in the conf file,
and should dominate
  - the MTU, and possibly
  - the SMB buffer size.
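
A minimal sketch of that heuristic in C (the sizes below are made-up
placeholders: qfsinfo_bsize stands in for whatever QFSInfo returns,
conf_size for the value from the conf file, and the doubling loop is
just one way to make the result dominate the MTU):

    #include <stdio.h>

    /* Greatest common divisor, used to build the least common multiple. */
    static unsigned long gcd(unsigned long a, unsigned long b)
    {
            while (b != 0) {
                    unsigned long t = a % b;
                    a = b;
                    b = t;
            }
            return a;
    }

    static unsigned long lcm(unsigned long a, unsigned long b)
    {
            return (a / gcd(a, b)) * b;
    }

    int main(void)
    {
            /* Hypothetical inputs, not values from this thread. */
            unsigned long qfsinfo_bsize = 4096;   /* from QFSInfo       */
            unsigned long conf_size     = 16384;  /* from the conf file */
            unsigned long mtu           = 1500;

            /* Candidate "good" size: LCM of the two block sizes ... */
            unsigned long good = lcm(qfsinfo_bsize, conf_size);

            /* ... grown until it dominates the MTU. */
            while (good < mtu)
                    good *= 2;

            printf("candidate read size: %lu bytes\n", good);
            return 0;
    }
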
Observation with truss (the Solaris strace) showed Samba doing
a read of 1 KB, then 16 KB, and finally 64 KB, which was
*pessimal* for our ufs (which is really an improved BSD ffs).
--dave
Steven French wrote:
>
> >Out of curiosity, do you expect to see a corresponding improvement
> >if you issue reads in suitable stripe sizes? Jeremy did a write cache
> >buffer for RAID disks
>
> That is an interesting question. I suspect that for the optimal
> single-client case over GigE transport, increasing the MTU (from 1500)
> and the Samba smb buffer size (from 16K) would be the most immediately
> helpful thing to do, as long as you match these network sizes sensibly
> to minimize fragmentation. Matching read sizes to the stripe size seems
> intuitive, but it is not clear what the client should use to determine
> this. Can the client really trust the "bsize" that is returned to the
> cifs vfs today via QFSInfo at mount time (i.e. the equivalent of a stat
> of the remote mounted share)? Can the client expect that the server
> would set the negotiated buffer size properly, and can we assume that
> the server optimally expects us to use a read size of
>
> readsize = negotiated buffer size & 0xFFE0;
>
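
A concrete illustration of that mask (a sketch only; the variable names
and the example value are invented, and whether the server actually
intends this interpretation is exactly the open question above):

    #include <stdio.h>

    int main(void)
    {
            /* Hypothetical negotiated buffer size from the server's
             * SMB negotiate response; 16644 is just an example value. */
            unsigned int negotiated_buffer_size = 16644;

            /* Masking with 0xFFE0 keeps only bits 5..15: it rounds the
             * value down to a multiple of 32 and discards anything at
             * or above 64K. */
            unsigned int readsize = negotiated_buffer_size & 0xFFE0;

            printf("negotiated %u -> read size %u\n",
                   negotiated_buffer_size, readsize);
            return 0;
    }
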
> So what should a cifs client use to determine:
> a) the optimal read size & write size?
>    (Today, for readpages I use the largest read that fits in the
>    server's negotiated buffer size, and 4k is obviously used for
>    writepage; I don't support writepages yet.)
> b) the optimal number of simultaneous requests?
>    This could be dynamically determined by the client (moving up or
>    down based on the response time of the last read/write), or we
>    could simply configure it statically. In any case, never send more
>    than the maximum number of requests that the server returned in
>    MaxMpxCount. (I probably don't check this properly in the cifs vfs
>    code today, but it has turned out to be harmless so far.)
>
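
One way to picture the dynamic scheme in (b), sketched under the
assumption of a simple grow-on-fast-response / halve-on-slow-response
rule; none of this is from the actual cifs vfs code, and max_mpx_count
simply stands in for the server's MaxMpxCount:

    /* Illustrative only: a dynamic in-flight request window capped at
     * the server's MaxMpxCount.  Not cifs vfs code. */

    #include <stdio.h>

    struct req_window {
            unsigned int in_flight_limit;   /* current window           */
            unsigned int max_mpx_count;     /* cap advertised by server */
    };

    /* Grow the window when the last read/write came back quickly,
     * halve it when the response time degraded. */
    static void adjust_window(struct req_window *w,
                              unsigned long last_rtt_ms,
                              unsigned long target_rtt_ms)
    {
            if (last_rtt_ms <= target_rtt_ms) {
                    if (w->in_flight_limit < w->max_mpx_count)
                            w->in_flight_limit++;
            } else if (w->in_flight_limit > 1) {
                    w->in_flight_limit /= 2;        /* back off */
            }
    }

    int main(void)
    {
            struct req_window w = { .in_flight_limit = 1,
                                    .max_mpx_count   = 50 };

            adjust_window(&w, 3, 10);   /* fast response: grow to 2  */
            adjust_window(&w, 25, 10);  /* slow response: halve to 1 */
            printf("window is now %u (cap %u)\n",
                   w.in_flight_limit, w.max_mpx_count);
            return 0;
    }
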
> Submitting multiple client read/write requests shouldn't be hard, but
> I have to be able to do some SMP-safe waiting on multiple events in my
> demultiplexing code (SendReceive in fs/cifs/transport.c, and the cifs
> demultiplex thread that is created for each connection to a distinct
> server). This avoids having to do the nasty optimizations that the nfs
> client does in its read ahead.
>
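
The shape of that wait (matching responses to outstanding requests and
sleeping until yours arrives) might look roughly like the userspace
pthreads sketch below; it only illustrates the idea and is not the code
in fs/cifs/transport.c, and the slot table, names, and sizes are
invented:

    #include <pthread.h>
    #include <stdbool.h>

    /* Illustrative only: one slot per outstanding request, keyed by the
     * SMB multiplex id (mid).  The demultiplex thread fills in the
     * response and signals; the issuing thread sleeps on the condvar. */
    #define MAX_OUTSTANDING 16

    struct pending_req {
            bool            in_use;
            bool            answered;
            unsigned short  mid;
            void           *response;   /* filled in by the demux thread */
    };

    static struct pending_req slots[MAX_OUTSTANDING];
    static pthread_mutex_t    lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t     cond = PTHREAD_COND_INITIALIZER;

    /* Called by the demultiplex thread when a response for 'mid' arrives. */
    void complete_request(unsigned short mid, void *response)
    {
            pthread_mutex_lock(&lock);
            for (int i = 0; i < MAX_OUTSTANDING; i++) {
                    if (slots[i].in_use && slots[i].mid == mid) {
                            slots[i].response = response;
                            slots[i].answered = true;
                            break;
                    }
            }
            /* Wake every waiter; each one rechecks its own slot. */
            pthread_cond_broadcast(&cond);
            pthread_mutex_unlock(&lock);
    }

    /* Called by an issuing thread after it has sent the request in 'slot'. */
    void *wait_for_response(struct pending_req *slot)
    {
            void *resp;

            pthread_mutex_lock(&lock);
            while (!slot->answered)
                    pthread_cond_wait(&cond, &lock);
            resp = slot->response;
            slot->in_use = false;
            pthread_mutex_unlock(&lock);
            return resp;
    }
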
> Steve French
> Senior Software Engineer
> Linux Technology Center - IBM Austin
> phone: 512-838-2294
> email: sfrench at-sign us dot ibm dot com
>
> To: Steven French/Austin/IBM at IBMUS
> cc: samba-technical at samba.org, Andrew Morton <akpm at digeo.com>
> Subject: Re: 2.5 readpages doubles cifs vfs large file copy performance vs. 2.4
>
>
>
> Steven French wrote:
> > Since the disk subsystems are not really fast enough, the files on
> > the server side are cached to get results this good. Interestingly,
> > the Samba server itself (vs. nfsd) did not seem to be a big problem
> > from a performance perspective; rather, the main difference for this
> > kind of test is just keeping the server from sitting idle waiting to
> > receive a new read request (unlike the cifs client, the nfs client
> > issues more than one read request at a time, which helps there).
>
> Out of curiosity, do you expect to see a corresponding improvement
> if you issue reads in suitable stripe sizes? Jeremy did a write cache
> buffer for RAID disks which provides a substantial improvement
> in speed on a Solaris system, but said something which made me think
> that it wouldn't help as much on Linux. I just did a read cache
> and saw Solaris improvements, but haven't set up a Linux testbed.
> So I wonder where the bottlenecks are likely to move to after the
> system can saturate a mongo ethernet (:-))
>
> --dave
--
David Collier-Brown, | Always do right. This will gratify
Sun Microsystems DCMO | some people and astonish the rest.
Toronto, Ontario |
(905) 415-2849 or x52849 | davecb at canada.sun.com