2.5 readpages doubles cifs vfs large file copy performance vs. 2.4
David Collier-Brown -- Customer Engineering
David.Collier-Brown at sun.com
Thu Jul 3 13:42:58 GMT 2003
The data I have from qfs and ufs on Solaris says bigger is better
until you get too big, but it's fairly insensitive once you get
past too small: the usual U-shaped curve, with a very wide
"sweet spot".
If the sweet spot on Linux is large, it's going to be easy. If
it's hard to hit, then I have to start looking carefully at the
criteria.
I think, and this is speculation ahead of doing the experiment, that
a "good" size is something like the least common multiple of
  - the number returned by QFSInfo, and
  - the number specified by a sysadmin in the conf file,
and should dominate
  - the MTU, and possibly
  - the SMB buffer size.
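
A minimal sketch of that heuristic in C (the sizes below are made-up
placeholders: qfsinfo_bsize stands in for whatever QFSInfo returns,
conf_size for the value from the conf file, and the doubling loop is
just one way to make the result dominate the MTU):

    #include <stdio.h>

    /* Greatest common divisor, used to build the least common multiple. */
    static unsigned long gcd(unsigned long a, unsigned long b)
    {
            while (b != 0) {
                    unsigned long t = a % b;
                    a = b;
                    b = t;
            }
            return a;
    }

    static unsigned long lcm(unsigned long a, unsigned long b)
    {
            return (a / gcd(a, b)) * b;
    }

    int main(void)
    {
            /* Hypothetical inputs, not values from this thread. */
            unsigned long qfsinfo_bsize = 4096;   /* from QFSInfo       */
            unsigned long conf_size     = 16384;  /* from the conf file */
            unsigned long mtu           = 1500;

            /* Candidate "good" size: LCM of the two block sizes ... */
            unsigned long good = lcm(qfsinfo_bsize, conf_size);

            /* ... grown until it dominates the MTU. */
            while (good < mtu)
                    good *= 2;

            printf("candidate read size: %lu bytes\n", good);
            return 0;
    }
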
Observation with truss (the Solaris strace) showed Samba doing
a read of 1 KB, then 16 KB, and finally 64 KB, which was
*pessimal* for our ufs (which is really an improved BSD ffs).
--dave
Steven French wrote:
>
> >Out of curiosity, do you expect to see a corresponding improvement
> >if you issue reads in suitable stripe sizes? Jeremy did a write cache
> >buffer for RAID disks
>
> That is an interesting question. I suspect that for the optimal
> single-client case over GigE transport, increasing the MTU (from 1500)
> and the Samba smb buffer size (from 16K) would be the most immediately
> helpful thing to do, as long as you match these network sizes sensibly
> to minimize fragmentation. Matching read sizes to the stripe size seems
> intuitive, but it is not clear what the client should use to determine
> this. Can the client really trust the "bsize" that is returned to the
> cifs vfs today via QFSInfo at mount time (i.e. the equivalent of a stat
> of the remote mounted share)? Can the client expect that the server
> would set the negotiated buffer size properly, and can we assume that
> the server optimally expects us to use a read size of
>
> readsize = negotiated buffer size & 0xFFE0;
>
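
A concrete illustration of that mask (a sketch only; the variable names
and the example value are invented, and whether the server actually
intends this interpretation is exactly the open question above):

    #include <stdio.h>

    int main(void)
    {
            /* Hypothetical negotiated buffer size from the server's
             * SMB negotiate response; 16644 is just an example value. */
            unsigned int negotiated_buffer_size = 16644;

            /* Masking with 0xFFE0 keeps only bits 5..15: it rounds the
             * value down to a multiple of 32 and discards anything at
             * or above 64K. */
            unsigned int readsize = negotiated_buffer_size & 0xFFE0;

            printf("negotiated %u -> read size %u\n",
                   negotiated_buffer_size, readsize);
            return 0;
    }
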
> So what should a cifs client use to determine:
> a) the optimal read size & write size?
>    (Today, for readpages I use the largest read that fits in the
>    server's negotiated buffer size, and 4k is obviously used for
>    writepage; I don't support writepages yet.)
> b) the optimal number of simultaneous requests?
>    This could be dynamically determined by the client (moving up or
>    down based on the response time of the last read/write), or we
>    could simply configure it statically. In any case, never send more
>    than the maximum number of requests that the server returned in
>    MaxMpxCount. (I probably don't check this properly in the cifs vfs
>    code today, but it has turned out to be harmless so far.)
>
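
One way to picture the dynamic scheme in (b), sketched under the
assumption of a simple grow-on-fast-response / halve-on-slow-response
rule; none of this is from the actual cifs vfs code, and max_mpx_count
simply stands in for the server's MaxMpxCount:

    /* Illustrative only: a dynamic in-flight request window capped at
     * the server's MaxMpxCount.  Not cifs vfs code. */

    #include <stdio.h>

    struct req_window {
            unsigned int in_flight_limit;   /* current window           */
            unsigned int max_mpx_count;     /* cap advertised by server */
    };

    /* Grow the window when the last read/write came back quickly,
     * halve it when the response time degraded. */
    static void adjust_window(struct req_window *w,
                              unsigned long last_rtt_ms,
                              unsigned long target_rtt_ms)
    {
            if (last_rtt_ms <= target_rtt_ms) {
                    if (w->in_flight_limit < w->max_mpx_count)
                            w->in_flight_limit++;
            } else if (w->in_flight_limit > 1) {
                    w->in_flight_limit /= 2;        /* back off */
            }
    }

    int main(void)
    {
            struct req_window w = { .in_flight_limit = 1,
                                    .max_mpx_count   = 50 };

            adjust_window(&w, 3, 10);   /* fast response: grow to 2  */
            adjust_window(&w, 25, 10);  /* slow response: halve to 1 */
            printf("window is now %u (cap %u)\n",
                   w.in_flight_limit, w.max_mpx_count);
            return 0;
    }
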
> Submitting multiple client read/write requests shouldn't be hard, but
> I have to be able to do some SMP-safe waiting on multiple events in my
> demultiplexing code (SendReceive in fs/cifs/transport.c, and the cifs
> demultiplex thread that is created for each connection to a distinct
> server). This avoids having to do the nasty optimizations that the nfs
> client does in its read ahead.
>
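
The shape of that wait (matching responses to outstanding requests and
sleeping until yours arrives) might look roughly like the userspace
pthreads sketch below; it only illustrates the idea and is not the code
in fs/cifs/transport.c, and the slot table, names, and sizes are
invented:

    #include <pthread.h>
    #include <stdbool.h>

    /* Illustrative only: one slot per outstanding request, keyed by the
     * SMB multiplex id (mid).  The demultiplex thread fills in the
     * response and signals; the issuing thread sleeps on the condvar. */
    #define MAX_OUTSTANDING 16

    struct pending_req {
            bool            in_use;
            bool            answered;
            unsigned short  mid;
            void           *response;   /* filled in by the demux thread */
    };

    static struct pending_req slots[MAX_OUTSTANDING];
    static pthread_mutex_t    lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t     cond = PTHREAD_COND_INITIALIZER;

    /* Called by the demultiplex thread when a response for 'mid' arrives. */
    void complete_request(unsigned short mid, void *response)
    {
            pthread_mutex_lock(&lock);
            for (int i = 0; i < MAX_OUTSTANDING; i++) {
                    if (slots[i].in_use && slots[i].mid == mid) {
                            slots[i].response = response;
                            slots[i].answered = true;
                            break;
                    }
            }
            /* Wake every waiter; each one rechecks its own slot. */
            pthread_cond_broadcast(&cond);
            pthread_mutex_unlock(&lock);
    }

    /* Called by an issuing thread after it has sent the request in 'slot'. */
    void *wait_for_response(struct pending_req *slot)
    {
            void *resp;

            pthread_mutex_lock(&lock);
            while (!slot->answered)
                    pthread_cond_wait(&cond, &lock);
            resp = slot->response;
            slot->in_use = false;
            pthread_mutex_unlock(&lock);
            return resp;
    }
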
> Steve French
> Senior Software Engineer
> Linux Technology Center - IBM Austin
> phone: 512-838-2294
> email: sfrench at-sign us dot ibm dot com
>
> To: Steven French/Austin/IBM at IBMUS
> cc: samba-technical at samba.org, Andrew Morton <akpm at digeo.com>
> Subject: Re: 2.5 readpages doubles cifs vfs large file copy performance vs. 2.4
>
>
>
> Steven French wrote:
> > Since the disk subsystems are not really fast enough, the files on
> > the server side are cached to get results this good. Interestingly,
> > the Samba server itself (vs. nfsd) did not seem to be a big problem
> > from a performance perspective; rather, the main difference for this
> > kind of test is just keeping the server from sitting idle waiting to
> > receive a new read request (unlike the cifs client, the nfs client
> > issues more than one read request at a time, which helps there).
>
> Out of curiosity, do you expect to see a corresponding improvement
> if you issue reads in suitable stripe sizes? Jeremy did a write cache
> buffer for RAID disks which provides a substantial improvement
> in speed on a Solaris system, but said something which made me think
> that it wouldn't help as much on Linux. I just did a read cache
> and saw Solaris improvements, but haven't set up a Linux testbed.
> So I wonder where the bottlenecks are likely to move to after the
> system can saturate a mongo ethernet (:-))
>
> --dave
--
David Collier-Brown, | Always do right. This will gratify
Sun Microsystems DCMO | some people and astonish the rest.
Toronto, Ontario |
(905) 415-2849 or x52849 | davecb at canada.sun.com