[Samba] Re: CTDB + Samba: Tune Read Performance

tim clusters tim.clusters at gmail.com
Fri Jan 30 21:34:27 GMT 2009


On Thu, Jan 29, 2009 at 10:45 PM, John H Terpstra <jht at samba.org> wrote:

> On Thursday 29 January 2009 21:40:55 tim clusters wrote:
> > On Tue, Jan 27, 2009 at 6:30 PM, tim clusters <tim.clusters at gmail.com> wrote:
> > > Hi,
> > >
> > > I have a two server setup that acts as SMB as well as NFS servers in
> > > active/active configuration managed by CTDB(http://ctdb.samba.org/).
> > >
> > > The write performance is around 100MB/s per client however the read
> > > performance is only 0.6MB/s (using Iozone benchmark). I use Windows 2003
> > > Server as CIFS client. Sometimes the read performance is good only from
> > > one of the CTDB managed Samba servers but not consistent when you restart
> > > CTDB + Samba.
> >
> > The issue is resolved and was network related. Tcpdump revealed lots of
> > retransmissions from the server to the client owing to an improper
> > TcpWindowSize value.
> >
> > Cheers,
> > -Tim
>
> Tim,
>
> Thanks for reporting that back to the list.  This is useful information for
> others.  Would it be possible to perhaps provide a little more detail?
>

I apologize for being too terse. I am still narrowing down the right settings
for SO_RCVBUF, SO_SNDBUF and the TCP/IP stack to get maximum bandwidth.
Initially, I had set SO_RCVBUF and SO_SNDBUF to 262144 (thinking that larger
buffers would mean more performance):

[pid 29734] setsockopt(32, SOL_SOCKET, SO_RCVBUF, [262144], 4) = 0
[pid 29734] setsockopt(32, SOL_SOCKET, SO_SNDBUF, [262144], 4) = 0
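A rough Python equivalent of those two setsockopt calls is sketched below, mainly to show that the value you ask for is not necessarily the value you get. On Linux the kernel caps the request at net.core.rmem_max / wmem_max and then doubles it for bookkeeping overhead, so it is worth reading it back. The helper name set_socket_buffers is made up for illustration.

```python
import socket


def set_socket_buffers(sock: socket.socket, size: int) -> tuple[int, int]:
    """Request SO_RCVBUF/SO_SNDBUF of `size` bytes; return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    # Linux caps the request at rmem_max/wmem_max and then doubles it,
    # so read the effective values back instead of assuming them.
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return rcv, snd


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(set_socket_buffers(s, 262144))
```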

Strace of smbd showed the server doing sendfile from the disk file to the
socket in chunks of 61440 bytes (60KB, just under 64KB):

[pid 29848] sendfile(32, 38, [3207168], 61440) = 61440
[pid 29848] sendfile(32, 38, [3268608], 61440) = 61440
[pid 29848] sendfile(32, 38, [3330048], 61440) = 61440
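The strace pattern above (each offset advancing by exactly 61440 bytes) is what a simple chunked sendfile loop produces. The sketch below reproduces that loop with Python's os.sendfile on Linux; it is only an illustration of the pattern, not smbd's actual code, and the helper name sendfile_chunks and the CHUNK constant are hypothetical.

```python
import os
import socket

CHUNK = 61440  # chunk size seen in the smbd strace above (60KB)


def sendfile_chunks(out_sock: socket.socket, in_fd: int, offset: int,
                    length: int, chunk: int = CHUNK) -> list[tuple[int, int]]:
    """Copy `length` bytes from in_fd, starting at `offset`, to out_sock
    with os.sendfile one chunk at a time; return the (offset, sent) pairs."""
    calls = []
    end = offset + length
    while offset < end:
        count = min(chunk, end - offset)
        sent = os.sendfile(out_sock.fileno(), in_fd, offset, count)
        if sent == 0:
            break  # input file is shorter than expected
        calls.append((offset, sent))
        offset += sent  # the next call starts where this one stopped
    return calls
```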

So the server was behaving as expected, yet performance was still poor, and
a network trace revealed lots of retransmissions, only from the server to
the client (not the other way around):

9.990078 192.168.97.5 -> 192.168.97.1 SMB [TCP Retransmission] Read AndX Response, 61440 bytes
10.322077 192.168.97.5 -> 192.168.97.1 SMB [TCP Retransmission] Read AndX Response, 61440 bytes
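To confirm that retransmissions go in only one direction, one option is to count them per source/destination pair in the one-line tshark output. The sketch below assumes the summary format shown above ("src -> dst ... [TCP Retransmission]"); newer tshark versions may print a Unicode arrow instead, which the regex also accepts. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches summary lines such as:
# 9.990078 192.168.97.5 -> 192.168.97.1 SMB [TCP Retransmission] Read AndX ...
LINE_RE = re.compile(
    r"^\s*[\d.]+\s+(\S+)\s+(?:->|\u2192)\s+(\S+)\s.*\[TCP Retransmission\]"
)


def retransmissions_by_direction(lines) -> Counter:
    """Count '[TCP Retransmission]' lines per (source, destination) pair."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts
```

Fed the output of something like `tshark -r trace.pcap` line by line, a result with a single (server, client) key would match what the trace above showed.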

Then I set SO_RCVBUF and SO_SNDBUF to 65536 to match the sendfile chunk
size. Retransmissions were still being seen. After some googling, the prime
suspect was the TCP/IP stack, in particular the TCP window size.

TCP Window Size = Bandwidth * RTT (the bandwidth-delay product)

The Windows machine has a Myrinet 10GigE HCA, while the Linux server has a
Chelsio 10GigE HCA.

For a 64KB SMB packet size, network testing led me to the following figures:
Myrinet 10GigE: TCP Window Size = 3Gbps * 300 microsec ==> ~112KB
Chelsio 10GigE: TCP Window Size = 3.7Gbps * 260 microsec ==> ~120KB
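The window-size arithmetic can be checked with a few lines of Python. Working the formula with the quoted throughput and RTT figures gives about 112KB for the Myrinet side and about 120KB for the Chelsio side. The function name bdp_bytes is made up for this sketch, and "KB" here means 1000 bytes.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return bandwidth_bps * rtt_s / 8  # bits -> bytes


if __name__ == "__main__":
    # Throughput and RTT figures as measured in the post.
    for name, bw, rtt in [("Myrinet", 3e9, 300e-6), ("Chelsio", 3.7e9, 260e-6)]:
        print(f"{name}: {bdp_bytes(bw, rtt) / 1000:.1f} KB")
```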

Myricom recommends a TCP window size of 512KB for Windows, while on Linux
the default window size was 87380 bytes (~87KB; perhaps roughly 75% of 120KB
to account for small packets?):

net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 87380 16777216
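Each of these sysctls holds three numbers: the minimum, the default (initial) buffer size, and the maximum that the kernel's autotuning may grow a socket buffer to. The sketch below parses that triple from a string or from /proc on Linux, which makes it easy to confirm what the server is really using. The TcpMem type and the helper names are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class TcpMem(NamedTuple):
    min_bytes: int      # smallest buffer the kernel will use
    default_bytes: int  # initial buffer size (87380 in the settings above)
    max_bytes: int      # upper limit for autotuning


def parse_tcp_mem(text: str) -> TcpMem:
    """Parse the three whitespace-separated fields of tcp_rmem / tcp_wmem."""
    lo, default, hi = (int(x) for x in text.split())
    return TcpMem(lo, default, hi)


def read_tcp_mem(name: str = "tcp_rmem") -> TcpMem:
    """Read the live value from /proc (Linux only)."""
    return parse_tcp_mem(Path("/proc/sys/net/ipv4", name).read_text())
```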

As a result, during reads the amount of unacknowledged data the server had
in flight did not prompt the client to respond (as the client's window size
was 512KB), so the server retransmitted after timing out while waiting for
acknowledgements. Also, TCP window scaling (RFC 1323) was not enabled on the
Windows client. Setting the Windows TCP window size to 87380 bytes (matching
the server) and enabling Tcp1323Opts resolved the issue.
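The reason window scaling matters here is that the TCP header's window field is only 16 bits, so any window above 65535 bytes (including 87380) can only be advertised with an RFC 1323 scale shift. On Windows 2003 this is controlled by the Tcp1323Opts registry value under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters (1 enables window scaling, 3 enables scaling plus timestamps). The sketch below computes the smallest shift needed for a given window; the function name is hypothetical.

```python
def window_scale_shift(window_bytes: int) -> int:
    """Smallest RFC 1323 window-scale shift that lets the 16-bit window
    field advertise `window_bytes`; 0 means no scaling is needed."""
    if window_bytes < 0:
        raise ValueError("window must be non-negative")
    shift = 0
    while (0xFFFF << shift) < window_bytes:
        shift += 1
        if shift > 14:  # RFC 1323 limits the shift to 14
            raise ValueError("window too large for TCP window scaling")
    return shift


if __name__ == "__main__":
    for w in (65535, 87380, 512 * 1024):
        print(w, window_scale_shift(w))
```

With scaling disabled, the client could not advertise either the 87380-byte or the 512KB window in full.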

Currently, a single SMB server is able to handle a sustained 300MB/s on
writes and 200MB/s on reads. Performance remains constant as you scale
clients, with no time-outs, and it scales as you add another server. I am
still not sure whether we can extract more from smbd, as the CPU, memory and
I/O subsystems are all less than 30% utilized. The bottleneck seems to be the
network combined with the SMB packet size, since the raw network yields
450MB/s with a 64KB packet size.

I may be wrong, but this is the closest explanation I can come up with.
Please let me know if there is room for further performance improvements.

[snip of smb.conf]
        socket options = IPTOS_LOWDELAY TCP_NODELAY SO_RCVBUF=65536 SO_SNDBUF=65536 SO_KEEPALIVE
        use mmap = No
        use sendfile = Yes
        blocking locks = No
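For readers wondering what the "socket options" line amounts to at the system-call level, the sketch below parses that same string and applies each entry with setsockopt. The option table covers only the five options used above and is an assumption for illustration, not Samba's full list or its actual code; the helper names are hypothetical, and IPTOS_LOWDELAY falls back to 0x10 (the RFC 1349 "minimize delay" bit) if the socket module does not define it.

```python
import socket

# Only the options used in the smb.conf above (illustrative, not Samba's table).
IPTOS_LOWDELAY = getattr(socket, "IPTOS_LOWDELAY", 0x10)
OPTIONS = {
    "IPTOS_LOWDELAY": (socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY),
    "TCP_NODELAY": (socket.IPPROTO_TCP, socket.TCP_NODELAY, 1),
    "SO_KEEPALIVE": (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
    "SO_RCVBUF": (socket.SOL_SOCKET, socket.SO_RCVBUF, None),  # needs =value
    "SO_SNDBUF": (socket.SOL_SOCKET, socket.SO_SNDBUF, None),  # needs =value
}


def parse_socket_options(line: str) -> list[tuple[int, int, int]]:
    """Turn 'TCP_NODELAY SO_RCVBUF=65536 ...' into setsockopt arguments."""
    result = []
    for token in line.split():
        name, _, value = token.partition("=")
        level, optname, default = OPTIONS[name]  # KeyError for unknown names
        if value:
            val = int(value)
        elif default is not None:
            val = default
        else:
            raise ValueError(f"{name} needs a value")
        result.append((level, optname, val))
    return result


def apply_socket_options(sock: socket.socket, line: str) -> None:
    """Apply every option in `line` to `sock`."""
    for level, optname, value in parse_socket_options(line):
        sock.setsockopt(level, optname, value)
```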

Regards,
-Tim

