[Samba] Samba write performance in kernel

Fri Sep 12 09:42:53 GMT 2008

Hi,

> Ok, this is different. I had missed that you are talking
> about a small device with slow memory bandwidth. In that
> case, you might certainly gain something by avoiding the
> copies. If you are really memcpy-bound, you should
> definitely make splice work.

Yes, but the question is how much it can improve. We would like write throughput to reach 7 MB/s within 2 or 3 months, and we can't just wait for the Linux kernel to fix the problem. Are there any individuals or companies who could do this kind of performance work for a fee?
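
As a rough way to bound the possible gain, here is a minimal Amdahl-style sketch. It assumes the write path is entirely CPU-bound and that splice would remove some fraction of the copy_from/to_user time; both are assumptions, not measurements. The function names and the `removed` parameter are made up for illustration. The 30% and 4-5 MB/s figures are the ones from my OProfile run quoted below.

```python
def max_speedup(copy_fraction: float, removed: float = 1.0) -> float:
    """Upper bound on speedup if `removed` (0..1) of the copy time disappears.

    copy_fraction: share of CPU time spent copying (0.30 from the profile).
    Assumes throughput is limited only by CPU time (an assumption).
    """
    if not 0.0 <= copy_fraction <= 1.0 or not 0.0 <= removed <= 1.0:
        raise ValueError("fractions must be between 0 and 1")
    remaining = 1.0 - copy_fraction * removed
    if remaining <= 0.0:
        raise ValueError("cannot remove all of the CPU time")
    return 1.0 / remaining


def projected_throughput(current_mb_s: float, copy_fraction: float,
                         removed: float = 1.0) -> float:
    """Best-case throughput in MB/s after removing part of the copy cost."""
    return current_mb_s * max_speedup(copy_fraction, removed)


if __name__ == "__main__":
    # Measured write rates of 4 and 5 MB/s, 30% of CPU time in copies.
    for current in (4.0, 5.0):
        best = projected_throughput(current, 0.30)
        print(f"{current:.1f} MB/s now, at most {best:.2f} MB/s without copies")
```

Under these assumptions the best case is a speedup of 1/(1 - 0.30), about 1.43x, so the estimate is only a ceiling. Any copy that splice does not eliminate lowers it.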

> At the high end, latency is mostly what kills your
> performance. Mostly you have enough bandwidth, but if you
> just do a simple request->response scheme, you can't get
> beyond a certain overall bandwidth that is way below the
> theoretical network bandwidth. To fill that, you need to make
> the client issue parallel requests. Multi-threaded windows
> client apps can do it, smbclient from 3.2 does it.

I have tried multiple threads, but it doesn't help performance in my case. The delay caused by memory copies seems to dominate.
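
For comparison, the request->response ceiling described above can be estimated with a simple model. This is only a sketch: the 64 KiB request size and 1 ms round-trip time are hypothetical values, not measurements from my setup, and the model ignores CPU and disk time entirely.

```python
def request_response_throughput(request_bytes: int,
                                rtt_seconds: float,
                                outstanding: int,
                                link_bytes_per_s: float) -> float:
    """Estimate throughput (bytes/s) with `outstanding` requests in flight.

    Each request costs one round trip plus its transfer time on the link.
    Keeping more requests in flight overlaps the round trips, but the
    result can never exceed the link bandwidth.
    """
    if request_bytes <= 0 or rtt_seconds < 0 or outstanding < 1:
        raise ValueError("invalid parameters")
    if link_bytes_per_s <= 0:
        raise ValueError("link bandwidth must be positive")
    per_request_time = rtt_seconds + request_bytes / link_bytes_per_s
    return min(link_bytes_per_s, outstanding * request_bytes / per_request_time)


if __name__ == "__main__":
    link = 100e6 / 8      # 100 Mbps Ethernet in bytes per second
    size = 64 * 1024      # hypothetical request size
    rtt = 0.001           # hypothetical round-trip time in seconds
    for n in (1, 2, 4):
        rate = request_response_throughput(size, rtt, n, link)
        print(f"{n} outstanding: {rate / 1e6:.2f} MB/s")
```

With numbers like these the link saturates after only a couple of requests in flight, which would be consistent with parallel requests not helping when the bottleneck is on the CPU side.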

Best Regards,
Mac Lin

> Date: Fri, 12 Sep 2008 09:45:32 +0200
> From: Volker.Lendecke at SerNet.DE
> To: mkl23 at hotmail.com
> CC: jra at samba.org; samba at lists.samba.org
> Subject: Re: [Samba] Samba write performance in kernel
> 
> On Fri, Sep 12, 2008 at 02:43:25PM +0800, Lin Mac wrote:
> > Well, in my situation it might be 30% instead of 5%, if
> > splice can cover advantages 1 and 2.
> > 1. I'm using a slow CPU (FA526), and memory copies are even slower.
> > 2. Read performance is over 7 MB/s with mmap and
> > sendfile enabled, while writing is only 4-5 MB/s. Without
> > mmap and sendfile, reading from Samba is also about 4-5
> > MB/s.
> > 3. I used OProfile to profile writing a file to Samba and
> > found that over 30% of CPU time is spent in
> > copy_from/to_user, so I think going to user space and back
> > again is the bottleneck.
> > 4. My device has only 100 Mbps Ethernet.
> > 5. I use a Windows client to measure throughput.
> 
> Ok, this is different. I had missed that you are talking
> about a small device with slow memory bandwidth. In that
> case, you might certainly gain something by avoiding the
> copies. If you are really memcpy-bound, you should
> definitely make splice work.
> 
> > > here, but the network latencies together with non-optimally
> > > queued requests by the client have a MUCH greater influence.
> > 1. If splice works, can memory copy be avoided?
> > 2. Sorry, I don't really understand what "non-optimally queued
> > requests" means. What could I do to optimize it?
> 
> At the high end, latency is mostly what kills your
> performance. Mostly you have enough bandwidth, but if you
> just do a simple request->response scheme, you can't get
> beyond a certain overall bandwidth that is way below the
> theoretical network bandwidth. To fill that, you need to make
> the client issue parallel requests. Multi-threaded windows
> client apps can do it, smbclient from 3.2 does it.
> 
> Volker
