jlam at vixs.com
Mon Feb 7 23:53:14 MST 2011
After tracing the kernel code, I have come to a disappointing conclusion.
The heavy memory copying comes from two places in the kernel:
1. net/core/skbuff.c: linear_to_page()
2. fs/splice.c: pipe_to_file()
The first one was actually introduced in 126.96.36.199/2.6.29 to fix a data corruption bug when splicing from socket to socket. I quote the kernel changelog at the end of this mail. That bug does not seem to affect Samba, so I rolled back the fix and got a 10% improvement.
The second one does a memory copy when:
* Destination page already exists in the address space and there
* are users of it. For that case we have no other option than
* copying the data. Tough luck.
But I am curious who those users are (a mistake?). So I simply commented out the memory copy, expecting data corruption. I wanted to see what I could achieve if all of these things were properly fixed.
In the end, the throughput was more or less the same as with read()/write(). It is not worth going to all that trouble. Maybe the data flow in the current socket splice implementation really is that inefficient.
If nobody has any other ideas, I think I would rather spend my time on getting the patches you sent me to work.
Author: Jarek Poplawski <jarkao2 at gmail.com>
Date: Mon Jan 19 17:03:56 2009 -0800
net: Fix data corruption when splicing from sockets.
[ Upstream commit 8b9d3728977760f6bd1317c4420890f73695354e ]
The trick in socket splicing where we try to convert the skb->data
into a page based reference using virt_to_page() does not work so
well.
The idea is to pass the virt_to_page() reference via the pipe
buffer, and refcount the buffer using a SKB reference.
But if we are splicing from a socket to a socket (via sendpage)
this doesn't work.
The from side processing will grab the page (and SKB) references.
The sendpage() calls will grab page references only, return, and
then the from side processing completes and drops the SKB ref.
The page based reference to skb->data is not enough to keep the
kmalloc() buffer backing it from being reused. Yet, that is
all that the socket send side has at this point.
This leads to data corruption if the skb->data buffer is reused
by SLAB before the send side socket actually gets the TX packet
out to the device.
The fix employed here is to simply allocate a page and copy the
skb->data bytes into that page.
This will hurt performance, but there is no clear way to fix this
properly without a copy at the present time, and it is important
to get rid of the data corruption.
With fixes from Herbert Xu.
Tested-by: Willy Tarreau <w at 1wt.eu>
Foreseen-by: Changli Gao <xiaosuo at gmail.com>
Diagnosed-by: Willy Tarreau <w at 1wt.eu>
Reported-by: Willy Tarreau <w at 1wt.eu>
Fixed-by: Jens Axboe <jens.axboe at oracle.com>
Signed-off-by: Jarek Poplawski <jarkao2 at gmail.com>
Signed-off-by: David S. Miller <davem at davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh at suse.de>
From: Jeremy Allison [mailto:jra at samba.org]
Sent: Wednesday, February 02, 2011 1:58 AM
To: Jacky Lam
Cc: Volker.Lendecke at SerNet.DE; samba-technical at lists.samba.org
Subject: Re: Zero-copy patch
On Tue, Feb 01, 2011 at 09:03:06AM -0500, Jacky Lam wrote:
> I am curious about that as well. But I have double-checked that most of the time spent while writing to the Samba server is in those two splice calls, and the kernel profile shows that __copy_user() uses 25% of CPU time during that period.
> I don't know whether it is a platform-dependent problem... have you tried turning on splice and doing a kernel profile?
Please do and let us know why it isn't working right.
I can help bug the kernel devs to look at it.
FYI for everyone else, I've sent Jacky the kernel
patch to test off-list.