Zero-copy patch

Jacky Lam jlam at
Mon Feb 7 23:53:14 MST 2011


        After tracing the kernel code, I have come to a disappointing conclusion.

        The heavy memory copies come from two places in the kernel:

        1. net/core/skbuff.c: linear_to_page()
        2. fs/splice.c: pipe_to_file()

        The first one was actually introduced to fix a data corruption bug when splicing from socket to socket. I quote the kernel change log at the end of this mail. That bug does not seem to affect Samba, so I rolled back the fix and got a 10% improvement.

        The second one does a memory copy when
                *       Destination page already exists in the address space and there
                *       are users of it. For that case we have no other option that
                *       copying the data. Tough luck.
        But I am curious who those users are (a mistake?). So I simply commented out the memory copy, expecting data corruption, to see what I could achieve if all of these issues were properly fixed.

        In the end, I got more or less the same throughput as with read()/write(). It is not worth going to all that trouble. Maybe the data flow in the current socket splice implementation really is that inefficient.
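
For readers who have not seen the two paths side by side, here is a minimal user-space sketch of what is being compared: moving bytes from a descriptor to a file through a pipe with two splice() calls, versus a plain read()/write() loop. This is not Samba's code; the function names splice_to_file() and copy_to_file() are made up for illustration, and the sketch assumes Linux with splice(2) available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for splice() in <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from in_fd (e.g. a socket) to
 * out_fd (e.g. a file) through a pipe. Returns bytes moved, or -1 on error. */
ssize_t splice_to_file(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while ((size_t)total < len) {
        /* First splice: source -> pipe. SPLICE_F_MOVE is only a hint. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL,
                           len - (size_t)total, SPLICE_F_MOVE);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0) {                   /* 0 = end of input, <0 = error */
            if (n < 0)
                total = -1;
            break;
        }

        /* Second splice: pipe -> destination, until the pipe is drained. */
        ssize_t left = n;
        while (left > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL,
                               (size_t)left, SPLICE_F_MOVE);
            if (m < 0 && errno == EINTR)
                continue;
            if (m <= 0) {
                total = -1;
                goto out;
            }
            left -= m;
        }
        total += n;
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}

/* Baseline for comparison: the same transfer with an explicit user buffer. */
ssize_t copy_to_file(int in_fd, int out_fd, size_t len)
{
    char buf[65536];
    ssize_t total = 0;

    while ((size_t)total < len) {
        size_t want = len - (size_t)total;
        if (want > sizeof buf)
            want = sizeof buf;
        ssize_t n = read(in_fd, buf, want);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)                     /* end of input */
            break;
        for (ssize_t off = 0; off < n; ) {
            ssize_t m = write(out_fd, buf + off, (size_t)(n - off));
            if (m < 0 && errno == EINTR)
                continue;
            if (m < 0)
                return -1;
            off += m;
        }
        total += n;
    }
    return total;
}
```

Both functions have the same signature, so a benchmark can swap one for the other. Whether the splice path actually avoids copies depends on the kernel, as described above.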

        Unless anyone has another idea, I think I would rather spend my time getting the patches you sent me to work.



commit 813fa24255a5de93ef3fc4c2efff3ee31a2545b6
Author: Jarek Poplawski <jarkao2 at>
Date:   Mon Jan 19 17:03:56 2009 -0800

    net: Fix data corruption when splicing from sockets.

    [ Upstream commit 8b9d3728977760f6bd1317c4420890f73695354e ]

    The trick in socket splicing where we try to convert the skb->data
    into a page based reference using virt_to_page() does not work so

    The idea is to pass the virt_to_page() reference via the pipe
    buffer, and refcount the buffer using a SKB reference.

    But if we are splicing from a socket to a socket (via sendpage)
    this doesn't work.

    The from side processing will grab the page (and SKB) references.
    The sendpage() calls will grab page references only, return, and
    then the from side processing completes and drops the SKB ref.

    The page based reference to skb->data is not enough to keep the
    kmalloc() buffer backing it from being reused.  Yet, that is
    all that the socket send side has at this point.

    This leads to data corruption if the skb->data buffer is reused
    by SLAB before the send side socket actually gets the TX packet
    out to the device.

    The fix employed here is to simply allocate a page and copy the
    skb->data bytes into that page.

    This will hurt performance, but there is no clear way to fix this
    properly without a copy at the present time, and it is important
    to get rid of the data corruption.

    With fixes from Herbert Xu.

    Tested-by: Willy Tarreau <w at>
    Foreseen-by: Changli Gao <xiaosuo at>
    Diagnosed-by: Willy Tarreau <w at>
    Reported-by: Willy Tarreau <w at>
    Fixed-by: Jens Axboe <jens.axboe at>
    Signed-off-by: Jarek Poplawski <jarkao2 at>
    Signed-off-by: David S. Miller <davem at>
    Signed-off-by: Greg Kroah-Hartman <gregkh at>
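
As a rough user-space illustration of the commit message above (a toy model, not kernel code), the sketch below contrasts a reference that only points at the original buffer with one that owns a private copy. All names (toy_ref, toy_ref_borrow, toy_ref_copy, toy_ref_release) are hypothetical. Reuse of the buffer is simulated by overwriting it, so the example stays well defined.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pipe buffer entry; not a kernel structure. */
struct toy_ref {
    const char *data;   /* bytes the send side will transmit later */
    size_t len;
    char *owned;        /* non-NULL when the bytes live in private storage */
};

/* Zero-copy variant: only point at the caller's buffer. Nothing here keeps
 * the buffer's contents stable once its owner hands it back for reuse. */
struct toy_ref toy_ref_borrow(const char *buf, size_t len)
{
    struct toy_ref r = { buf, len, NULL };
    return r;
}

/* Copying variant, in the spirit of the fix: allocate private storage and
 * copy the bytes into it, so later reuse of buf cannot change them. */
struct toy_ref toy_ref_copy(const char *buf, size_t len)
{
    struct toy_ref r = { NULL, 0, NULL };

    r.owned = malloc(len);
    if (r.owned != NULL) {
        memcpy(r.owned, buf, len);
        r.data = r.owned;
        r.len = len;
    }
    return r;
}

/* Drop a reference, freeing private storage if there is any. */
void toy_ref_release(struct toy_ref *r)
{
    free(r->owned);
    r->owned = NULL;
    r->data = NULL;
    r->len = 0;
}
```

If the original buffer is overwritten after both references are taken, the borrowed reference sees the new bytes while the copied one still holds the old ones. That is the trade-off the commit describes: one extra copy in exchange for stable data.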

-----Original Message-----
From: Jeremy Allison [mailto:jra at]
Sent: Wednesday, February 02, 2011 1:58 AM
To: Jacky Lam
Cc: Volker.Lendecke at SerNet.DE; samba-technical at
Subject: Re: Zero-copy patch

On Tue, Feb 01, 2011 at 09:03:06AM -0500, Jacky Lam wrote:
> I am curious about that as well. But I have double-checked that most of the time spent while writing to the Samba server is in those two splice calls, and the kernel profile shows that __copy_user() uses 25% of CPU time during that period.
> I don't know whether it is a platform-dependent problem. Have you tried turning on splice and doing a kernel profile?

Please do and let us know why it isn't working right.
I can help bug the kernel devs to look at it.

FYI for everyone else, I've sent Jacky the kernel
patch to test off-list.

