Using sendfile for reducing CPU utilization

Richard Sharpe rsharpe at ns.aus.com
Thu Aug 8 13:09:02 GMT 2002


On Thu, 8 Aug 2002, Ephi Dror wrote:

> >> (And is there a recvfile?? Never heard of that. Maybe BSD only?)
>  
> Hi all,
>  
> Isn't special hardware needed to implement recvfile which will zerocopy
> data from a socket (incoming data) to a file descriptor (File system
> buffer cache)?

Well, not really.

There are several issues here.

It turns out that if you can avoid moving the data between userland and 
kernel, there are big wins, and you don't *need* zero copy.

The catch is that you have to analyse what is actually going on in each 
case before you can handle it properly.

In the case of reads by an SMB client (Read&X), the issue is to avoid 
reading all the data from the file into userland and then passing it back 
down to the kernel to send out on a socket.
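
As a rough sketch of the read path just described (in Python rather than C, 
to keep it short), the reply header is still built and sent from userland, 
but the file data is handed to the kernel with sendfile. The names 
send_reply and demo, the b"HDR:" placeholder header and the socketpair 
standing in for the client connection are all assumptions made for 
illustration; this is not Samba code.

```python
import os
import socket
import tempfile


def send_reply(sock, header, file_fd, offset, count):
    """Send a small header from userland, then let the kernel move the payload.

    'header' is a placeholder for the SMB reply header (hypothetical bytes).
    Returns the number of payload bytes handed to the socket.
    """
    sock.sendall(header)  # the only data copied out of this process
    sent = 0
    while sent < count:
        # os.sendfile(out_fd, in_fd, offset, count): the file data is not
        # read into a userland buffer in this process.
        n = os.sendfile(sock.fileno(), file_fd, offset + sent, count - sent)
        if n == 0:  # end of file reached before 'count' bytes
            break
        sent += n
    return sent


def demo():
    payload = b"x" * 4096
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        a, b = socket.socketpair()  # stands in for the client connection
        with a, b:
            sent = send_reply(a, b"HDR:", f.fileno(), 0, len(payload))
            a.shutdown(socket.SHUT_WR)
            received = b""
            while True:
                chunk = b.recv(65536)
                if not chunk:
                    break
                received += chunk
    return sent, received
```

The loop is there because sendfile may transfer fewer bytes than requested, 
so the caller has to advance the offset and retry.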

In the case of writes (Write&X) you have to modify Samba in some 
serious ways to make it work, even for zero copy.

Added to that is the fact that zero copy really only works well with jumbo 
frames, but you won't always have jumbo frames in force ...

Using sendfile/recvfile (where implemented), on the other hand, will 
always reduce the CPU penalty, in both directions.
  
> Just for education, the way zerocopy is done for sendfile is that fs
> buffers that hold part of your file are linked as what are called
> external mbufs and are given to the stack for sending. If you reuse
> those buffers before the TCP engine has finished sending them and has
> called sf_buf_free to tell you that the mbuf can be freed, you are
> screwed.

That is a FreeBSD-specific case. Linux does something slightly different 
with its sk_buffs, but no doubt it all comes down to fiddling page tables.

> The receive side however is more tricky to implement in the kernel and
> requires file system and stack modifications and of course special
> hardware to enable this great idea in the first place.

Header splitting all the way up to the SMB Write&X header would be 
required to get the most out of zero copy, but I would be interested to 
see the improvement from just doing recvfile in FreeBSD (and will be doing 
that soon), and indeed the patches to Samba will be the same for Linux's 
symmetric sendfile implementation as they would be for a recvfile in FreeBSD.
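
To illustrate why the Samba-side patches would look the same either way, 
here is a minimal sketch of the interface such a call might present. The 
recvfile function below is hypothetical and is only a userland stand-in: it 
still copies through a buffer, which is exactly the copy a kernel 
implementation would remove. The argument order (socket, file descriptor, 
offset, count) and the b"HDR:" placeholder header are assumptions, not an 
existing API.

```python
import os
import socket
import tempfile


def recvfile(sock, file_fd, offset, count, bufsize=65536):
    """Userland stand-in for a hypothetical recvfile(): move 'count' bytes
    from a socket into a file starting at 'offset'.

    A kernel version would skip the trip through 'buf'; the calling code
    would look the same either way.
    """
    buf = bytearray(bufsize)
    view = memoryview(buf)
    done = 0
    while done < count:
        n = sock.recv_into(view, min(bufsize, count - done))
        if n == 0:  # peer closed before sending everything
            break
        written = 0
        while written < n:  # pwrite may write less than asked
            written += os.pwrite(file_fd, view[written:n],
                                 offset + done + written)
        done += n
    return done


def demo():
    a, b = socket.socketpair()  # stands in for the client connection
    with a, b, tempfile.TemporaryFile() as f:
        a.sendall(b"HDR:" + b"y" * 1000)
        # The server still reads and parses the header itself ...
        header = b.recv(4, socket.MSG_WAITALL)
        # ... and only the payload goes through recvfile.
        got = recvfile(b, f.fileno(), 10, 1000)
        data = os.pread(f.fileno(), 1000, 10)
    return header, got, data
```

The caller reads the header first and passes only the payload length and 
file offset down, so swapping this stand-in for a real in-kernel call would 
not change the calling code.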

It may be that the engineering effort required to make zero copy work 
pays off less than simply making the sendfile/recvfile changes.

Recent tests I was involved with, looking at Samba, Linux and GigE, 
suggest that the platform we were testing can come close to saturating 
GigE on writes, so reducing CPU utilization will simply mean that the 
platform can handle more GigE links rather than improving throughput on a 
single link.

> Am I missing something big here?
>  
> If there is a public domain generic implementation of recvfile, I would
> love to see it.
>  
> One more point: in FreeBSD, sendfile's args are fd and s; in Linux,
> however, the args are out_fd and in_fd, and they are just telling you
> that for now, the in_fd must be an open file and the out_fd must be a
> socket. This means that in the future, if Linux supports the other way
> around, they can stay with the same system call, which will allow each
> fd to be either a file or a socket.
>  
> In regards to adding sendfile support to Samba, I think it is pretty
> safe and can be easily done. The only issue to take care of is the
> differences between the OSs in regards to sending the SMB header.
>  
> Thanks,
> Ephi

-- 
Regards
-----
Richard Sharpe, rsharpe at ns.aus.com, rsharpe at samba.org, 
sharpe at ethereal.com



