Question(s) about smbd in respect to preadv2

Tue Jan 27 09:44:53 MST 2015

Replying to myself with some additional numbers.

On Mon, Jan 26, 2015 at 10:58 AM, Milosz Tanski <milosz at adfin.com> wrote:
> On Sat, Jan 24, 2015 at 6:39 PM, Jeremy Allison <jra at samba.org> wrote:
>> On Sat, Jan 24, 2015 at 03:46:18PM -0500, Milosz Tanski wrote:
>>> I'm at a bit of a crossroads with testing preadv2 using samba and how
>>> samba could benefit from the syscall. I need to better understand the
>>> samba architecture today and where it's going. I spent a few hours
>>> yesterday and today trying to better understand samba; I also looked at
>>> Jeremy's SambaXP 2014 talk. There are a few clarifications I need.
>>>
>>> I did some preliminary testing using preadv2. The first
>>> tests I ran were using the source3 smbd server, and I compared
>>> sync, pthreadpool and pthreadpool + preadv2. Just a simple rand read
>>> test with one client and cached data.
>>>
>>> The results were that sync was fastest (no surprise there).
>>> Pthreadpool + preadv2 was about 8% slower than sync. Plain old
>>> pthreadpool was 26% slower. So not a bad win there. Additionally, it looks
>>> like the vfs_pread_send and vfs_pread_recv have a bit more overhead
>>> over plain old vfs_pread in the code path so it's possible to get that
>>> 8% even closer to the sync case.
>>>
>>> So far, so good... but what struck me is that I don't really
>>> understand why samba uses the pthreadpool if it fork()s for each client
>>> connection. Why bother?
>>
>> Because it allows a client to have multiple outstanding
>> read and write I/O ops to a single smbd daemon.
>>
>> This is important if a client has multiple processes
>> reading and writing multiple large files - the client
>> redirector just pipelines to the number of outstanding
>> SMB2 credits.
>>
>> Using pthreadpool then allows a single smbd to have
>> multiple preads/pwrites outstanding.
>
> I can only imagine that the latency distribution would shift left
> (lower) for fully cached / sequential reads, as I've seen that in the
> app that I originally developed preadv2 for.
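
For anyone following along, the pthreadpool + preadv2 mode discussed above
boils down to roughly the pattern below: try a read that is only allowed to
touch the page cache, and hand the request to a worker thread only when that
would block. This is just a Python sketch, not the smbd code. The names
pread_send and try_cached_pread are made up for illustration,
ThreadPoolExecutor stands in for pthreadpool, and I'm assuming the
RWF_NOWAIT flag that Python's os module exposes on newer Linux kernels
behaves like the page-cache-only read being tested here (the flag may be
spelled differently in the patch set).

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor

# Assumption: RWF_NOWAIT is only present on Linux with a recent kernel and
# Python 3.7+. When it is missing, the fast path is simply disabled.
RWF_NOWAIT = getattr(os, "RWF_NOWAIT", None)

# Stand-in for pthreadpool: a small pool of worker threads.
pool = ThreadPoolExecutor(max_workers=4)


def try_cached_pread(fd, length, offset):
    """Return the data if it can be read without blocking, else None."""
    if RWF_NOWAIT is None or not hasattr(os, "preadv"):
        return None
    buf = bytearray(length)
    try:
        n = os.preadv(fd, [buf], offset, RWF_NOWAIT)
    except OSError:
        # Data not in the page cache, or the flag is unsupported by this
        # kernel or filesystem. Either way, let the slow path handle it.
        return None
    if n < length:
        # Short read: only partly cached, or we hit EOF. Keep the sketch
        # simple and let the blocking path sort it out.
        return None
    return bytes(buf)


def pread_send(fd, length, offset):
    """Start a read and return a Future holding the bytes (hypothetical name)."""
    data = try_cached_pread(fd, length, offset)
    if data is not None:
        # Fast path: already satisfied, no thread handoff needed.
        fut = Future()
        fut.set_result(data)
        return fut
    # Slow path: a blocking pread on a worker thread.
    return pool.submit(os.pread, fd, length, offset)
```

Because the slow path goes through the pool, one process can still keep
several reads outstanding at once, for example
futs = [pread_send(fd, 65536, i * 65536) for i in range(8)] followed by
f.result() on each. The fast path just skips the thread handoff when the
data is already cached.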

I have some additional numbers for the single sync client test. This
test case is a 64k sequential read of a large file (8x the size of the
page cache) on a spinning disk. I expected the bandwidth differences
between the modes to narrow in this case... but it doesn't look
like the difference got noticeably smaller.

Here's a link to the full numbers: http://i.imgur.com/Lnxb4Lk.png

>
> I haven't yet figured out how to make an async client submit multiple
> SMB2 async requests using the cifs FIO engine. After spending more time
> reading the stuff in libcli, it looks like the code is there; it's just
> not exported in an external library.
>

-- 
Milosz Tanski
CTO
16 East 34th Street, 15th floor
New York, NY 10016

p: 646-253-9055
e: milosz at adfin.com