[PATCH][SMB3] display stats counters for number of slow commands

Steve French smfrench at gmail.com
Mon Aug 6 21:56:21 UTC 2018


The main reason I separated these counters from the per-tree
connection ones was that it greatly simplified the coding and made it
much less likely that the change would introduce a bug or regression.
It also takes up less memory, which is a minor but real benefit. In
the place where these requests are pulled off the wire, and where
timings were already checked, we don't yet know which tree connection
(if any) a request is associated with (at least 4 of the command
types don't have a tree connection), but we do know which socket and
thus which host name.

In a perfect world we would also track read, write and close/flush
timings (maybe even open) and produce nice statistics as Volker and
others have suggested, but I wanted to model the read/write timings
more closely after iostat or better examples. This set of statistics
I wanted as a very low-risk set of counters that would help diagnose
various types of problems that I have been seeing with reconnect,
where we have to understand why the reconnect occurred (and whether
the server had been showing other signs of problems, such as other
slow requests before the hang which triggered the timeout and
reconnect) and also debug the subsequent problems.

There are so many reconnect scenarios that getting more objective data
on slow responses in a particular time period is one useful piece of
data that will help. I expect that the reconnect problems I have been
discussing with Pavel and others will lead to four or five fixes, but
in the short term I needed something objective to more easily
determine when the server is having problems.
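
To make the idea concrete, here is a rough userspace sketch of
per-server (per-socket) slow-response counters indexed by SMB2/SMB3
command code. It is not the patch itself: the names server_stats,
account_response, NUM_COMMANDS and SLOW_THRESHOLD_MS are made up for
illustration, and timestamps are plain milliseconds rather than
whatever clock the kernel code uses.

```c
#include <stdint.h>
#include <stdio.h>

/* SMB2/SMB3 command codes run from 0x0000 (NEGOTIATE) to 0x0012
 * (OPLOCK_BREAK), so 19 slots cover all of them. */
#define NUM_COMMANDS      19
/* "Slow" means the server took longer than 1 second to respond. */
#define SLOW_THRESHOLD_MS 1000

/* Hypothetical per-server state: one counter per command code, kept
 * per socket because the tree connection is not yet known when the
 * response is pulled off the wire. */
struct server_stats {
    uint64_t slow_rsp[NUM_COMMANDS];
};

/* Account for one response, given send and receive times in ms. */
static void account_response(struct server_stats *st, unsigned int cmd,
                             uint64_t sent_ms, uint64_t recv_ms)
{
    if (cmd >= NUM_COMMANDS || recv_ms < sent_ms)
        return; /* unknown command or bogus timestamps: ignore */
    if (recv_ms - sent_ms > SLOW_THRESHOLD_MS)
        st->slow_rsp[cmd]++;
}

/* Print only the commands that have had at least one slow response. */
static void print_slow(const struct server_stats *st, FILE *out)
{
    for (unsigned int cmd = 0; cmd < NUM_COMMANDS; cmd++)
        if (st->slow_rsp[cmd])
            fprintf(out, "%llu slow responses for cmd %u\n",
                    (unsigned long long)st->slow_rsp[cmd], cmd);
}
```

A caller would invoke account_response() once per response at the
point where the timing is already being checked, and print_slow()
when the stats are displayed.
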
On Mon, Aug 6, 2018 at 6:47 AM David Disseldorp <ddiss at samba.org> wrote:
>
> Hi Steve,
>
> On Sat, 4 Aug 2018 05:32:25 -0500, Steve French wrote:
>
> > When CONFIG_CIFS_STATS2 is enabled keep counters for slow
> > commands (i.e. the server took longer than 1 second to respond)
> > by SMB2/SMB3 command code.  This can help in diagnosing
> > whether performance problems are on the server (instead of
> > the client) and which commands are causing the problem.
> >
> > Sample output (the new lines contain words "slow responses ...")
>
> Wouldn't putting these alongside the existing per-session op counts be
> more suitable, e.g.
> 1) \\192.168.1.1\rapido-share
> ...
> Creates: 1 total 0 failed 1 slow
> Closes: 0 total 0 failed 0 slow
> Flushes: 1 total 0 failed 1 slow
> Reads: 0 total 0 failed 0 slow
>
> It'd be helpful if this file included some sort of API version, so that
> parsers like PCP[1] knew what format to expect. Alternatively, moving
> to a configfs style format with one metric per file (similar to LIO)
> might be more useful and extensible, e.g.
> /sys/kernel/config/cifs/stats/sessions
> /sys/kernel/config/cifs/stats/shares
> ...
> /sys/kernel/config/cifs/stats/<session>/smbs
> /sys/kernel/config/cifs/stats/<session>/creates/total
> /sys/kernel/config/cifs/stats/<session>/creates/failed
> /sys/kernel/config/cifs/stats/<session>/creates/slow
> /sys/kernel/config/cifs/stats/<session>/reads/total
> /sys/kernel/config/cifs/stats/<session>/reads/failed
> /sys/kernel/config/cifs/stats/<session>/reads/slow
> ...
>
> Cheers, David
>
> 1. PCP cifs.ko monitoring agent
> https://github.com/performancecopilot/pcp/blob/master/src/pmdas/cifs/pmda.c
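
For reference, a parser for the per-share line layout David proposes
above ("Creates: 1 total 0 failed 1 slow") could be as small as the
sketch below. The layout is only the suggestion from this thread, not
an existing stats format, and struct op_counts and parse_op_line are
hypothetical names.

```c
#include <stdio.h>

/* Hypothetical holder for one "<Op>: N total N failed N slow" line. */
struct op_counts {
    char name[32];
    unsigned long long total;
    unsigned long long failed;
    unsigned long long slow;
};

/* Returns 1 if the line matches the proposed layout, 0 otherwise. */
static int parse_op_line(const char *line, struct op_counts *out)
{
    /* %31[^:] reads the operation name up to the colon; the leading
     * space in the format skips any indentation. */
    int n = sscanf(line, " %31[^:]: %llu total %llu failed %llu slow",
                   out->name, &out->total, &out->failed, &out->slow);
    return n == 4;
}
```

A line that lacks the "slow" field (the older two-counter layout)
fails to parse here, which is one argument for the version marker
suggested above.
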



-- 
Thanks,

Steve


