Bugs in libsmbclient (was Re: Are code assertions considered harmful?)

Mon Nov 24 06:23:54 GMT 2003

I found it! So I thought I'd provide an update for anyone who was
interested in the original discussion.

"Richard Sharpe" <rsharpe at richardsharpe.com> wrote on 9 Nov 2003:
> On Sun, 9 Nov 2003, Cameron Paine wrote:

> > I have four failure modes; one we've discussed (you may recall I
> > forwarded you some session captures):
...

> > Silent death. The process simply dies. The frequency varies. By
> > carefully adding additional trace messages I'm generating logs
> > that are starting to point to the problem. I've narrowed it down
> > to clientgen.c:cli_send_smb(). This is a work in progress.

This one is caused by the SIGPIPE that is raised when we send() to a
socket after the network has "browned out".
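The silent death can be reproduced outside the library with a short test. This is a minimal sketch that assumes a local socketpair() is an adequate stand-in for an SMB connection whose peer has gone away. reproduce_silent_death() is a hypothetical test helper, not library code. With SIGPIPE at its default disposition, the child is killed inside send() before it ever sees an error:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test helper: a child process writes to a socket whose peer
 * has gone away. Returns the number of the signal that killed the child,
 * 0 if it exited normally, or -1 if the setup failed. */
int reproduce_silent_death(void)
{
    int sv[2];
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);                      /* the "browned out" peer is gone */

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char byte = 'x';
        sigset_t s;

        /* Make sure SIGPIPE has its default action (terminate) and is
         * not blocked, as in a client that never touched it. */
        signal(SIGPIPE, SIG_DFL);
        sigemptyset(&s);
        sigaddset(&s, SIGPIPE);
        sigprocmask(SIG_UNBLOCK, &s, NULL);

        (void)send(sv[0], &byte, 1, 0);
        _exit(0);                      /* reached only if SIGPIPE did not kill us */
    }

    close(sv[0]);
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent sees WIFSIGNALED() with WTERMSIG() equal to SIGPIPE, and nothing is printed, which matches the "simply dies" symptom.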

This raises an interesting philosophical question: should the library
tinker with SIGPIPE? The (smbd/nmbd) server code deliberately
ignores SIGPIPE. However, a library should probably leave signal
handling unaltered. Wrapping each call to send() and its ilk in
signal-manipulation calls may impose an unacceptable performance
overhead, so we're stuck with leaving the disposition of SIGPIPE
the way the library's client expects it to be.
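The following sketch shows what that wrapping would look like, and what it would cost, around a single send(). It follows the usual block, consume, restore pattern. send_nosigpipe() is a hypothetical helper, not a libsmbclient function. Because it uses sigprocmask(), it assumes a single-threaded process. Every send() picks up several extra system calls:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send() without letting a broken connection kill the
 * process, and without changing the caller's SIGPIPE disposition.
 * Each sig*() call below is an extra system call paid on every send(). */
ssize_t send_nosigpipe(int fd, const void *buf, size_t len, int flags)
{
    sigset_t pipe_set, old_set, pending;
    int already_pending;
    int saved_errno;
    ssize_t n;

    sigemptyset(&pipe_set);
    sigaddset(&pipe_set, SIGPIPE);

    /* Block SIGPIPE so that send() can only leave it pending. */
    sigprocmask(SIG_BLOCK, &pipe_set, &old_set);

    /* Never swallow a SIGPIPE that was pending before our send(). */
    sigpending(&pending);
    already_pending = sigismember(&pending, SIGPIPE);

    n = send(fd, buf, len, flags);
    saved_errno = errno;

    if (n < 0 && saved_errno == EPIPE && !already_pending) {
        /* If our send() left a SIGPIPE pending, consume it; sigwait()
         * returns at once because the signal is already pending. */
        sigpending(&pending);
        if (sigismember(&pending, SIGPIPE)) {
            int sig;
            sigwait(&pipe_set, &sig);
        }
    }

    /* Restore the caller's original signal mask. */
    sigprocmask(SIG_SETMASK, &old_set, NULL);

    errno = saved_errno;               /* caller sees EPIPE instead of dying */
    return n;
}
```

On Linux, the MSG_NOSIGNAL flag to send() suppresses SIGPIPE for a single call without touching the signal mask. It is not available on every platform, however.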

However, we can't push the responsibility too far up the call chain.
The abstraction presented by the library's interface does not, AFAICS,
embrace signals. Therefore, requiring the client code to explicitly set
the signal disposition to a state the library can work with probably
stretches the abstraction a little too far.

Perhaps the middle ground is a flag passed to
smbc_init() that delegates SIGPIPE handling to the library (or
not, as the case may be). If it's delegated, the library should ignore
SIGPIPE so that send() sets errno to EPIPE. If we do that without
fixing one of the other problems I mentioned previously...

> > Transport layer errors don't propagate upwards. When this one
> > bites I cannot continue transactions with the target server
> > unless I shut down the process. There seems to be some persistent,
> > per-server state that is maintained that I won't pretend to
> > understand.

...we create another transport-layer error that will fall into the above
hole.
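Here is a minimal sketch of the opt-in flag proposed above. lib_init(), lib_shutdown() and LIB_FLAG_OWN_SIGPIPE are hypothetical names, and libsmbclient's smbc_init() takes no such flag. The library ignores SIGPIPE only when the client delegates it. It also saves the client's previous disposition so that disposition can be restored later:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical init flag: "library, SIGPIPE is yours to manage". */
#define LIB_FLAG_OWN_SIGPIPE 0x1u

static struct sigaction saved_sigpipe;   /* client's disposition before init */
static int owns_sigpipe;

/* Only when SIGPIPE is delegated does the library ignore it, so that a
 * send() on a dead connection fails with EPIPE instead of killing us. */
int lib_init(unsigned flags)
{
    if (flags & LIB_FLAG_OWN_SIGPIPE) {
        struct sigaction ign;

        memset(&ign, 0, sizeof ign);
        ign.sa_handler = SIG_IGN;
        sigemptyset(&ign.sa_mask);
        if (sigaction(SIGPIPE, &ign, &saved_sigpipe) != 0)
            return -1;
        owns_sigpipe = 1;
    }
    return 0;                           /* not delegated: leave SIGPIPE alone */
}

/* Hand the client back whatever disposition it had before lib_init(). */
void lib_shutdown(void)
{
    if (owns_sigpipe) {
        sigaction(SIGPIPE, &saved_sigpipe, NULL);
        owns_sigpipe = 0;
    }
}
```

Once delegated, every transport write reports EPIPE through errno. That makes the error propagation problem below the real issue to fix.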

I now understand what happens when a transport error occurs. A
mid-layer library service routine invalidates the socket descriptor in
the client state structure. However, because the higher-level code does
not act on errno, the whole server structure remains in the cache. On a
subsequent call, the server structure for the session is retrieved from
the cache, but its client state now holds an invalid socket fd.

Dead end. The library interface provides no direct mechanism for
expunging the server structure from the cache (nor should it), so
the client is "stuck" with an inconsistent control structure that it
can do nothing about, short of terminating the process to flush the
server cache.
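Here is a minimal sketch of the missing step, using heavily simplified, hypothetical structures. server_entry, cache_lookup(), mid_send(), server_send() and the other helpers do not exist in libsmbclient. The sketch assumes SIGPIPE is already ignored, so send() fails with EPIPE. A local socketpair() stands in for the server connection. The higher layer acts on the transport error by evicting the cached entry itself, so the client never needs an expunge call:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_SERVERS 4

/* Hypothetical, much-simplified stand-in for a cached server structure. */
struct server_entry {
    int  in_use;
    char name[64];
    int  fd;        /* set to -1 by the mid layer on a transport error */
    int  peer_fd;   /* other end of the socketpair playing the server */
};

static struct server_entry cache[MAX_SERVERS];

static struct server_entry *cache_lookup(const char *name)
{
    for (int i = 0; i < MAX_SERVERS; i++)
        if (cache[i].in_use && strcmp(cache[i].name, name) == 0)
            return &cache[i];
    return NULL;
}

/* Stand-in for "connect to the server and cache the result". */
static struct server_entry *cache_connect(const char *name)
{
    int sv[2];

    for (int i = 0; i < MAX_SERVERS; i++) {
        if (cache[i].in_use)
            continue;
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
            return NULL;
        cache[i].in_use = 1;
        strncpy(cache[i].name, name, sizeof cache[i].name - 1);
        cache[i].name[sizeof cache[i].name - 1] = '\0';
        cache[i].fd = sv[0];
        cache[i].peer_fd = sv[1];
        return &cache[i];
    }
    errno = ENOMEM;
    return NULL;
}

static void cache_evict(struct server_entry *srv)
{
    if (srv->fd >= 0)
        close(srv->fd);
    if (srv->peer_fd >= 0)
        close(srv->peer_fd);
    memset(srv, 0, sizeof *srv);        /* slot is free again (in_use == 0) */
}

/* Mid layer, as described above: on failure it closes and invalidates
 * the descriptor, leaving errno set for the caller. */
static ssize_t mid_send(struct server_entry *srv, const void *buf, size_t len)
{
    ssize_t n = send(srv->fd, buf, len, 0);

    if (n < 0) {
        int saved = errno;
        close(srv->fd);
        srv->fd = -1;
        errno = saved;
    }
    return n;
}

/* Higher layer: the transport error is acted on by evicting the entry,
 * so the next call misses the cache and reconnects. */
int server_send(const char *name, const void *buf, size_t len)
{
    struct server_entry *srv = cache_lookup(name);

    if (srv == NULL && (srv = cache_connect(name)) == NULL)
        return -1;
    if (mid_send(srv, buf, len) < 0) {
        int saved = errno;
        cache_evict(srv);
        errno = saved;
        return -1;
    }
    return 0;
}
```

After a failure, the next server_send() for the same name reconnects instead of reusing the dead descriptor.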

> Please log these as well.
> 
> Regards
> -----
> Richard Sharpe, rsharpe[at]ns.aus.com, rsharpe[at]samba.org, 

I now feel confident that I can log them in a meaningful way. Expect to
see them in Bugzilla in the next couple of days.

Cameron