TALLOC: Discussion of destructor processing

Tue May 1 06:13:43 MDT 2012

On Tue, 2012-05-01 at 08:00 -0400, simo wrote:
> On Tue, 2012-05-01 at 07:31 -0400, Stephen Gallagher wrote: 
...
> > Perhaps I was putting the cart before the horse with my initial
> > comments. I suppose my first question really should be: "Why do we allow
> > destructors to fail?". In the current code, there is really no sane way
> > to deal with a failed destructor at all. It might make sense then to
> > just make the decision to turn destructors into void functions and
> > ALWAYS free memory after they've completed.
> > 
> > The real problem with destructors on nested contexts is that there's no
> > way to report back whether any or all of the children failed their
> > destructors. Depending on only the top-level context failing to return
> > -1 to the caller is misleading at best.
> > 
> > I think it would be a better design pattern to remove the return value
> > of destructors entirely and always assume it completed successfully.
> > This will encourage people first of all to build their destructors such
> > that it should be impossible to fail them.
> > 
> > Can someone demonstrate an example for me of a case where handling
> > destructor failure is actually relied upon in samba?
> 
> My memory is quite bad, but I seem to remember some places where we may
> rely on the destructor to return a failure in order to prevent freeing
> memory.
> This may have been removed after the introduction of talloc_reference()
> though.
> 
> In general having a destructor fail makes little sense indeed, as there
> is no way to recover the situation.
> 

Yeah, that's my thinking exactly. We either need to build a mechanism
for handling the failures or we need to disallow failures altogether.
I'm in favor of the latter.

If we wanted to handle the failures somehow, one possible approach might
be to build a callback mechanism for talloc when a destructor failed. It
would be passed the context whose destructor failed (which could
theoretically be queried for its type). It would take a LOT of coding to
get the handling right, though. I'm not sure there's sufficient value to
justify it.

> > > 
> > > > 2) It strikes me as strange that we're calling the destructor before
> > > > we're we're handling the destructors of the children. It's contrary to
> > > > (at least my) expectations about the order that destructors should
> > > > execute. Freeing of memory and execution of destructors should be
> > > > happening from the "bottom-up" of the hierarchy.
> > > 
> > > If we did it in that order, then a top level destructor could not rely
> > > on the contents of the structures that it may be accessing.  This is
> > > because typically the talloc tree follows the pointer tree, and many of
> > > the elements in a structure with a destructor are allocated (and some of
> > > those allocations might be with malloc/etc and a destructor).
> > > 
> > 
> > This is an interesting point. I wonder whether we're encouraging a bad
> > practice here, though. In order to minimize the likelihood of failure, a
> > destructor really should be truly minimalistic. It should not be
> > performing complex operation on child contexts, and I'd argue that
> > proper destructor writing should be reliant ONLY on the current context
> > being valid and any of its necessary components being outside the
> > current memory context being freed.
> 
> It would be nice if we could be so rigid, I can think of many cases
> where the destructor relies on a structure to be accessible, and
> therefore children to be available.
> 
> However in a 2 pass strategy where all destructor are called first and
> only then all children are freed this would be guaranteed.
> Unfortunately this would also always guarantee that talloc_free() will
> be slower all the time. However I guess we could optimize it if we could
> expose a flag in the parents by walking up the parents that there is
> actually some child with destructors set. This would allow us to skip
> the double pass when no child has destructors and confine it to branches
> where destructors are set. It would shift the penalty of walking the
> branch to the time we set the destructor, and would not be too
> penalizing because we are walking up parent by parent, which is arguably
> a lot less steps than inspecting all the children.
> 

I rather like this idea. I'm trying to think of places where this could
break. A destructor can do basically anything, but it's probably
reasonable to assert that the programmer using them just be smart enough
not to rewrite the variables he's freeing in the destructors.

> > > > My vision of how this should behave is as follows: We should iterate
> > > > through the hierarchy tree, starting from the leaves. Work its way back
> > > > up, calling destructors where they exist until either we have called all
> > > > of them or any one has failed. If any failed, return -1 to the top-level
> > > > free. If all succeeded, THEN iterate the tree a second time to actually
> > > > perform the deallocation.
> > > 
> > > Given that talloc_free() is rarely checked for errors, wouldn't this
> > > leave even more memory potentially leaked, as any destructor anywhere
> > > could abort the free of an entire tree?
> > > 
> > 
> > Yes, it could. But now we are at least left with a complete memory
> > hierarchy that we can investigate, instead of a partially-deallocated
> > mess. And obviously if we changed to this approach, we'd need to be more
> > consistent about checking talloc_free() for errors. This would
> > necessitate a change in design patterns.
> 
> No, this would be too big of a shift. All developers are used to
> consider a free() an operation that either always work or nothing can be
> done if it fails. And indeed it is that.
> Even if we were to check the return of talloc_free() what would we do if
> it fails ? At most we would put out a warning and would just live with
> the leak, because calling it again would be a bit pointless, it would
> almost certainly fail again.
> 
> So the only possibility is to make it clear destructors should never
> fail and perhaps add a logging facility to talloc so that when a
> destructor do fail we log about it (perhaps with a hierarchy dump in the
> logs) and just keep going.
> 

We actually do have the logging facility built into the leak check
functions and the abort function, so we could probably use that. I
strongly support eliminating destructor failures.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20120501/fccc95fd/attachment.pgp>