about talloc_steal()

Sun Aug 13 19:01:03 GMT 2006

This is a quick response to the concerns Volker raised about
talloc_steal(). Volker was concerned that using talloc_steal() leads
to code that is too subtle, leading to complex memory hierarchies and
that those are hard to debug. I understand the concern about the
complexity of memory hierarchies, but I think that it is misdirected,
and that it is not talloc_steal() that is the real problem.

By itself, talloc_steal() doesn't lead to complex memory
hierarchies. The basis of talloc is a n-ary tree of memory
allocations, and using talloc_steal() doesn't change that. A call to
talloc_steal() is really just like a copy of a subtree followed by a
delete of the old subtree, with the resulting hierarchy being of
exactly the same complexity as if it had been created directly.

Where more subtle memory hierarchies emerge is when two additional
factors come into play. The first and most obvious is
talloc_reference(). Using talloc_reference() leads to much more
complex memory hierarchies, and each time I use it I tend to think
"maybe the basic problem is I have the wrong hierarchy to start with?
Maybe talloc_reference() can be avoided?". For a long time
talloc_reference() was quite rare in Samba4, but its use has grown,
and we now have more than 200 calls to it. This is only a tiny
proportion of all of the talloc calls (which are quickly approaching
10000 calls) but it still creates considerably complexity. I think we
could make a bit more effort to reduce the usage of
talloc_reference().

Given the context that led to this discussion, I'd note that ldb has
only 1 use of talloc_reference() in the core code, and that usage is
quite reasonable (it deals with the old nasty posix semantics
regarding close and fcntl locks). It has 5 more uses in ldb modules.

The 2nd thing that can lead to complex talloc hierarchy is if
talloc_steal() is used to "steal a child". When you talloc_steal()
with the target being a descendent of the current context then you
create a memory loop. That is almost always a really bad idea. In
earlier versions of talloc the result was infinite loops or even
crashes on talloc_free() calls. We've fixed those problems now, so a
memory loop does work in a sane fashion, but it still leaves very
complex memory hierarchies that are hard to debug.

Thankfully "steal a child" is rare. In fact, I hope there aren't any
in Samba4 right now, and it might even be worth putting in a runtime
test in the talloc code to detect it, and at minimum emit a warning
(the test could be based on a depth counter in the talloc context -
not trivial to get right, but certainly doable). It would be
interesting to see if we have any, and if we do see if they are really
needed.

>From the description Volker gave of the libcli problem he hit, I
suspect the problem stemmed from the five talloc_reference() calls in
libcli/raw/. Those 5 references lead to some real subtlety in
behaviour of the resulting memory hierarchy.

There were some good reasons for those 5 talloc_reference() calls, as
they essentially try to deal with the rather odd way that the
socket->session->tcon hierarchy in SMB behaves (especially in corner
cases like are stressed in the RAW-CONTEXT test). I would like to find
a different way to handle this though that avoids the references.

Anyway, I think it is a mistake to blame talloc_steal() and as a
consequence to avoid code that uses it. Avoiding code that uses lots
of talloc_reference() calls is much more reasonable, and hopefully we
can reduce the usage of that over time.

Cheers, Tridge