[QUICK] talloc bugs

Sam Liddicott sam at liddicott.com
Wed Jul 1 08:09:51 GMT 2009

[Both messages replied in one, apologies to the thread-overview-builder]
I suggest tired reader read the response to the bottom message first.

* tridge wrote, On 01/07/09 06:27:
> Hi Sam,
>  > Maybe I made what should be an obvious mistake, but I can't find it.
>  > I'm still looking to see what is going on.
> I've spent some more time on this today, and I've commited some
> changes to talloc to fix the problems.
> The main change is here:
>   http://git.samba.org/?p=samba.git;a=commitdiff;h=5fe1d8dc1275e43d96da1297f5fb0d0088a1c3ab
> As the commit message explains, this is an expanded version of your
> simple abort() patch. It changes talloc_free() and talloc_steal() to
> print error messages to stderr like this:
>     ERROR: talloc_free with references at some/foo.c:123
> 	   reference at other/bar.c:201
> 	   reference at other/foobar.c:641
> so it is simple to find all the places in the code that are broken. A
> full run of make test then showed there were just a few places that
> needed fixing (I think about 10). I've fixed those places in a series
> of commits preceding the main talloc change.
I think this debugging is really great and will make talloc debugging
much simpler.

However I don't think that Samba code was "broken", it made fair and
reasonable use of a fair talloc API, as other libtalloc using programs
will do.

I don't think it is wrong to want to avoid use of talloc_free in a
program, but making it a run-time error to call talloc_free if a
yet-to-be-written loadable module might decide by race-condition to keep
a reference a bit longer, and thus fail talloc_free, is a way of making
talloc brittle and fail in hard to foresee or and hard to repeat

> The commit also adds a new function talloc_reparent() which is like
> talloc_steal(), but it takes both the old parent and the new parent,
> thus removing the ambiguity from talloc_steal() when used on a pointer
> with more than one parent. 
I guess parent also means reference there...
> If you pass in a reference as the old
> reference then the reference is moved to the new parent.
but that bit doesn't make sense, I guess you meant that if you pass in a
reference as the old reference then the new reference also becomes
parent (and the old parent becomes a reference?)
> This now means that the whole talloc interface now hides the
> difference between a parent and a reference, so we can now really
> think of it supporting multi-parent pointers. The only places where an
> ambiguity can arise (talloc_free and talloc_steal when there are
> multiple parents) will now fail with a error.
I refer you to the re-framed talloc, below; there is no need to fail
with an error. There is only ambiguity if we promote reference to parent
internally. If we cease this practice then the ambiguity vanishes and we
then don't need to add these run-time constraints to talloc, but I treat
this fully in response to the second message, below.
> The expection is the function talloc_parent() which returns the true
> "parent". I used that in a couple of places in fixing the breakages
> caused by this patch, with the worst fixes being like this:
>   p = talloc_reparent(talloc_parent(ptr), new_ctx, ptr);
> If you search for the places where I used talloc_parent() in this way,
> you'll find that the old code is basically pretty badly convoluted in
> how it handles references. Putting in the above type of code was the
> simplest way to fix it, but is certainly not the cleanest approach.
Tridge; it is not the simplest way to fix it and as you say not a clean

The simplest way to fix it is not to have to fix it (samba) at all. We
only have to fix it because of this darned who-knows-why insistence on
promoting reference to parent while simultaneously admitting through the
API that references and parents are not equal and therefore actually
admitting that such promotion is unwise but doing it anyway.

But I make this contradiction (and whole issue) go away by magic in a
new re-framing treatment below...
> The commit also notes that using stderr like this in a library is very
> dogdy, as fd 2 could be a file or a socket in badly behaved
> daemons. I'll add a register function that allows a daemon to register
> a debug function.
I think the debugging code is great, and necessary if references and
parents are equal.

* tridge wrote, On 01/07/09 06:30:
> Hi Sam,
> I should also say that we can still revisit
> talloc_reference/talloc_free and look at other models. I think the
> commits I just made solve the problem, and they certainly remove the
> ambiguity so the main traps with using
> talloc_reference/talloc_free/talloc_steal are now gone.
> We could consider a more intrusive API change if someone can propose
> one that is a significant improvement over the minimal change I just
> made.
I prefer a less intrusive API change, one that doesn't change the API at
all but just re-frames it. I don't seem to tire of mentioning it. I
think the addition of talloc_reparent and the debugging of reference
file/line-numbers is very necessary for full equality.

Let us re-frame talloc thus:

Talloc is is a multi-parent referencing model where all parents are
indeed precisely equal, and are called references or parents, meaning
the same thing. There is no special treatment for any parent. There
isn't even a parent field in the talloc_chunk.

There is no talloc_steal and talloc_free; only talloc_reparent.

That is the ideal model in which we frame talloc. Now consider this:

It is safe, and rarely inconvenient for an author to have the original
reference to hand when he wants to explicitly free an allocation BECAUSE
he rarely explicitly frees an allocation (huzzah for talloc!)

Many of the inconvenient cases will be when older software (malloc/free)
is ported to talloc, or software that was written for the
pre-referencing talloc.

It will usually not be hard to find an owning reference when allocating
(and NULL will do in a pinch), but it will be hard to convey this
pointer to the point at which the allocation is freed. It will be
convenient in such cases to permit the allocating reference to be
stashed somewhere inside the talloc_chunk (perhaps in a NEW field called
"parent"), so that it gets passed around automatically with the pointer;
- and this gives us back talloc_free - and really only for such simple
software that wasn't designed for talloc from the ground up and can't
have the allocating reference at hand when it wants to free.

Of course the allocating "parent" reference MUST still be stored in
tc->refs (as all references are truly equal). This special parent field
in the talloc chunk is merely a convenience for older software that is
used to calling free(ptr). It merely references ONE implicit reference
so that we can have a malloc-compatible free function.

It is clear that if we do store this one allocating reference in a
"parent" field, then we don't actually NEED to also store it in the
tc->refs list as well. We don't NEED to. Of course if we were writing
from scratch, we would. But we don't NEED to - and suddenly this is the
code-base we have right now if only we can stop promoting a reference to
the parent field in talloc_free.

And lets be honest, if there is really no difference between reference
and parent, why do we NEED to do such a promotion? If we don't do such a
promotion then we have the memory model described above which nicely
encapsulates the equality of references model as well as backwards
compatibility for malloc (free/steal) and doesn't change the API one
tiny bit.

It just presents a new way of looking at it which is magically
compatible with existing uses and doesn't require any hacky or unclean

If you are willing to accept a more intrusive change, are you willing to
accept a less intrusive change, one that doesn't require changes to
existing talloc programs and doesn't put additional constraints on the
use of the API, but merely re-frames talloc from where we might have
started if we were starting again?


More information about the samba-technical mailing list