talloc quiz, and: dangling talloc references and inconsistencies in the talloc model

Fri Jan 16 14:35:33 GMT 2009

* tridge wrote, On 16/01/09 00:19:
> Hi Sam,

thanks for taking the time on this... I'm confident it is not wasted.

> First off, I should say that talloc_reference() is definately the
> trickiest part of talloc, and that I often try to avoid it.

That suggests that you are aware that the model hasn't been entirely
effective, so I'll proceed to try and convince you, starting with a some
general values statements.

A good reference counted memory manager should make safe memory
management simple and not subject to a vague set of heuristics.

I fear that currently talloc unnecessarily increases the burden of
manual memory management by requiring users to manually estimate the
sorts of interactions that could occur between the conflicting talloc APIs.

Such manual memory management considerations are quite impractical in
applications where different modules share data structures for differing
lengths of time based on the variable order and timing of network
messages, and particularly where such messages also result in the
creation and destruction of owners of the said shared structures.

talloc works well as a tree allocator, but as a reference counting
allocator it adds the very problems that traditional reference counting
managers remove - i.e. working out who owns what, where, and when, and
how to tell when and how to safely free.

Ironically, this is the type of problem which (on a simpler scale) a
non-reference counting talloc removes - i.e. knowing when and how to
free dependant memory.

Should talloc make reference counting memory more complicated than it is
traditionally?

Some developers use their own reference counting instead of talloc, and
you also try to avoid it...

I suggest that the criteria for a safe usable talloc API will be that
reference managing talloc is not more complicated than other reference
counting memory managers.

Thank-you for your notes and reasoning. I did wonder if as you suggest,
talloc_free and talloc_steal could be normalized to present a consistent
model, but as discussed below I think it will not be  possible.

> I've also
> been quite tempted to change talloc to fail explicit talloc_free()
> calls when a pointer has a reference and to require that
> talloc_unlink() be used in that case.

This seems a reasonable safety net; and, like my proposal, it risks
breaking some codepaths that currently work "by good fortune".

By this I mean to recognize that perhaps some samba code may depend on
talloc_free to remove references, and perhaps in preference to the
original owner (which is unsafe as it leads to dangling references if a
reference is made without the owners expectation).

> An explicit talloc_free() just
> doesn't give enough information to the talloc library to always do the
> 'right thing'.

If we want to have the talloc model of "all parents/references are the
equal" then I think that talloc_free is an inappropriate throwback to
the pre-reference talloc, which as you say doesn't provide enough
information.

..and yet for simple use cases talloc_free does seem to be the right
thing, avoiding unnecessary complexity

I tried to consider if there was any way (without my full blown
solution) talloc_free could be retained for such simple cases without
also having the risks of hard-to-trace errors and the burden of manual
memory management that this would imply.

Perhaps if allocations that were intending to be freed with talloc_free
were marked in some way when (or right after) they were allocated so that
1. talloc_reference would fail entirely and avoid the danger
2. talloc_free would fail where there was no such mark (thus enforcing
the marking where required).

such usage would be cumbersome, but it would at least be definitively
safe - whereas safety is currently achieved by "try not to..."

> That said, I don't agree with the proposed solution (though perhaps
> you can convince me).

I've spoken at the top of the burden of manual memory management that
reference managing talloc brings, and of how developers try to avoid
this burden by not using the reference managing aspects of talloc.

Below I will explain the different types of problems that arise from the
current talloc model which require the sort of manual management that
makes developers talloc_reference management, and for some of which you
already been seeking a solution.

> I deliberately chose the current semantics for a
> good reason.
>
> [deleted]
>
> What talloc is really trying to simulate with references is the
> ability for a pointer to have two parents. The only difference between
> the two parents of p2 is that one was established earlier than the
> other. The fact that internally talloc considers one to be an 'owner'
> and the other a 'reference' is supposed to be hidden as far as
> possible from the programmer. Unfortunately it isn't completely
> hidden.

I think the reason is good, but that the implementation hasn't brought
these advantages. It's not just a case of not being completely hidden,
it's completely recognized via talloc_steal and talloc_free which
operate on the owner.

It is not only that the owner was created earlier that matters (this can
sometimes be managed as the developers can often easily retain a concept
of "original owner" in the code), but the fact that the ordering of
subsequent references can be run-time dependant and it is therefore
difficult to predict the second owner.

It's this that causes developers to have to guess whether or not it will
 be safe to use talloc_reference or instead their own reference counting.

If we want to hide the idea that owners and references are treated
differently, then we need to remove or change the behaviour of
talloc_steal and talloc_free so that they aren't treated differently.
(metzes proposed patch should solve the talloc_free dangling reference
problem but not the talloc_steal dangling reference problem for which I
attach a test case)

Talloc steal and talloc_free have dodgy interaction, because the
behaviour of talloc_free (whatever that may end up being) when called
before talloc_steal, affects which reference would be stolen from,
making it both necessary and difficult to manually manage memory.

The difference between a reference and an owner are:
1. the owner was used at allocation, or at talloc_steal
2. when the owner expects to be stolen from, it expects to never call
talloc_free
3. a reference is made using a different api, talloc_reference
4. a reference does not simply expect to be stolen from

I realise as I write this that (2) is another manual memory management
rule. It also requires knowing for a certainty whether or not the
allocation will be subject to talloc_steal, and to know this by the time
talloc_free might have been called.

Surprisingly, as a consequence of talloc trying (but failing) to hide
the difference between owners and references, the programmer must be
doubly aware of it in order to free the correct way. When a reference
becomes owner (because the owner was indirectly freed) and is then
subject to a talloc_steal, not only does the reference no longer need to
be un-referenced [and should NOT if it had more than once reference] but
if the stealer is freed, then so is what was referenced and a dangling
reference results - with the associated dangers of memory trampling and
even security problems like privilege escalation. The following
uncompilable code shows (compilable test suite case attached)

void *owner, *p1, *ref, *r1;

owner=talloc_named_const(NULL, 0, "owner");
consumer=talloc_named_const(NULL, 0, "consumer");
r1=talloc_named_const(NULL, 0, "r1");

p1=talloc_zero(owner, ...);
ref=talloc_reference(r1, p2);

talloc_free(owner);
/* r1 is now the owner */
/* A */
talloc_steal(consumer, p1);
talloc_free(consumer);

Note that p1 is now freed, despite the reference intended to keep it
valid. In a simple example the answer is "you should have been able to
tell" - but that requires the programmer to be fully aware that a parent
is different to a reference, and to track the point at which a reference
might become a parent.

Not only does talloc fail to hide the difference, it requires the
programmer to explicitly check.

In real life, these example talloc calls might spread across different
modules (but occur in the same order), and so at point /* A */ the code
would need to "know" whether or not the owner had been free, or check
whether or not r1 had become the new owner.

Even the suggest constraints on explicit talloc_free don't help here,
because p1 was implicitly freed.

The problem is that talloc_steal as an API exposes the significance of
ownership against reference.

So it appears that we have 2 choices; either
1. remove talloc_free and talloc_steal,possibly replacing with
talloc_unlink and talloc_reference
2. talloc freely admits that owners and references are recognized as
different.

I propose (2) which is less intrusive than (1). I discuss this further down.

The 3 cases you give below are considered from the perspective of the
programmers intent and indicate when the programmer should expect to
have to have manual memory management problems.

The idea behind general reference counting is so that programmers don't
have to consider so much which usage scenarios their code fits (or could
fit in conjunction with interactions from other modules), and so this
represents a failing in the talloc model in that it doesn't protect
programmers from having to take such considerations, and from the
inevitable mistakes that follow.

I prefer to consider the three scenarios from talloc's point of view.
talloc cannot even tell which scenario applies in order to tell whether
or not it has enough information - and this is the same problem the user
often has. If the rules are too hard to build into talloc, we should not
ask the user to "recognize" them and either pick some hueristics to
mostly avoid the problem or do their own reference counting memory
management.

Otherwise tallocs reference counting only serves the cases that are
simple enough to not really need it.

> So from that point of view the memory tree looks like this:
>
>                            root
>                             / \
>                            /   \
>                           /     \
>                          /       \
>                         r1       p1
>                           \      /
>                            \    /
>                             \  /
>                           (p2,ref)
>
> [deleted]
>   3) talloc_free(p2). This is the tricky one. There is no way to
>   distinguish this from talloc_free(ref), so we have to choose one of
>   the two above approaches. The approach I chose in talloc was that
>   the most recent parent should be removed. This is because with no
>   way to distinguish what the programmer wanted I needed some
>   consistent rule to use, and that is the most logical rule I could
>   think of and it made sense to me in terms of the common nesting used
>   with references. That gives us this:
>
>                            root
>                             / \
>                            /   \
>                           /     \
>                          /       \
>                         r1       p1
>                                  /
>                                 /
>                                /
>                           (p2,ref)
>
>
> The test_implicit_explicit_free() test checks on something quite
> different, and ignores the "two parent" view of talloc references. I
> think it is looking for consistency in the wrong way, and ignores the
> fact that talloc_free(p2) is the same call as talloc_free(ref). It
> also increases the exposure to the programmer of the idea of who is
> the 'owner' of a pointer, thus reducing the illusion talloc tries to
> create of having a true multi-parent tree structure.

Although in case 3 the rule is consistently applied in the context of
talloc, in real life it may be impossible to have consistent results
because of uncertainty of the order in which the talloc references were
originally taken. common nesting may occur in spnego session setup, but
may not occur with multiple transparent vfs having various lifespans on
common structures.

 - but certainly it is strange for a reference counting memory
allocation library to make such demands on it's users.

As you suggest, talloc_reference could be safe again by requiring users
to use talloc_unlink, and perhaps this is natural within the talloc
model. Talloc is not a mere reference counter but an dependancy tracker,
and so the dependency ought to be provided on unref.

On irc discussion it was proposed that talloc_steal should be removed
and replaced by talloc_reference.

This is reasonable, but a simple substitution wouldn't work. A called
function that would execute the talloc_steal would need to know from the
caller the mem_ctx that needed unlinking after the stealing took place.
Sometimes the caller doesn't have that, and so it would need to receive
it from it's caller, etc, which would get very messy.

Neither could the caller do the talloc_unlink before calling the
function which would execute talloc_steal (now talloc_reference) as this
may free the object before the steal could occur.

So my conclusion goes like this:

* taking reference is different to taking ownership and the differences
can't be easily hidden

* There are currently run-time dependant inconsistent error conditions
* It requires onerous manual management required to avoid error cases
* thus talloc_reference is only suitable for cases that aren't complex
enough to need it

* talloc_free is useful for simple uses
* talloc_steal is useful
* talloc_steal cannot easily be replaced by talloc_reference
* if we keep talloc_free / talloc_steal we can avoid breakage by not
promoting references to owner.

In short, in the cases where the benefits of the talloc model would be
most beneficial, it has been to complicated to obtain those benefits.

Although my proposition breaks the talloc design concept of equality
between owner and reference, it doesn't actually break talloc usage or
require talloc to be used any differently, because talloc use has
already been constrained within these bounds because ownership and
referencing were too dangerous to mix.

(it may break the insanely optimistic usage of using talloc-free to get
rid of the "top reference" [hoping we know what it is] - and that should
be stopped anyway - which you were already thinking of doing).

My proposal of having a no_owner_context makes talloc_steal safe at any
point, it also makes it easier to get rid of memory loops, by
discovering if an object is descended from no_owner_context, and whether
this is also true for all references (and their references etc).

(Consideration will also be needed for halting destructors, and
destructors that try and make use of references; but this is second one
a seperate and possibly insoluble problem)

> btw, the 'quiz' questions you posted don't make any sense to me, as
> they don't explain that the functions save_for_later() and
> sneak_final_look() do. If you give some explicit, runnable, code then
> it would make more sense to me. Just saying "you got everything right"
> doesn't tell me what the code actually does. There could well be bugs
> in talloc, but I'd need test cases to confirm that it really is a bug.

Aye, it was hard to make the quiz questions meaningful and yet small
enough to follow.

I give this test case attached, called: test_ref_free_steal

Also, the test I supplied called "test_dangling_loop" shows a bug,
described just above the test definition. My solution doesn't fix this,
but I believe the "no_owner_context" will greatly simplify the process
of spotting and freeing memory loops, as the test.

thanks for getting this far.

Sam

[PATCH 1/1] talloc testsuite: test (ref, free, steal) destroys reference

ref, free, steal destroys the reference (as it became the new owner
on free) and stops the reference from keeping the object alive.

references should only be removed by permission of the reference owner.

Current behaviour may be according to design, but it's very awkward.
---
 lib/talloc/testsuite.c |   61
++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2c1dd0941abad3ed4704bb0f485ea9163017b82f.diff
Type: text/x-patch
Size: 2509 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20090116/aa410845/2c1dd0941abad3ed4704bb0f485ea9163017b82f.bin