talloc issues

Sam Liddicott sam at liddicott.com
Fri Jul 24 03:05:46 MDT 2009

* tridge wrote, On 24/07/09 05:41:
> Regarding Sam's patches, I still have some of the same concerns that I
> posted last time. I think that Sam's changes make the situation
> unambiguous only in the same sense that the old talloc code was
> unambiguous. The old talloc behaviour was completely deterministic
> from a strictly algorithmic sense, in that a talloc_free() on a
> pointer with references would remove the most recent reference
> first. 

Which reference this would actually be could depend on non-deterministic
network or user-interface behaviour. This is how I discovered the problem.

Even without this non-deterministic effect, calculating and tracking the
possible runtime references at code-writing-time in order to know which
reference might be the top one is too complicated and brittle.

My proposed patch also removed that ambiguity (see below).

> Sam's patches are similarly unambiguous, in that it removes the
> original parent first.

I'm not confident from your description that you understood my patch.

talloc_free doesn't "remove the original parent first" it ONLY removes
the "original" parent. As metze indicated, a second call to talloc_free
should result in a double-free error.

[This is complicated by what should happen if the destructor prevents
destruction, and who should then be the remaining reference. Until this
question is resolved I can't offer memory-loop detection. I suspect we
either need a two-phase destruction like gnome uses or we rmeove the
ability for a destructor to prevent destruction, but lets not talk about
that in this thread]

> The ambiguity that I am trying to resolve comes not from a strictly
> deterministic/non-deterministic sense of the word, but from the fact
> that a programmer working in one module cannot with any degree of
> certainty predict what will happen without full knowledge of what is
> happening elsewhere. I think that is still true with Sam's proposed
> changes.

I believe that here you have identified the problem and summarised it

I think my proposal does solve this problem.

With my proposed changes, talloc_free only removes the parent which was
the argument to talloc_zero or talloc_steal, and nothing else, ever.

Because it only removes the original parent (or stealing parent) it is
very simple to know what is being removed without needing to predict
runtime behaviour.

The rule is that you call talloc_free only if you are acting on the
documented behalf of the allocator or stealer. Any function that calls
talloc_free or talloc_steal must in any case (by good manners) document
that it "takes ownership" of the pointer.

This means that reference management becomes a matter of intent
expressed in code rather than the unknown runtime history of an object -
if you get a valid object and your API is documented to steal, then you
can steal or free quite safely. If not, then you shouldn't have been
doing it anyway.

> I also think that the way that talloc_steal() can become a function
> that effectively changes the reference count is also a big concern,
> and not something I think we want.

The solution to this concern is to have talloc_steal fail if talloc_free
has been called.

But I think that consideration shows such a change would not be wise:

It would mean that there could exist a pointer for which it is hard to
know if it is safe to steal. It could be unsafe because the objects life
has been preserved beyond talloc_free by a reference. In such a case
talloc then has to provide the tools for a function to know if it is
safe to steal.

If we consider that tc->parent (when not null) is equivalent to a
reference, and that tc->parent as a field is really just a convenient
default argument for talloc_free, then talloc_steal(ctx, o) is really:

talloc_reference(ctx, o);
if (o->tc->parent) {
  talloc_unreference(o->tc->parent, o);
o->tc->parent = ctx; // the new default argument for talloc_free

The change in reference count is expressed in the "if" clause - which
entirely encapsulates the meaning of steal. Steal is to add a reference
and remove the default one, but we only actually remove the default one
if it exists. With multiplr references, it might not exist.

We can make it illegal to call talloc_steal after talloc_free, but then
a stealing must function implement the "if" clause directly to determine
if there is a default argument for talloc_free or not.

I don't think that "having talloc_steal preserve the reference count" is
something people actually desire, although they might "expect" it if
they don't think about the deeper implications.

Talloc_steal as I propose avoids the need for people to think about
changes to the count - they should be really thinking about predictable
preservation of the object.

But as I say, talloc_steal can avoid changing references to the count if
we permit the stealing function to make the test of tc->parent... of
course it still needs to set itself as the new default argument for
talloc_free; and so I think that talloc_steal as I propose does this so
much more neatly.

It doesn't preserve the reference count, but then the caller would never
actually be checking the reference count anyway; instead we preserve the
reference in a predictable way.

> There is another way we can approach this. It is clear that some
> people want particular behaviour from talloc, and a one size fits all
> may not be the right thing. So we could have inherited flags like
> this:
>   talloc_set_flag(ptr, TALLOC_FLAG_NO_REFERENCES);
> if you set that inherited flag, then no references would be allowed in
> this pointer of its children. That would address the concern that
> talloc_steal() can fail with the changes I have made recently, as with
> this flag set, talloc_steal() would not fail.

This flag would destroy the utility of talloc and possibly of vfs_proxy
whose pattern is to take references to various structures, in a way that
(naturally) the creator did not expect.

> Similarly, we could have a flag:
>   talloc_set_flag(ptr, TALLOC_FLAG_FREE_REF_FIRST);
> which if set would tell talloc to use the old behaviour, freeing
> references first, again guaranteeing that talloc_steal() will not
> fail.

I don't like the old behaviour.

The flags would give a choice between two behaviours that I do not like.

My standard for an acceptable talloc is partly based on my chances for
getting GNOME to adopt it for gobject. The ability to forbid references
may be useful for some certain allocations, but as the only sane
alternative to current buggy behaviour it is merely an API enforced
"just don't use references then".


More information about the samba-technical mailing list