talloc issues

Fri Jul 24 06:37:12 MDT 2009

On Fri, 2009-07-24 at 14:41 +1000, tridge at samba.org wrote:
> Hi Simo, Sam and Metze,
> 
> I've been looking over the talloc discussions to try to come to some
> conclusions.
> 
> First off, I'd like to apologise to Simo for my choice of words in
> some of my messages. Some of my language was inappropriate for a
> technical discussion.

Apologies accepted.

> Regarding Sam's patches, I still have some of the same concerns that I
> posted last time. I think that Sam's changes make the situation
> unambiguous only in the same sense that the old talloc code was
> unambiguous. The old talloc behaviour was completely deterministic
> from a strictly algorithmic sense, in that a talloc_free() on a
> pointer with references would remove the most recent reference
> first. Sam's patches are similarly unambiguous, in that it removes the
> original parent first.

I think the essence of Sam's proposal, once corrected a bit from the
original form, is that references do not influence "normal"/"expected"
talloc behavior. IMHO this is extremely important.

> The ambiguity that I am trying to resolve comes not from a strictly
> deterministic/non-deterministic sense of the word, but from the fact
> that a programmer working in one module cannot with any degree of
> certainty predict what will happen without full knowledge of what is
> happening elsewhere. I think that is still true with Sam's proposed
> changes.

I think the only thing that is still a bit less deterministic is what
happen to destructors. Without references you are always sure that if
you free a talloc hierarchy all memory gets freed and all destructors
are fired. With references it may happen that some memory is kept alive
instead and destructors are delayed. But this is the same in both
solutions, so I wouldn't base my choice on one or the other, based on
this point.

> I also think that the way that talloc_steal() can become a function
> that effectively changes the reference count is also a big concern,
> and not something I think we want.

I totally agree that making talloc_steal() "special", in this sense, is
not useful at all, it is actually rather dangerous.
I think this is something that the original Sam's proposal got wrong
indeed.

Metze and I have discussed a bit what would be appropriate behavior for
talloc_free() and talloc_steal() after you take a reference. I think the
conclusion was (Metze correct me if I got something wrong) that 

A) once talloc_free() is called, any other talloc_free() or
talloc_steal() on the same talloc context should fail with a double free
error. Exactly like it would happen if there were no references.

B) the code using talloc_reference() must use talloc_reparent() and
talloc_unreference(). You cannot use talloc_free() or talloc_steal(),
they do not affect references. The code using the references is aware of
the fact you used them so it shouldn't be difficult to stick with this
rule.

Of course freeing a parent context frees also the references it may
hold, nothing changes there.

(Btw I would rename talloc_reparent() to talloc_rereference() in this
case and make it operate exclusively on references).

I think that this way you can safely use talloc the "normal" way in the
original code and live issues, if any, only on the shoulders of who
decides to use talloc_reference() without influencing code practices of
the original context owner.

This is what I call the Sam's modified/corrected proposal.

> There is another way we can approach this. It is clear that some
> people want particular behaviour from talloc, and a one size fits all
> may not be the right thing. So we could have inherited flags like
> this:
> 
>   talloc_set_flag(ptr, TALLOC_FLAG_NO_REFERENCES);
> 
> if you set that inherited flag, then no references would be allowed in
> this pointer of its children. That would address the concern that
> talloc_steal() can fail with the changes I have made recently, as with
> this flag set, talloc_steal() would not fail.
> 
> Similarly, we could have a flag:
> 
>   talloc_set_flag(ptr, TALLOC_FLAG_FREE_REF_FIRST);
> 
> which if set would tell talloc to use the old behaviour, freeing
> references first, again guaranteeing that talloc_steal() will not
> fail.
> 
> Whether adding these flags is a good idea is hard to tell. It may be
> that it just causes more confusion, as the API would be different in
> different parts of the code. Please let me know what you think.

I am not sure it is a good idea, I think the code would become even more
unpredictable, as you would have to know before hand whether you can use
parts of talloc, and how talloc behaves in that case. The original
behavior was broken so I don't think we should make it possible to
revive it in any case.

But back to the Sam's modified proposal, vs the current committed code.

I think the main problem with the current solution is that you
practically make using talloc_reference() dangerous.

It can be used only if you totally control the whole hierarchy. Using
talloc_reference() now can make talloc_steal() to fail, so you cannot
use talloc_reference() unless you are 100% certain you are not going to
make the original code literally blown up (what do you do if
talloc_steal() fails ?). But, if you have total knowledge of the
original hierarchy (which may or may not be easy for anybody), then
almost certainly you can arrange the code to not use talloc_reference()
anyway.

The biggest issues come when you are a module writer (eg a vfs module
writer). With Sam's (modified) proposal you can safely use
talloc_reference() on a piece of memory being certain you are not
causing trouble to whatever code this talloc pointer comes from. As the
original code doesn't need to be changed in any way, and from the
original code point of view, both talloc_free() and talloc_steal() are
final and always behave as expected.
With talloc_reparent() instead, a vfs module writer simply cannot risk
using talloc_reference(). A module writer does not have access or
knowledge of the internal hierarchy of the caller and cannot risk making
a talloc_steal() fail by taking a reference, nor can he change the code
he is taking a reference from.

This is also a problem for any other project that is highly modular and
uses talloc. If the current talloc_reference() stands, the only option
will be to ban usage of talloc_reference(), because you can't go and
change a talloc_steal() everywhere just because you want to use a
talloc_reference() somewhere else. Besides, sometimes changing a
talloc_steal() in an explicit talloc_reparent() is way too difficult, it
may require code changes that are quite invasive, and generally
unnecessary.

With Sam's (modified) proposal instead the original code always works as
usual, talloc(y, x)/talloc_steal(z, x)/talloc_free(x), no changes there,
no surprises. If you start using talloc_reference() the only effect is
that you keep a piece of memory alive for some more time, and you are
responsible for handling anything that has to do with the references.

> Next issue - the .so version. The best solution I have heard on this
> is from Metze, where he proposed to solve it in the packaging, by
> creating a .so.1 package which links to the .so.2 package, providing
> just the compatibility functions. That keeps the compatibility cruft
> out of the main talloc code, and also puts the choice of how long to
> keep these functions around squarely in the shared library packaging
> area, so Simo can choose to keep them as long as he thinks they are
> needed for Fedora. The main talloc code would switch to .so.2, as the
> API and ABI have changed.

I still prefer not changing the .so if possible, as a general rule.
But if we rename the functions that have changed signature, create this
compatibility library and have a default option (for some time) to build
and install it along side with .2, and all agree this is good enough, I
guess we can try it.

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>