the sorry saga of the talloc soname 'fix'

Thu Jul 9 12:55:34 GMT 2009

On Thu, 2009-07-09 at 12:31 +0100, Sam Liddicott wrote:
> 
> I think there was some confusion over repeated calls to free. I hope I
> cleared these up.

Sam, I am sorry to say that you aren't always very clear :-)
But that doesn't mean you may not be right.

On the current behavior after Tridge's patch:
---------------------------------------------

The side effect of the patch is that now talloc_steal() and
talloc_free() may return errors a lot more easily, and not for
catastrofic reasons (not for ENOMEM or for corrupted hierarchies/failing
destructors).

Returning errors in talloc_free()/talloc_steal(), would make
talloc_reference() almost impossible to use, a pariah, because you are
going to cause problems to *other* people code if you snatch a
reference.

talloc_steal() would be an especially bad case. If you don't check the
return, as it could lead to access to freed memory. Unchecked errors
from talloc_free() would lead to memory leaks.

But worst of all there is no recovery, if I get an error from
talloc_steal() or talloc_free() what should I do ? abort() ?
That would be quite insane IMHO.

talloc_reference() exist to keep an object alive when we *don't* know or
control its hierarchy/lifespan, if we *do* know/control the
hierarchy/lifetime then most probably you don't need talloc_reference().
So, basically, returning errors from talloc_free()/talloc_steal() kills
most of talloc_reference() utility.

On Sam's proposal:
------------------

What I like about this proposal (if I understand it) is that
talloc_free() and talloc_steal() do not return errors just because there
is a reference on an object, which is good IMO.

Let me try to sum up what a talloc with your changes would look like and
see if I got what you mean:

1. plain talloc()
Basically you say that any new allocation is also an implicit reference,
so when I allocate c1 on p1 then c1 has p1 as parent (which is just an
implicit *default* reference).

2. talloc_reference()
Here you say that talloc_reference() is just adding a reference to an
existing memory context. if you have talloc_reference(p2, c1) this means
c1 has just a new reference. the only way to remove this reference is by
calling talloc_unlink(p2, c1)
2.1 Q. does talloc_free(p2) also automatically free c1 if p2 was the
last reference for c1 but not the parent ? Or do you need to set an
explicit destructor on p2 to call talloc_unlink(p2,c1) ?

3. talloc_steal()
For normal cases talloc_steal(p3, c1) just changes the parent and at all
effects c1 changed ownership.
For cases where it would happen that c1 was alive only because of
additional references then talloc_steal ADDs a new parent.
Therefore you still need to talloc_free(c1) (or more frequently just p3)
and to talloc_unlink(p2, c1)
It's unlikely you are going to mix talloc_steal() and talloc_reference()
in the same functions anyway, so the case where a talloc_steal() is
called on a freed object is going to be rare.

4. talloc_free()
A free only removes the parent, it *never* promotes a reference to be
the new parent, if there are no more references the object is actually
really freed. If there are other references the object will not be freed
until each referencing object is actually freed or an explicit
talloc_unlink(ref, c1) is called.
If 2.1 holds true then I see no major problem here, if "all parents" are
freed the object is freed as well, and as long as one referencing parent
exists the object exists
4.1 What will happen if I call talloc_free() on an object that has no
parent ? Do we get an error ?

Problem:
How do you identify the parent as freed? NULL can't be used as it is a
valid parent. We may need to introduce a boolean:
tc->parent_allocated = true/false

Conclusions:
------------

Your patch would change fundamentally just one thing:
if you used to use talloc_free() to directly kill a reference then you
may get errors/memleaks depending on the implementation.
This is not going to be a huge problem in most cases and is going to be
contained in the whereabouts of the original talloc_reference() so that
it should be easy to *know* who is the parent to actually perform a
talloc_unlink()
(It may be nice to have a talloc_unreference() call perhaps to lessen
confusion for the API users).

It still means some code need to be changed, but that is equally true
with current Tridge's patch, the difference is that changes should be
localized around uses of talloc_reference() in most sane cases.

The other *good* difference is that only the code using
talloc_reference() will need to worry about the effects of
talloc_reference(), code not using it can keep being the usual simple
combination of talloc(), talloc_steal(), talloc_free() and no ill
effects will be felt if some foreign code takes a reference on some
memory object. The only effect will be (as expected) that the referenced
memory will be kept alive for some longer time.

Do I get this right ?

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer <simo at samba.org>
Principal Software Engineer at Red Hat, Inc. <simo at redhat.com>