[QUICK] talloc bugs

Sam Liddicott sam at liddicott.com
Wed Jul 1 13:17:56 GMT 2009

* tridge wrote, On 01/07/09 13:06:
> Hi Sam,
>  > I think this debugging is really great and will make talloc debugging
>  > much simpler.
> yep, it made finding the ambiguous uses of talloc_free and
> talloc_steal very easy, though I need to fix up the stderr hack.
>  > However I don't think that Samba code was "broken", it made fair and
>  > reasonable use of a fair talloc API, as other libtalloc using programs
>  > will do.
> I think if you look at the actual changes I made, you'll find they
> were nearly all real bugs. Is there a particular one you think was
> fair and reasonable that should still be allowed?
I think they should all be allowed.

It should be OK to steal something with references. Steal implicitly
works on what I now call a convenience pointer called "parent" that is
kept in the talloc chunk. There is nothing ambiguous about that.

I can't say if that fixes a bug, whether or not steal or reference is
used depends on whether or not the caller expected to be giving up
ownership or was going to call free after the dsdb_make_schema_global()

I think unlink is better but free isn't a bug if it is clear from the
code context what value tc->parent has. It fits the simple malloc/free
case, except if someone else had referenced then it wouldn't unallocated
(as expected).

These aren't bugs either, it's just a choice to have import do a steal,
implying that the caller of import was/knew the allocator and is giving
up ownership.

This one is a bug now. If you knew who the tc->parent was then you
should be able to just call talloc_free.
It's a fine example of where it is convenient to use tc->parent, but
recognizing that tc->parent is only of hierarchical value because it is
treated as if it were part of tc->refs.

> The new
> behaviour is to leak memory if that race happens. I think a runtime
> leak on error (with a nice big warning telling you where the bug is!)
> is better than the previous ambiguous behaviour.

It is much better than the previous behaviour, but we can even avoid
that, and also avoid the ABI change and API restrictions, just by a
change of mindset.

>  > I guess parent also means reference there...
>  > > If you pass in a reference as the old
>  > > reference then the reference is moved to the new parent.
>  > >   
>  > but that bit doesn't make sense, I guess you meant that if you pass in a
>  > reference as the old reference then the new reference also becomes
>  > parent (and the old parent becomes a reference?)
I think I was talking rubbish then, but I'm still glad of your
explanation below.
> no, I meant what I said. If old_parent in talloc_reparent() was a
> reference, then it will still be a reference after the
> talloc_reparent() call.
I looked in the code and in the above statement "it will still be a
reference" I see you mean newparent will be a reference just as
oldparent was. This makes sense to me, I hadn't realised that you were
saying that.

> With a bit more complex code we could make it become the "real" parent
> internally, 
ugh, no. And there is no such thing as real parent, that is what the
problem has been - the failure to eradicate that notion.
> but the point of these changes is to avoid the ambiguity
> that stems from treating references and parents as different. It just
> doesn't matter anymore, except for the pretty-printing when you
> display the memory tree, and for the one function talloc_parent().
hurrah! I didn't believe you would do such a change which is why I
mis-interpreted what you said in the earlier paragraph.

> The aim is to make a reference and a parent the same thing as far as
> the public API is concerned. Which a pointer is internally should not
> matter.

Yes. (modulo the re-framing which re-purposes tc->parent for convenience

>  > I refer you to the re-framed talloc, below; there is no need to failthe problem is that it has never been hidden from the users, but that
> it has been hidden internally by treating them the same through
> promotion.
>  > with an error.
> You mentioned in a previous mail that you have been running with this
> new model for a while - can you post the patch you are using? 
attached. 0002 is perhaps already committed and is about as much as I
could get pushed last time. It may be needed for 0003 and 0004 to attach.
> I know
> you posted a patch a few months back during the previous discussion,
> but if I remember correctly that patch was missing some basic
> behaviour (I can't remember exactly what). Have you resolved the
> problems?

The version attached that uses no_owner_context (instead of a flag) has
had no problems as far as I know.

In your message:
you felt that the problems were the ambiguities in not knowing "what the
programmer wanted".

I think treating tc->parent as a convenience pointer makes it clearer
for talloc and the programmer to know and express what was wanted.
>  > The simplest way to fix it is not to have to fix it (samba) at all. We
>  > only have to fix it because of this darned who-knows-why insistence on
>  > promoting reference to parent while simultaneously admitting through the
>  > API that references and parents are not equal and therefore actually
>  > admitting that such promotion is unwise but doing it anyway.
> Sam, please look at the following bit of code:
> 	int *p1 = talloc_new(NULL);
> 	int *p2 = talloc_new(NULL);
> 	int *c1 = talloc(p1, int);
> 	int *r1 = talloc_reference(p2, c1);
> 	talloc_free(p1);
> now tell me what you would have it do. What pointers will remain at
> the end of that code, and what parent/child relationship?

p2!=NULL // a valid context.

c1==r1 // a valid int* with no tc->parent and one ref

> The current code leaves p2 as the parent of c1. That necessarily
> involves 'promoting' the reference to be a parent. I just cannot see
> any other sane behaviour. 

That difficulty in seeing other sane behaviour is because the
distinction "parent" is supposed to be immaterial, but it isn't.

Don't leave p2 as "the parent" (requiring promotion) leave it as "a
parent" i.e. the reference that it already is. Leave c1 without a "the
parent" - it doesn't need one. The reference will keep it alive.

> If you can explain why the above example
> should behave in any other way than it currently does then I'll start
> listening to your proposed changes. Without the above example working
> differently, it's all a pointless discussion.

It should behave the way I suggested in answer to your question and not
the way it currently does because it avoids surprises with relation to
free/steal and still lets us keep free/steal for backwards compatibility
and as a simple API for older programs that were written for malloc.

It gives us all these things without breaking the ABI or introducing
additional runtime errors or checks, and without additional
hard-to-manage API constraints.

It gives a simpler talloc model to explain and understand.

It's a pure ref-counting talloc with no preferred parent, but with a
convenience pointer for compatibility reasons.

> I _want_ the difference between references and parents to not be
> visible to users of the talloc API. It is in my view the only natural
> way of having references in talloc. 

I agree, (and better than the current calls to drop talloc_reference).

> The ambiguity we had previously
> was the flaw in that, and now it's fixed.

That fix breaks the ABI, constrains the API and introduces runtime errors.

My fix avoids the ambiguity internally instead of externally, preserves
the ABI, rationalises the API and doesn't introduce extra runtime errors.

> If what you want is an _internal_ change to talloc, so that it
> internally becomes a multi-parent data structure and not references
> then we could consider that, and if it made the common use cases of
> talloc faster then we could do it. I think it will make it slower as
> more than one parent is extremely rare as a percentage of all the
> talloc pointers we have.
ugh no, I was just respecting your view that talloc already has
multi-parent data structures in tc->refs, and building on that.
I had suggesting explicitly adding the current tc->parent onto tc->refs
for consistency reasons but have also agued that it isn't necessary.

> As of now, the fact that internally talloc uses two different ways of
> representing parents is hidden from users. 

Yes, but it's better to hide that internal change by not using two
different ways, rather than by breaking the API/ABI/etc.

Or for speed of development purposes we don't even need to change much
internally, a quick change of mindset to pretend that tc->parent is an
honorary member of tc->refs (for ref enumeration purposes) and that
tc->parent is a non-hierachical convenience pointer for the
new/free/steal simple API will hide the internal change just as well.

> Apart from the
> talloc_parent() call, I don't know of any way to find out which is the
> 'real' parent and which is a reference. So the internal representation
> is hidden from applications. That means we can change that
> representation in the future without breaking apps, if we have reason
> to do it.
The internal representation was already hidden enough. It was the
internal manipulation that was exposing it.

Real parent has been so problematic, that I suggest that there isn't a
"real parent" any more. Just a convenience pointer, implicit for steal/free.

I think we've had a fundamental mis-communication all along, but I still
don't know what it is.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Make-explicit-free-behave-the-same-as-implicit-free.patch
Type: text/x-diff
Size: 3270 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20090701/b7f1b273/0001-Make-explicit-free-behave-the-same-as-implicit-free.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-TODO-talloc-talloc_free-should-remove-the-refere.patch
Type: text/x-diff
Size: 1072 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20090701/b7f1b273/0002-TODO-talloc-talloc_free-should-remove-the-refere.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-prevent-git-ref-unref-mismatches.-prevent-reference.patch
Type: text/x-diff
Size: 28381 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20090701/b7f1b273/0003-prevent-git-ref-unref-mismatches.-prevent-reference.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-talloc-testsuite-test-ref-free-steal-destroys-r.patch
Type: text/x-diff
Size: 3210 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba-technical/attachments/20090701/b7f1b273/0004-talloc-testsuite-test-ref-free-steal-destroys-r.bin

More information about the samba-technical mailing list