[QUICK] talloc bugs

Sam Liddicott sam at liddicott.com
Wed Jul 1 13:52:19 GMT 2009


* tridge wrote, On 01/07/09 14:36:
> Hi Sam,
>
>  > I think they should all be allowed.
>
> you just haven't read the changes carefully enough. Looking just at
> the diff and not at the surrounding code paths is not enough.
>   
:-(
>
> I made all these changes with Andrew Bartlett, the author of this
> code, sitting next to me. Andrew kept telling me "damn, that's a
> bug", and he was the one who worked with me to do the fix. So please
> don't tell me that "it's not a bug" based on your cursory reading of
> the patch.
>   
Well, I never had AB next to me:-)
They may be bugs, but that usage should be allowed. Of course,
inappropriate use is a bug, but there is appropriate use of
talloc_new/talloc_free
> ... other examples where the same thing applies deleted ...
>
>  > http://git.samba.org/?p=samba.git;a=commitdiff;h=956b5a0003a3ab82d2d7cffb7aee6e5281b4fbb4
>  > http://git.samba.org/?p=samba.git;a=commitdiff;h=12510329217dd2b8027794b63258a34797a0f940
>  > http://git.samba.org/?p=samba.git;a=commitdiff;h=269b16212a65c9506147db381ecdcbdd58347af6
>  > These aren't bugs either, it's just a choice to have import do a steal,
>  > implying that the caller of import was/knew the allocator and is giving
>  > up ownership.
>
> again, please read the darn code more carefully! These were memory
> leaks. In lots and lots of places our python code left two parents on
> a pointer but it was only freed once.
>   
Then I agree, this code never should have used talloc_free, etc.
>  > > Sam, please look at the following bit of code:
>  > >
>  > > 	int *p1 = talloc_new(NULL);
>  > > 	int *p2 = talloc_new(NULL);
>  > > 	int *c1 = talloc(p1, int);
>  > > 	int *r1 = talloc_reference(p2, c1);
>  > > 	talloc_free(p1);
>  > >
>  > > now tell me what you would have it do. What pointers will remain at
>  > > the end of that code, and what parent/child relationship?
>  > >   
>  > p1==NULL
>  > 
>  > p2!=NULL // a valid context.
>  > 
>  > c1==r1 // a valid int* with no tc->parent and one ref
>  > talloc_chunk_from_ptr(c1)->parent==no_owner_context
>  > talloc_chunk_from_ptr(c1)->refs->ptr==p2
>
> What I was hoping you'd tell me is what you want from the callers
> point of view. The internal structure is just that - _internal_. The
> fact that it always has a 'parent' pointer is an internal convenience
> for code efficiency.
>   

I described it from the callers point of view, I just added notes as to
what the internal state is too.
If you are happy with the callers point of view as I describe it then it
shows that the API was fine all along as I think it is.

But looking at it only from the callers point of view it what has
allowed the dangling refs to continue so long.

> If I have managed to interpret your internal data structure view
> correctly then I think what you're giving above is exactly what the
> code does now from the point of view of any user of the talloc
> API. Right?
>   
We now learn that there is no unified "point of view of any user".

I was able to tell that there was an internal problem solely from the
point of view of "any user", when the dangling ref occured.

>  > That difficulty in seeing other sane behaviour is because the
>  > distinction "parent" is supposed to be immaterial, but it isn't.
>
> ok, then apart from the talloc_parent() call, exactly how would you
> find out if a parent is a 'true' parent or a reference with the
> current code?
>   

There is no "true parent"

> If it can't be seen externally then why do you care? 

Because I see it every time I get a dangling ref.

> Is this all
> really just about internal re-arrangement of the talloc code, or are
> you wanting a different API?
>   

I want an internal slight change so that we DON'T have to have a
different API. The old API was fine (even thought it may have been used
incorrectly in some of the code you were just looking at) - it was just
broken in the implementation because the secret parent pointer wasn't
hidden effectively enough and caused memory leaks when the old parents
parent became the new parent (this new parent may have global scope) and
dangling pointers.

>  > Don't leave p2 as "the parent" (requiring promotion) leave it as "a
>  > parent" i.e. the reference that it already is. Leave c1 without a "the
>  > parent" - it doesn't need one. The reference will keep it alive.
>
> again, that's an internal notational convenience, not part of the API
>   

yes but the internal part is where "the" bug is that was causing
dangling references (and leaks), which you are fixing by causing more
leaks and runtime errors instead. It's an internal bug, lets fix it
internally, instead of moving the API.

>  > It gives us all these things without breaking the ABI or introducing
>  > additional runtime errors or checks, and without additional
>  > hard-to-manage API constraints.
>
> The breaking of the ABI has absolutely nothing to do with the change
> to refuse ambiguous operations. The ABI broke because I wanted to be
> able to give sane debugging messages telling you when a caller is
> abusing the API via ambiguous calls. 

OK, here is one miscommunication between us. I thought you were
preventing the promotion by preventing free when there was more than
once reference.

If we do prevent promotion then these API calls are not ambiguous
because the meaning of tc->parent becomes well defined.

> That required making a few
> functions into macros so we can get the line of code.
>   

ok, s/ABI(, *|)//

Sam


More information about the samba-technical mailing list