talloc issues

Mon Jul 27 08:15:21 MDT 2009

* tridge wrote, On 27/07/09 09:01:
> Hi Metze,
> 
>  > I have some code here:
>  > http://gitweb.samba.org/?p=metze/samba/wip.git;a=shortlog;h=refs/heads/master4-talloc-metze
>  > 
>  > I'll reply more verbose tomorrow or next week.
> 
> I'll be interested in the explanation, but this seems like a pretty
> dangerous change in behaviour to me. Do you have any plans on ways to
> find existing code that depends on the old behaviour?
> 

..

> With the code in your wip branch, I get a valgrind error, as the
> printf of *s1->ptr references freed memory.
> 
> One of the things I'm trying to achieve with these talloc changes is:
> 
>   - when there is a change in behaviour that could affect code logic,
>     make sure we produce a warning in the logs
> 
>   - err on the side of leaking memory if necessary (with a warning),
>     rather than leaving code with what it thinks is valid pointers
>     that it could then dereference

I think we are talking about choosing the consequences of running what
has now been found to be "bad code" and what would become "bad code"
(with these talloc changes) in the interval before we find and fix it all.

> The 2nd point is particularly important, as otherwise we risk creating
> a security hole. The above example shows how the hole could be
> produced. We just don't have any sane way that I can think of to audit
> all our code for a change like this. 

As ambiguous talloc_free is the problem, get rid of all instances of
calls to talloc_free unless we can prove that the function callers were
expecting ownership to be passed, and can also prove that they had
ownership.

(and /maybe/ null out the callers pointer to be doubly sure).

This will involve auditing all calls to functions which call talloc_free
(recursively).

Afterwards, I doubt that any talloc_free will be left in any library
functions which aren't clearly destructor calls; except for packet_send
functions which steal.

How such code will be re-written to get rid of talloc free will depend
on the code, but at least we will know what code we need to examine.

In some cases talloc_free will be a literal shortcut for "free the most
recent reference which was taken in this function" and perhaps this can
be replaced with a proper unreference.

In other cases, we can unlink from NULL instead of calling talloc_free;
and that defines how the caller (or top of the function) must make the
initial reference.

Rather that NULL in some cases, the top-of-function could
talloc_reference to a subsystem-specific talloc_root, and then replace
talloc_free with a talloc_unlink from the same root. The caller will
then not need to pass a talloc_reference.

> I don't think 'make test' is a
> sufficient test to be sure we haven't created a security hole
> somewhere by reversing the behaviour of talloc_free() in this sort of
> example.
> 
> I also don't like this approach as it re-introduces a separation of
> the 'original' parent as being special, which I think is what got us
> in so much trouble in the first place. 

No; the original parent was ALWAYS special, we just refused to treat it
as special consistently and that is where the problem came from.

It was always special externally because talloc_zero, talloc_steal
always treated it specially, and talloc_free also but not so consistently.

Sam