talloc issues

Mon Jul 27 20:02:55 MDT 2009

Hi Sam,

 > I think we are talking about choosing the consequences of running what
 > has now been found to be "bad code" and what would become "bad code"
 > (with these talloc changes) in the interval before we find and fix it all.

yes, and when making an API change, you need to try to minimise the
impact on existing code. It is important to be able to find code that
relied on the old behaviour, and not introduce behaviour that would
lead to really bad results (like security holes).

The patches in metze's wip tree right now lead to either an abort() or
a dereference after use depending on the previous usage pattern. Both
are not things we want to happen to existing code.

 > As ambiguous talloc_free is the problem, get rid of all instances of
 > calls to talloc_free unless we can prove that the function callers were
 > expecting ownership to be passed, and can also prove that they had
 > ownership.
 > 
 > (and /maybe/ null out the callers pointer to be doubly sure).

I don't always know when you are proposing something seriously and
when you are not. If you are seriously proposing the above then I'd
like to know how you plan to audit the more than 9k existing calls to
talloc_free(), and how you plan on changing those calls to zero the
callers pointer when the C language doesn't allow that with the
existing syntax.

 > This will involve auditing all calls to functions which call talloc_free
 > (recursively).

This would take months, and even if we put months into it we'd miss
some. Have you tried a grep through the code to see how many calls
there are to talloc_free? Have you thought of the scale of this task?

You seem to be treating this problem as only one of what you would use
for the semantics if you were creating talloc from scratch. That is
nice as a thought experiment, but please separate this from the
problem of dealing with a large existing code base. I have seen
nothing in your proposals on how to deal in any sane way with the
existing code.

 > No; the original parent was ALWAYS special, we just refused to treat it
 > as special consistently and that is where the problem came from.
 > 
 > It was always special externally because talloc_zero, talloc_steal
 > always treated it specially, and talloc_free also but not so consistently.

It was treated asymmetrically previously. The changes I've put in now
get rid of that. As far as I know, the only external way you can tell
the difference between parents in the current code is with a
talloc_parent() call, or by calling a talloc_report*() call to print
out the hierarchy.

So right now we have a system where all parents are treated the
same. I am reluctant to go back to a system where one parent is
special.

To understand this reluctance I think you will need to look at how
talloc_reference() has actually been used in the code. Have a look for
example at libnet_RpcConnectSrv_recv(). That function takes a
temporary samr pipe and uses a reference to make it a child of a long
term context. When the temporary samr connection goes away, the long
term parent remains. Thus the temporary connection is not special in
any way except that it happened to be the first samr connection.

Cheers, Tridge