[QUICK] talloc bugs

Mon Jun 29 06:39:52 MDT 2009

As I wrote my response to Ronnie I realised that we  have a case of the
blind men and the elephant; I think there are at least 4 different
perceptions of what talloc "is" and thus different ideas as to what is
wrong with it and how it should be fixed.

I consider this elephant problem at the end of my response below to Ronnie.

* ronnie sahlberg wrote, On 29/06/09 10:59:
> The current structure of talloc is a tree structure where every object
> (except root) has ONE parent, and every object is a child to another
> object.
> A true tree structure where talloc_*() adds a leaf to the tree,
> talloc_steal() moves a subtree within the tree, and talloc_free()
> removes a subtree from the tree.
>
>
> When you start using references, I dont think that you have a tree
> structure at all anymore but just a generic graph.
>   
>
..
>
> When using references, I dont think there is a tree at all but a pure
> graph. Pretending that we still have a tree and parents et al I think
> is a mistake.
>   
Yes, you are right.

Perhaps confusion occurs as we all recognize to different degrees that
references do much more than merely solve the problem they were
introduced to solve.
> The data models and the differences between a tree and a graph are
> vast and they really need different APIs, if not for anything else
> then for makiung it explicit for people reading the code "this object
> is treated as an object in a graph" vs "this object is an object in a
> tree".
>   
I haven't yet seen that the caller needs to be aware that the allocation
is in a graph or a tree, as long as the caller can add or release its
claim to the allocation.
> In a graph I personally dont think there are any differences between
> different references. They are all equal and they are all replaceable.
> A consequence of this is that there is no longer a concept of a
> "parent", and then and there any and all relation to a tree falls.
>   
There is value in retaining the concept of parent, which is the value in
retaining the "simplistic free"; because:

    * lots of samba code uses "free" at the moment
    * the number of systems that accept a "free" function, and which
      don't track the parent up to the point of free. It makes sense to
      permit such systems to stash the parent within the tallocation
      rather than have to carry it around everywhere the pointer is
      carried around.

But I agree that if the simplistic free was abolished, then so could we
abolish steal, and the idea of parent. Parent is merely the implicit
reference that steal and free refer to, and by being implicit must
remain predictable.
> In a graph, there would be refcounting but you can never free an
> object. The only thing that makes sense in a graph is adding and
> removing edges.
>   
> Once there are no edges that connects to a certain object, that object
> can safely be freed.
>   
I had thought that this was how talloc was supposed to work.  One thing
that has come to light in this discussion is the meaning of the word
"free" in "talloc_free". I had supposed that free did not mean "free
right now", merely "release the parent reference"
> Example : A circular linked list. A circular linked list today with
> talloc is treated as one meta-object owning the metadata for the list
> och which all list elements are child objects to that linked list
> metadata object. This can be argued to be a very unnatural hierarchy
> for a list.
>   

I'm not sure if you are talking about internal talloc relationships and
structures or about generally implementing lists out of objects that
were allocated by talloc. I will presume the second. Correct me if I am
wrong.

> If the list instead was treated as a graph, this would more naturally
> map to the actual structure of a list.
> Each node in a list owns a "reference" to the next* pointer in the ring.
>   
I think this is not a strong example; there is no reason why the data
relationships of a circular list should follow the memory ownership
relationships. Also references (to me) imply some sort of interest in
what is being referenced; if the predecessor in a list kept a reference
to it's successor it would be difficult to tell when the successor was
no longer required.
> I.e. circular dependencies would not be a pathological cornercase but
> rather the natural way to structure talloc dependencies for several
> classes of objects and relations.
>   
I think that a talloc dependency does not arise by virtue of two objects
being in the same list. Of course a freed item must be deleted from the
list, but the talloc_dependancy would suggest more than that and prevent
an item from being deleted and freed.
> To continue the example, today with talloc since there are no
> relations/references between the individual objects in a list, doing
> things like removing an object from the list by talloc_free() is
> cumbersome since you dont have any relations between the onbject you
> delete and the previous object that links to you and references to
> you.
>   
I don't think this is true, dlinklist.h doesn't work like this,
DLIST_REMOVE does a neat removal/splice from the list.

[I think you are talking about linking and references within the linked
list struct, and not within the talloc graph.
If, on the other hand you were talking about the internals of
talloc_free, it does have a list of direct references to the object]
> You thus need explicit code for handling removal of a node entry in a
> talloc linked list, that essentially needs to process the entire list
> from the metadata head object all the way to the item removed.
>   
Some talloc'd objects may be in more than one type of collection, I
don't see that the internal talloc graph can help here.

As another case; I have more than one ntvfs instance using the same
client transport to talk to another server. The transport keeps a linked
list of requests from the different ntvfs that are pending a response,
but these requests are owned by the ntvfs that created them.

An ntvfs instance may be freed, which case it's pending requests are
also freed and remove themselves from the transport list as part of the
destructor (iirc).

Also, all the ntvfs have a talloc_reference to the transport, so that
the transport will not "go away" until the last ntvfs goes away.

Thus the memory ownership hierarchy may be unrelated to any collections
in which the allocated objects are inserted.
> This is due to that talloc is a tree model, not a true reference count model.
> Instead if this was managed as a refcounted graph, deleting an item in
> the linked list would NOT have to do the linear walk of the list from
> "head" to the object to find the object to be deleted, instead it
> would do something like this :
>    PreviousObject has a pointer that points to Object.
>    Since it has this pointer it also owns a reference to Object,
> PreviousObject references Object.
>    When we now delete/unlink Object, similar to how a
> talloc_destructor() works we should now have a callback to
> PreviousObject that says
>        "Object, which you have a reference to is going away, do what
> you need to do to deal with it."
>   

Why would we do this? The object should not be "going away" if there are
existing references; and removing something from a list doesn't mean
that it should be freed. It sounds like we need 2 levels of references,
one for the internal talloc graph, and one for external explicit references.

>    This callback for PreviousObject would then as arguments get both
> PreviousObject itself and also Object as argument and could then
> modify
>       PreviousObject->next to point to PreviousObject->next->next
> instead, thus changing the linking of the list.
>       This callback would then also call
> talloc_reference(PreviousObject, PreviousObject->next->next) to make
> sure that the next item we now link to
>       does not go away.
>
>
> (The example above would require a new type of talloc function   :
> talloc_destroy() that would be similar to talloc_free() but would call
> the "referenced object is being destroyed" callback before actually
> removing the object).
>   

Surely talloc_destroy() should not be called while there are other
references.
I reference the transport to make sure it DOESN'T go away, I don't want
a callback to tell me to "deal with it".

It sounds like talloc_free where the parent has many children could be
made more efficient in this manner, but it doesn't affect the API.

> I thus think references should require a separate API that deals with
> the fact they operate in a graph and not a tree.
>   

I see a graph and not a tree, (see elephant statement below) but I think
we don't need a new API, and I think introducing one will be hard. I've
kept my suggestions to retaining the current API and changing as little
as necessary to bring consistency.

> Thus needing similar allocators as the talloc_*() ones but maybe with
> a _ref postfix.
> They would use talloc_reference/unreference to add remove references
> from an object.
> Once all references to an object are removed, the object is automatically freed.
> Objects can register a "referent_destroyed" callback to be called when
> an object which is referenced is freed/destroyed.
>   

References (to my mind) are supposed to stop the referenced object from
being freed/destroed

> A new function talloc_destroy() can be used to delete an object with
> existing references. Causing the callbacks above to be called before
> the actual removal.
>
>   
This problem is certainly an elephant. I think we should each state
clearly what we see:

I see:

   1. A full reference counting graph
   2. with each node having a secret "parent" reference which is solely
      an implicit argument for free/steel and thus having important
      current value for historical reasons.
   3. When a node has no references of any kind, it's destructor should
      be called which may optionally defer the free (although what would
      then trigger the free as the parent may have gone away) I don't know.

Idea on closed-loop detection.

As for as circular references go, unlike java it can be hard to know if
a close-loop of nodes can be freed, as we don't know if there are any
active program pointers to any of these nodes.

Perhaps a node may say at the point when it loses a reference [which
includes a parent]:

    If none of the nodes which reference me have a reference from a node
    other than those in the graph that I reference, then I can be freed.

Or in other words:

    If there are nodes which reference me which are referenced even
    indirectly by nodes that I don't even indirectly reference then I
    must remain.

Now of course it may think that it could just be a nodule referenced
from a larger closed loop, but that closed loop would have been detected
when the free/unreference occurred that detached it (according to the
same rules).

Sam