nopower at suse.com
Thu Mar 14 17:15:07 UTC 2019
On 13/03/2019 10:10, Ralph Böhme wrote:
> Am 13.03.2019 um 10:35 schrieb Noel Power <nopower at suse.com>:
>> First limited testing with WSP and it worked fine, will test more today
>> and will try to look deeper into the patches too
> Cool, thanks!
> Let me know if I help with anything, I imagine the changes to mdssvc.c may be difficult to grasp if you're not familiar with the code. Phone plus tea plus tmate would be a nice option.
I haven't got as far as mdssvc (and your right maybe I might find it
difficult to grasp, to be honest I just didn't get to that point yet)
So, I've looked at the patches 0001 - 0004
Patch(s) 0001, 0003, 0004 lgtm
Which brings me to the glue code itself,
some minor typos
- * We already have a tevent fd event fort the glib GPollFD, but we
may have to
+ * We already have a tevent fd event for the glib GPollFD, but we
may have to
+ * Fetch glib event sources and attach corresponding tevent events
to out tevent
- * Fetch glib event sources and attach corresponding tevent events
to our tevent
in tevent_glib_update_events we can get timeout = -1 and if we do we end
microsec = glue->gtimeout * 1000;
which is probably not what we want to do, I suppose we need to ignore
such a timeout and *not* create a tevent timer for it
However I think there may be some problems with the general approach
here. For example the prepare, poll, dispatch only happens in the
context of glib tevent handlers.
So taking for example a server say WSP (and probably mdssv) that will
initially establish a connection with the tracker process, this along
with calling samba_tevent_glib_glue_create will set up the initial glib
glue tevent handlers that it will monitor.
WSP server will also set up non glue glib tevent to monitor a pipe for
communication from a client (using standard samba apis) When reacting to
the client requests these will happen in the context or a normal (non
glib glue) tevent handler, this handler will further call glib tracker
api(s) (for example to initiate a new query and retrieve results).
The problem I see is that if this glib tracker api results in any
changes to the set of glib fd(s) or timeouts that need to be monitored
then the glue code may not be aware of these changes (because the
prepare phase is only called from glib glue tevent handlers) See
attached async_tracker.c patch which attempts to fake these scenarios. 
The thing to note is that with tracker, calling the tracker api(s) in
the context of a 'non' glib glue tevent is working , I think though
it is working only I believe because the api call in this case doesn't
seem to cause any change in the set of fd(s) or timeouts monitored (I
don't know if you can depend on that as a certainty).
Yesterday I managed to get my WSP server to hang three times where a
call to query never got any response from glib tracker, the logs didn't
tell anything (and it is entirely possibly there is some unrelated error
in the code causing this behaviour) Unfortunately reproducing this
behaviour is not easy. The only explanation I currently could come up
with is the scenario described above.
I think to achieve reliable integration you just have to really perform
the prepare, poll, dispatch phases with every iteration of the tevent
loop, with the current patches those phases are only performed for glue
glib handlers alone. The previous trace point based glue solution had at
least the possibility to ensure those phases ran at the appropriate
time, if this is true then it is a pity because this solution is really nice
I'll try some more to try and reproduce the hang and get to the bottom of it
 I only tried a glib timer source, and that for certain hangs, I
would guess the same is true for other event sources
 I seem to remember though when testing this the last time
encountering a similar situation but in that case it caused a hang (e.g.
with the async_tracker.c modified to do the query in an immediate event
iirc) Seems though that this is not longer the case (even with the old
glue code) which leads me to believe the track and/or glib behaviour has
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4698 bytes
Desc: not available
More information about the samba-technical