winbindd main process hangs on samba-dc

Isaac Boukris iboukris at gmail.com
Tue Sep 8 19:23:24 UTC 2020


On Tue, Sep 8, 2020 at 9:13 PM Jeremy Allison <jra at samba.org> wrote:
>
> On Tue, Sep 08, 2020 at 09:05:30PM +0200, Isaac Boukris wrote:
> > On Tue, Sep 8, 2020 at 8:59 PM Jeremy Allison <jra at samba.org> wrote:
> > >
> > > On Tue, Sep 08, 2020 at 08:56:35PM +0200, Isaac Boukris via samba-technical wrote:
> > > > Hi,
> > > >
> > > > This issue was initially reported on ipa-dc, but I'm able to somewhat
> > > > reproduce it in the lab with samba-dc, by dropping the TCP packets
> > > > returned from a DC in a trusted domain (iptables -A INPUT -p tcp -s
> > > > 192.168.0.120 -j DROP).
> > > >
> > > > As you can see in the attached log, the main winbind process makes
> > > > blocking DC calls such as get_sorted_dc_list(), and depending on the
> > > > number of DCs to try, it may cause clients (such as wbinfo -p, or more
> > > > importantly, smbd!) to hang for minutes and time out.
> > > >
> > > > Here, for instance, we block for 5 seconds per DC:
> > > > [2020/09/08 20:27:49.595952,  3, pid=66128, effective(0, 0), real(0,
> > > > 0)] ../../source3/lib/util_sock.c:447(open_socket_out_send)
> > > >   Connecting to 192.168.0.120 at port 445
> > > > [2020/09/08 20:27:49.601764,  3, pid=66128, effective(0, 0), real(0,
> > > > 0)] ../../source3/lib/util_sock.c:447(open_socket_out_send)
> > > >   Connecting to 192.168.0.120 at port 139
> > > > [2020/09/08 20:27:54.603044, 10, pid=66128, effective(0, 0), real(0,
> > > > 0), class=winbind]
> > > > ../../source3/winbindd/winbindd_cm.c:1712(find_new_dc)
> > > >   find_new_dc: smbsock_any_connect failed for domain ACOM address
> > > > 192.168.0.120. Error was NT_STATUS_IO_TIMEOUT
> > > >
> > > > On a member machine I couldn't trigger it, as it seems
> > > > get_sorted_dc_list() is called in the per-domain process (as is
> > > > fork_child_dc_connect()), while here it happens in the main
> > > > process.
> > > >
> > > > Any ideas?
> > >
> > > What version of Samba is this ?
> > >
> > > I may have already fixed this in master with
> > > the async DNS SRV record -> A/AAAA lookup
> > > changes.
> >
> > Git master. In this test I only block TCP packets, btw.
>
> OK, so we should be getting a good list in a reasonable time.
> Looking at the smbsock_any_connect() that should be pinging
> a new DC every second, and timing out in total after 10
> seconds.
>
> Can you add DEBUG to print out the number of DCs you
> get back from get_sorted_dc_list(), and the timings
> inside find_new_dc()?

I don't see how we can afford to block at all in the main process: we
can have trusts with more than one domain, so one unreachable domain
stalls requests for all the others. To me this seems utterly wrong.
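
To put rough numbers on the concern, here is a back-of-the-envelope
sketch. It is plain Python, not winbindd code; the function names and
the client timeout parameter are made up for illustration. It assumes
the DCs are tried one after another in the main process and that each
unreachable DC costs the 5 seconds seen in the log above:

```python
def worst_case_block_seconds(num_dcs, per_dc_timeout=5.0):
    """Time spent blocked if every DC in the list is unreachable.

    Assumes DCs are probed strictly one after another (hypothetical
    model based on the 5 s per-DC delay observed in the log).
    """
    return num_dcs * per_dc_timeout


def client_gives_up(num_dcs, client_timeout, per_dc_timeout=5.0):
    """True if a client waiting on the main process would time out first.

    client_timeout is an assumed, caller-supplied value in seconds,
    not a real winbind default.
    """
    return worst_case_block_seconds(num_dcs, per_dc_timeout) > client_timeout


if __name__ == "__main__":
    for n in (1, 3, 12, 24):
        print(n, "DCs ->", worst_case_block_seconds(n), "seconds blocked")
```

Under those assumptions the stall grows linearly with the number of DCs
in the trusted domain, which would match the "hang for minutes"
behaviour, and it is multiplied again if more than one trusted domain
is unreachable.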


