[Samba] winbind : suspend nightmare

Prunk Dump prunkdump at gmail.com
Mon Oct 21 08:07:20 UTC 2019


Hello Samba Team!

First of all, it seems that my problem is NOT (or not only) a Samba
winbind problem. But I need to understand what is happening so that I
can send good reports to the right maintainers.


What I'm trying to achieve:
------------------------------------------
GNOME's GDM introduced a great feature that suspends the system on
logout. This helps a lot in reducing electricity consumption on my
school networks, where the machines are often used sporadically, for
just one hour at a time.

So for two months I have been working hard trying to make winbind work
with suspend. Sadly, without success...


What's the problem:
------------------------------
At random times, winbind loses the connection to my DC. "wbinfo -p"
works, but "wbinfo -i username" reports that the user is unknown, and
this happens for every user. pam_winbind stops working because the
users cannot be identified.

Sometimes winbind recovers after 3 to 5 minutes. Sometimes it never
recovers and needs to be restarted.

Strangely, running "wbinfo -g" sometimes makes winbind work again...
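The failure state described above (winbindd still answers "wbinfo -p" while "wbinfo -i" fails for every user) can be detected with a small probe, for example to record exactly when it starts after a resume. This is only a minimal sketch: it assumes that wbinfo exits with a non-zero status when a request fails, and the account name in TEST_USER is a hypothetical placeholder.

```python
"""Hypothetical probe: does winbindd answer pings but fail user lookups?"""
import subprocess

# Placeholder: replace with any existing domain account.
TEST_USER = "SAMDOM\\someuser"


def command_succeeds(cmd):
    """Run a command quietly and report whether it exited with status 0."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        # Missing binary or a hung request both count as a failure.
        return False
    return result.returncode == 0


def classify(ping_ok, lookup_ok):
    """Map the two probe results to a short diagnosis."""
    if not ping_ok:
        return "winbindd not responding"
    if not lookup_ok:
        return "winbindd up, domain lookups failing"
    return "ok"


def probe(user=TEST_USER):
    ping_ok = command_succeeds(["wbinfo", "-p"])
    # Only try the user lookup if winbindd answers the ping at all.
    lookup_ok = ping_ok and command_succeeds(["wbinfo", "-i", user])
    return classify(ping_ok, lookup_ok)


if __name__ == "__main__":
    print(probe())
```

Separating the pure classify() step from the subprocess calls keeps the decision logic easy to check without a domain controller.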


What causes the problem:
----------------------------------------

I don't really know. The problem seems to appear in two situations.

1) When the system resumes from suspend.

Even with a higher debug log level, I don't see anything strange in
the logs. As winbind uses many time-related services, maybe some
tickets expire during suspend, and maybe this situation is not handled
in the winbind code.

I don't know whether winbind "officially" supports suspend. For now, I
have written a systemd hook that kills winbind before suspend and
restarts it after resume.

2) The problem also appears on "DHCPDISCOVER". I don't know why, but
DHCPDISCOVER makes winbind react: just after it, winbind tries to
update its DC list.

I don't understand how winbind knows what dhclient is doing. Maybe a
bug in dhclient closes a socket opened by winbind?

But just after DHCPDISCOVER, winbind loses its network connection
(strangely, I am connected to the host over ssh, so the host does not
lose all network connectivity), and DNS resolution fails.

Here is what I see in the logs, just after DHCPDISCOVER:

-> 12:29:27 is the time of the suspend the day before (my hook kills winbind)
-> 07:44:43 is the wake-up time: note that everything seems fine
-> 07:46:40 is when DHCPDISCOVER is sent and when winbind loses connectivity

12:29:27  Got sig[15] terminate (is_parent=0)
07:44:43  connection_ok: Connection to fichdc01.samdom.com for domain SAMDOM is not connected
07:44:43  Successfully contacted LDAP server 172.16.0.30
07:44:43  get_dc_list: preferred server list: "fichdc01.samdom.com, *"
07:44:43  Connecting to 172.16.0.30 at port 445
07:44:43  ldb_wrap open of secrets.ldb
07:44:43  Connecting to 172.16.0.30 at port 135
07:44:43  Connecting to 172.16.0.30 at port 49153
07:44:43  Connecting to 172.16.0.30 at port 135
07:44:43  Connecting to 172.16.0.30 at port 49153
07:44:43  Connecting to 172.16.0.30 at port 135
07:44:43  Connecting to 172.16.0.30 at port 49152
07:44:45  ads: fetch sequence_number for SAMDOM
07:44:45  get_dc_list: preferred server list: "fichdc01.samdom.com, *"
07:44:45  Successfully contacted LDAP server 172.16.0.30
07:44:45  Connected to LDAP server fichdc01.samdom.com
07:46:40  connection_ok: Connection to fichdc01.samdom.com for domain SAMDOM is not connected
07:46:40  cldap_multi_netlogon_send: cldap_socket_init failed for ipv4:172.16.0.30:389  error NT_STATUS_NETWORK_UNREACHABLE
07:46:40  ads_cldap_netlogon: did not get a reply
07:46:40  ads_try_connect: CLDAP request 172.16.0.30 failed.
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  ads_find_dc: failed to find a valid DC on our site (Default-First-Site-Name), Trying to find another DC for realm 'samdom.com' (domain '')
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  dns_send_req: Failed to resolve _ldap._tcp.dc._msdcs.samdom.com (Connection refused)
07:46:40  ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40  ads_find_dc: name resolution for realm 'samdom.com' (domain '') failed: NT_STATUS_NO_LOGON_SERVERS
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  resolve_lmhosts: Attempting lmhosts lookup for name SAMDOM<0x1c>
07:46:40  resolve_wins: WINS server resolution selected and no WINS servers listed.
07:46:40  Could not look up dc's for domain SAMDOM
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40  get_sorted_dc_list: no server for name samdom.com available in site Default-First-Site-Name, fallback to all servers
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  resolve_lmhosts: Attempting lmhosts lookup for name SAMDOM<0x1c>
07:46:40  resolve_wins: WINS server resolution selected and no WINS servers listed.
07:46:40  get_dc_list: preferred server list: ", *"
07:46:40  ads_dns_lookup_srv: Failed to send DNS query (NT_STATUS_CONNECTION_REFUSED)
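For reference, a systemd hook of the kind described in 1) could be sketched roughly as follows. This is a minimal illustration, not a verified fix: it assumes the systemd-sleep convention of running each executable in /usr/lib/systemd/system-sleep/ with "pre" (before suspend) or "post" (after resume) as the first argument, and it assumes the unit is named winbind.service. The file name winbind-sleep-hook is hypothetical. It stops and starts the service through systemctl rather than killing the process directly.

```python
#!/usr/bin/env python3
# Hypothetical systemd-sleep hook, e.g. installed as the executable file
# /usr/lib/systemd/system-sleep/winbind-sleep-hook
# systemd-sleep calls it with:
#   argv[1] = "pre" before sleeping, "post" after resuming
#   argv[2] = the sleep type, e.g. "suspend" or "hibernate"
import subprocess
import sys

# Assumption: the unit name; adjust if your distribution names it differently.
WINBIND_UNIT = "winbind.service"


def systemctl_command(phase):
    """Return the systemctl command for a sleep phase, or None to do nothing."""
    if phase == "pre":
        return ["systemctl", "stop", WINBIND_UNIT]
    if phase == "post":
        return ["systemctl", "start", WINBIND_UNIT]
    return None


def main(phase):
    cmd = systemctl_command(phase)
    if cmd is None:
        return 0
    # check=False: report systemctl's exit status instead of raising.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Only act when systemd-sleep passes a phase argument.
    sys.exit(main(sys.argv[1]))
```

Keeping the phase-to-command mapping in its own function makes it possible to check what the hook would run without actually stopping winbind.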


So, my questions:
---------------------------

1) Does winbind officially support suspend? Or do I need to keep my
systemd hook that stops winbind on suspend?

2) Does anyone understand what happens when winbind loses
connectivity? Why don't I see anything in the logs when resuming from
suspend? Why does "wbinfo -g" sometimes make winbind work again?

3) Does anyone recognize any aspect of my DHCPDISCOVER problem? Any
idea that would help me file a bug with the right people is welcome.

Thank you very much!

Baptiste.
