[Samba] winbind : suspend nightmare
Prunk Dump
prunkdump at gmail.com
Fri Oct 25 09:42:27 UTC 2019
Hi all !
Jeremy, here my complete logs about my DHCPDISCOVER problem. It seems
there is a dhclient bug that close the connection during a small
amount of time during DHCPDISCOVER after suspend.
I understand now why samba react exactly when DHCPDISCOVER is
launched. This is because there is an NTP dhclient hook that query for
the "ntp" user id. So as winbind is used for nsswitch this make a
request to winbind exactly at the same time than DHCPDISCOVER. And so
exactly when there is a tittle lost of connectivity.
But I don't know why, sometimes winbind stay in this state. And don't
retry (or couldn't) reconnect to the server. Il need to reboot, or to
restart winbind, or sometimes "wbinfo -g" solve the problem. So it
seems there is also a problem in winbind.
So. Here my syslog. Note :
-> a cron job is launched at 11:45:01 as root
-> ntp is restarted at 11:46:15
-> DHCP discover start at 11:46:15 at end at 11:46:16
--------- syslog ---------
11:45:01 c501-02 CRON[2726]: (root) CMD (...)cation at wine/wine/ /wine)
11:46:15 c501-02 ntpd[670]: ntpd exiting on signal 15 (Terminated)
11:46:15 c501-02 systemd[1]: Stopping Network Time Service...
11:46:15 c501-02 systemd[1]: ntp.service: Succeeded.
11:46:15 c501-02 systemd[1]: Stopped Network Time Service.
11:46:15 c501-02 systemd[1]: Starting Network Time Service...
11:46:15 c501-02 ntpd[2774]: ntpd 4.2.8p12 at 1.3728-o (1): Starting
11:46:15 c501-02 ntpd[2774]: Command line: /usr/sbin/ntpd -p
/var/run/ntpd.pid -g -u 104:110
11:46:15 c501-02 systemd[1]: Started Network Time Service.
11:46:15 c501-02 ntpd[2780]: proto: precision = 0.045 usec (-24)
11:46:15 c501-02 ntpd[2780]: kernel reports TIME_ERROR: 0x2041: Clock
Unsynchronized
11:46:15 c501-02 ntpd[2780]: kernel reports TIME_ERROR: 0x2041: Clock
Unsynchronized
11:46:15 c501-02 dhclient[521]: DHCPDISCOVER on eno1 to
255.255.255.255 port 67 interval 6
11:46:16 c501-02 dhclient[521]: DHCPOFFER of 172.16.107.37 from 172.16.0.30
11:46:16 c501-02 dhclient[521]: DHCPREQUEST for 172.16.107.37 on eno1
to 255.255.255.255 port 67
11:46:16 c501-02 dhclient[521]: DHCPACK of 172.16.107.37 from 172.16.0.30
11:46:16 c501-02 systemd[1]: Reloading Samba SMB Daemon.
11:46:16 c501-02 systemd[1]: Reloaded Samba SMB Daemon.
11:46:16 c501-02 ntpd[2780]: ntpd exiting on signal 15 (Terminated)
11:46:16 c501-02 systemd[1]: Stopping Network Time Service...
11:46:16 c501-02 systemd[1]: ntp.service: Succeeded.
11:46:16 c501-02 systemd[1]: Stopped Network Time Service.
11:46:16 c501-02 systemd[1]: Starting Network Time Service...
11:46:16 c501-02 ntpd[2848]: ntpd 4.2.8p12 at 1.3728-o (1): Starting
11:46:16 c501-02 ntpd[2848]: Command line: /usr/sbin/ntpd -p
/var/run/ntpd.pid -g -c /run/ntp.conf.dhcp -u 104:110
11:46:16 c501-02 systemd[1]: Started Network Time Service.
11:46:16 c501-02 ntpd[2854]: kernel reports TIME_ERROR: 0x2041: Clock
Unsynchronized
11:46:16 c501-02 ntpd[2854]: kernel reports TIME_ERROR: 0x2041: Clock
Unsynchronized
11:46:16 c501-02 dhclient[521]: bound to 172.16.107.37 -- renewal in
12638 seconds.
-------------------------------
Now the winbind log. Note :
-> the resolution of "root" at start, caused by the cron job
-> the resolution of "ntp" caused by the ntp service restart. This is
why samba start to establish the connection. I don't know why it don't
try for "root".
-> between 11:46:15 and 11:46:16 there is many NT_STATUS_CONNECTION_REFUSED
-> at 11:46:45 (30 seconds later) the connection seems to works
--------- log.wb-SAMDOM ---------
11:45:01 string_to_sid: SID is not in a valid format
11:45:01 msrpc_name_to_sid: name=SAMDOM\ROOT
11:45:01 name_to_sid [rpc] SAMDOM\ROOT for domain SAMDOM
11:45:02 string_to_sid: SID is not in a valid format
11:45:02 msrpc_name_to_sid: name=SAMDOM\ROOT
11:45:02 name_to_sid [rpc] SAMDOM\ROOT for domain SAMDOM
11:46:15 msrpc_name_to_sid: name=SAMDOM\NTP
11:46:15 name_to_sid [rpc] SAMDOM\NTP for domain SAMDOM
11:46:15 connection_ok: Connection to fichdc01.samdom.com for domain
SAMDOM is not connected
11:46:15 Connecting to 172.16.0.30 at port 445
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 dns_send_req: Failed to resolve
_ldap._tcp.Default-First-Site-Name._sites.dc._msdcs.samdom.com
(Connexion refusée)
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 get_sorted_dc_list: no server for name samdom.com available
in site Default-First-Site-Name, fallback to all servers
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 ads_find_dc: failed to find a valid DC on our site
(Default-First-Site-Name), Trying to find another DC for realm
'samdom.com' (domain '')
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 ads_find_dc: name resolution for realm 'samdom.com' (domain
'') failed: NT_STATUS_NO_LOGON_SERVERS
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 resolve_lmhosts: Attempting lmhosts lookup for name SAMDOM<0x1c>
11:46:15 resolve_wins: WINS server resolution selected and no WINS
servers listed.
11:46:15 Could not look up dc's for domain SAMDOM
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 get_sorted_dc_list: no server for name samdom.com available
in site Default-First-Site-Name, fallback to all servers
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 resolve_lmhosts: Attempting lmhosts lookup for name SAMDOM<0x1c>
11:46:15 resolve_wins: WINS server resolution selected and no WINS
servers listed.
11:46:15 get_dc_list: preferred server list: ", *"
11:46:15 ads_dns_lookup_srv: Failed to send DNS query
(NT_STATUS_CONNECTION_REFUSED)
11:46:15 msrpc_name_to_sid: name=SAMDOM\NTP
11:46:15 name_to_sid [rpc] SAMDOM\NTP for domain SAMDOM
11:46:16 msrpc_name_to_sid: name=SAMDOM\NTP
11:46:16 name_to_sid [rpc] SAMDOM\NTP for domain SAMDOM
11:46:16 msrpc_name_to_sid: name=SAMDOM\NTP
11:46:16 name_to_sid [rpc] SAMDOM\NTP for domain SAMDOM
11:46:45 connection_ok: Connection to fichdc01.samdom.com for domain
SAMDOM is not connected
11:46:45 Connecting to 172.16.0.30 at port 445
11:46:45 ldb_wrap open of secrets.ldb
11:46:45 got OID=1.2.840.48018.1.2.2
-------------------------------
Now the strace of windbind. Note :
-> resolution of "root" don't use any connection
-> when resolving "ntp" the "connect" syscall give ENETUNREACH, so
there is really no internet connectivity
-> winbind try two more times and get ENETUNREACH
-> strangely after winbind write error messages without trying to reconnect
-> at 11:46:45 the connect success with EINPROGRESS
--------- strace winbind ---------
11:45:01 read(26,
"0\10\0\08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
2096) = 2096
11:45:01 read(26,
"\10\0\0\0\0\0\0\0\10\0\0\0SAMDOM\0\5\0\0\0\0\0\0\0\5\0\0\0"..., 44) =
44
11:45:01 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
11:45:01 geteuid() = 0
11:45:01 write(14, "[2019/10/24 11:45:01.727727, 3]"..., 86) = 86
11:45:01 geteuid() = 0
11:45:01 write(14, " string_to_sid: SID is not in "..., 47) = 47
11:45:01 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
11:45:01 geteuid() = 0
11:45:01 write(14, "[2019/10/24 11:45:01.728111, 3]"..., 93) = 93
11:45:01 geteuid() = 0
11:45:01 write(14, " msrpc_name_to_sid: name=FICHLA"..., 39) = 39
11:45:01 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
11:45:01 geteuid() = 0
11:45:01 write(14, "[2019/10/24 11:45:01.728371, 3]"..., 93) = 93
11:45:01 geteuid() = 0
11:45:01 write(14, " name_to_sid [rpc] SAMDOM\\ROOT"..., 52) = 52
11:45:01 epoll_create(64) = 15
11:45:01 fcntl(15, F_GETFD) = 0
11:45:01 fcntl(15, F_SETFD, FD_CLOEXEC) = 0
11:45:01 getpid() = 2566
11:45:01 read(6, "\310\374a\257\325'\367P", 8) = 8
11:45:01 writev(29,
[{iov_base="\5\0\0\3\20\0\0\0\250\08\0'\0\0\0L\0\0\0\0\0M\0\357\366=j\356UO\373"...,
iov_len=168}], 1) = 168
11:45:01 ioctl(29, FIONREAD, [0]) = 0
11:45:01 getsockopt(29, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
11:45:01 getpid() = 2566
11:45:01 epoll_ctl(15, EPOLL_CTL_ADD, 29, {EPOLLIN|EPOLLERR|EPOLLHUP,
{u32=3646347376, u64=94376962742384}}) = 0
11:45:01 getpid() = 2566
11:45:01 epoll_wait(15, [{EPOLLIN, {u32=3646347376,
u64=94376962742384}}], 1, 35000) = 1
11:45:01 readv(29, [{iov_base="\5", iov_len=1}], 1) = 1
11:45:01 ioctl(29, FIONREAD, [215]) = 0
11:45:01 readv(29, [{iov_base="\0\2\3\20\0\0\0\330\08\0'\0\0\0",
iov_len=15}], 1) = 15
11:45:01 ioctl(29, FIONREAD, [200]) = 0
11:45:01 getpid() = 2566
11:45:01 epoll_wait(15, [{EPOLLIN, {u32=3646347376,
u64=94376962742384}}], 1, 35000) = 1
11:45:01 readv(29,
[{iov_base="|\0\0\0\0\0\0\0xS2\356\274\10\225\6jr\22>\347\0\272\352\205\370\365~\226\213,{"...,
iov_len=200}], 1) = 200
11:45:01 close(15) = 0
11:45:01 fcntl(22, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET,
l_start=8, l_len=1}) = 0
11:45:01 fcntl(22, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET,
l_start=168, l_len=40000}) = 0
11:45:01 fcntl(22, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET,
l_start=40168, l_len=0}) = 0
11:45:01 fstat(22, {st_mode=S_IFREG|0644, st_size=1118208, ...}) = 0
11:45:01 munmap(0x7fee78e15000, 1118208) = 0
11:45:01 mmap(NULL, 1118208, PROT_READ|PROT_WRITE, MAP_SHARED, 22, 0)
= 0x7fee78e15000
11:45:01 fcntl(23, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET,
l_start=40168, l_len=0}) = 0
11:45:01 fcntl(23, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET,
l_start=8, l_len=1}) = 0
11:45:01 fcntl(23, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=8, l_len=1}) = 0
11:45:01 fcntl(22, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET,
l_start=168, l_len=0}) = 0
11:45:01 fcntl(22, F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET,
l_start=0, l_len=1}) = 0
11:45:01 fdatasync(22) = 0
11:45:01 msync(0x7fee78e26000, 4132, MS_SYNC) = 0
11:45:01 fdatasync(22) = 0
11:45:01 msync(0x7fee78e26000, 24, MS_SYNC) = 0
11:45:01 fdatasync(22) = 0
11:45:01 msync(0x7fee78e15000, 1118208, MS_SYNC) = 0
11:45:01 utime("/var/cache/samba/gencache.tdb", NULL) = 0
11:45:01 fdatasync(22) = 0
11:45:01 msync(0x7fee78e26000, 24, MS_SYNC) = 0
11:45:01 fcntl(22, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=168, l_len=0}) = 0
11:45:01 fcntl(22, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=8, l_len=1}) = 0
11:45:01 fcntl(22, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=0, l_len=1}) = 0
11:45:01 fcntl(23, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=40168, l_len=0}) = 0
11:45:01 writev(26,
[{iov_base="\270\17\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
iov_len=4008}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0s\0\0\300",
iov_len=16}], 2) = 4024
11:45:01 getpid() = 2566
11:45:01 epoll_wait(3, [{EPOLLIN, {u32=3645998768,
u64=94376962393776}}], 1, 30000) = 1
11:45:02 read(26,
"0\10\0\08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
2096) = 2096
11:45:02 read(26,
"\10\0\0\0\0\0\0\0\10\0\0\0SAMDOM\0\5\0\0\0\0\0\0\0\5\0\0\0"..., 44) =
44
11:45:02 fcntl(22, F_SETLKW, {l_type=F_RDLCK, l_whence=SEEK_SET,
l_start=17756, l_len=1}) = 0
11:45:02 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
11:45:02 geteuid() = 0
11:45:02 write(14, "[2019/10/24 11:45:02.319391, 3]"..., 86) = 86
11:45:02 geteuid() = 0
11:45:02 write(14, " string_to_sid: SID is not in "..., 47) = 47
11:45:02 fcntl(22, F_SETLKW, {l_type=F_UNLCK, l_whence=SEEK_SET,
l_start=17756, l_len=1}) = 0
11:45:02 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
11:45:02 geteuid() = 0
11:45:02 write(14, "[2019/10/24 11:45:02.319722, 3]"..., 93) = 93
11:45:02 geteuid() = 0
11:45:02 write(14, " msrpc_name_to_sid: name=FICHLA"..., 39) = 39
11:45:02 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2335, ...}) = 0
11:45:02 geteuid() = 0
11:45:02 write(14, "[2019/10/24 11:45:02.319935, 3]"..., 93) = 93
11:45:02 geteuid() = 0
11:45:02 write(14, " name_to_sid [rpc] SAMDOM\\ROOT"..., 52) = 52
11:45:02 epoll_create(64) = 15
11:45: