[Samba] smbd linux freeze, not responding to (TERM) signals
Valentijn Sessink
v.sessink at openoffice.nl
Fri Dec 24 11:09:26 UTC 2021
Hi,
For a couple of years now, my smbd hangs a couple of times per year: smb
daemons do not respond to TERM signal, I have to use SIGKILL.
This is in a small network with mostly Apple and a few Linux clients,
server running Ubuntu Linux, used to be 18.04, now is 20.04.
The users complain "I cannot connect to the server" and the only way to
resolve it is to restart smbd; however, the smbd daemons do not respond to
TERM signals, so I have to KILL them. ("systemctl restart smbd.service"
waits for 90s, then kills all smbd processes.)
I'll try to give more information below, but I'm sure there is more to
add - log level or anything. Suggestions welcome.
Whenever the problem occurs, smbstatus shows several "(auth in
progress)" lines, and it is specifically these smbd processes that do
not respond to signals other than KILL:
Samba version 4.13.14-Ubuntu
PID      Username            Group   Machine                                       Protocol Version  Encryption  Signing
----------------------------------------------------------------------------------------------------------------------------------------
1696515  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56390)    SMB3_11           -           -
1293711  userie              userie  192.168.102.119 (ipv4:192.168.102.119:51048)  SMB3_11           -           partial(AES-128-CMAC)
4165094  userne              userne  192.168.102.153 (ipv4:192.168.102.153:39456)  SMB3_11           -           partial(AES-128-CMAC)
259670   userne              userne  192.168.102.153 (ipv4:192.168.102.153:39936)  SMB3_11           -           partial(AES-128-CMAC)
1700382  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56400)    SMB3_11           -           -
1711963  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:53136)    SMB3_11           -           -
1708107  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:53134)    SMB3_11           -           -
1700371  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56396)    SMB3_11           -           -
1657745  userlo              userlo  192.168.103.18 (ipv4:192.168.103.18:53924)    SMB3_11           -           partial(AES-128-CMAC)
1696496  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56384)    SMB3_11           -           -
1696495  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56386)    SMB3_11           -           -
1696516  (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56392)    SMB3_11           -           -
Service           pid      Machine          Connected at                 Encryption  Signing
---------------------------------------------------------------------------------------------
IPC$              1293711  192.168.102.119  vr dec 24 09:07:11 2021 CET  -           -
shar              1293711  192.168.102.119  vr dec 24 09:07:11 2021 CET  -           -
IPC$              1657745  192.168.103.18   vr dec 24 10:41:34 2021 CET  -           -
IPC$              1293711  192.168.102.119  vr dec 24 09:07:22 2021 CET  -           -
shar              1657745  192.168.103.18   vr dec 24 10:41:33 2021 CET  -           -
userie            1293711  192.168.102.119  vr dec 24 09:07:11 2021 CET  -           -
shar-shararaties  259670   192.168.102.153  do dec 23 10:37:13 2021 CET  -           -
shar              4165094  192.168.102.153  do dec 23 09:23:24 2021 CET  -           -
No locked files
In the above example, "kill 1696516" doesn't seem to do anything; 1696516
stays where it is. However, if I "kill -KILL" all pids that have "auth in
progress" as their status, smbd behaves correctly again (users: "yes, I can
connect now").
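
To collect the stuck pids without reading the table by eye, something like the following might help. It is only a sketch in Python: the name auth_in_progress_pids is made up for illustration and is not part of Samba, and it assumes the smbstatus session table looks like the output above, i.e. each stuck session's row starts with a numeric pid followed by "(auth in progress)".

```python
import re

# Hypothetical helper (not part of Samba): a session row is assumed to
# start with a numeric PID followed by the literal "(auth in progress)".
_STUCK_ROW = re.compile(r"^\s*(\d+)\s+\(auth in progress\)")


def auth_in_progress_pids(smbstatus_text):
    """Return the PIDs of "(auth in progress)" rows, in order, without duplicates."""
    pids = []
    for line in smbstatus_text.splitlines():
        match = _STUCK_ROW.match(line)
        if match:
            pid = int(match.group(1))
            if pid not in pids:
                pids.append(pid)
    return pids


# Two rows shaped like the smbstatus output shown in this mail.
SAMPLE = (
    "1696515 (auth in progress) 192.168.103.42\n"
    "1293711 userie userie 192.168.102.119\n"
)
print(auth_in_progress_pids(SAMPLE))
```

Each returned pid could then be handed to os.kill(pid, signal.SIGKILL), which is what "kill -KILL" does by hand.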
This situation was the same under Ubuntu 18.04. Since that shipped a
rather old smbd, I had hoped the upgrade would fix things. (Yes, I am
aware that 4.13.14-Ubuntu is not current either.)
The only difference from a more straightforward setup is probably that
we run a separate LDAP server for authentication, with passdb backend =
ldapsam:ldap://127.0.0.1/
Also, since this is an existing situation that went from upgrade to
upgrade, I suspect that there will be a few outdated options in smb.conf:
[global]
log level = 1
workgroup = shar
passdb backend = ldapsam:ldap://127.0.0.1/
ldap admin dn = cn=admin,dc=kantoor,dc=shar,dc=nl
ldap ssl = off
ldap suffix = dc=kantoor,dc=shar,dc=nl
ldap user suffix = ou=Users
ldap group suffix = ou=Groups
ldap machine suffix = ou=Computers
unix extensions = yes
delete readonly = yes
ea support = yes
ldap password sync = yes
interfaces = 127.0.0.0/8 ens3
bind interfaces only = true
load printers = no
printing = bsd
printcap name = /dev/null
disable spoolss = Yes
disable netbios = yes
smb ports = 445
dns proxy = no
vfs objects = fruit streams_xattr
security = user
Shares are pretty simple:
[name]
force group = users
force directory mode = 2770
force create mode = 0660
directory mask = 2770
create mode = 0660
comment = Comment
writable = yes
path = /home/somewhere
mangled names = no
mangling char = _
valid users = @users
Oh, trying to find out what the daemon is doing:
strace -p 1700382 (but maybe I'm totally mistaken here and "strace"
isn't the right tool):
strace: Process 1700382 attached
restart_syscall(<... resuming interrupted read ...>
netstat shows:
tcp 1 0 192.168.102.3:445 192.168.103.42:56400 CLOSE_WAIT 1700382/smbd
tcp 0 0 127.0.0.1:33010 127.0.0.1:389 ESTABLISHED 1700382/smbd
unix 2 [ ] DGRAM 72953128 1700382/smbd /var/lib/samba/private/msg.sock/1700382
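
For reference, CLOSE_WAIT means the client has already closed its end of the connection while smbd has not yet closed its own socket. A rough way to list every smbd in that state is sketched below in Python. The function name close_wait_smbd is hypothetical, and the sketch assumes one socket per line in the usual "netstat -tnp" column order (proto, recv-q, send-q, local address, peer address, state, pid/program).

```python
def close_wait_smbd(netstat_text):
    """Return (pid, peer_address) pairs for smbd TCP sockets in CLOSE_WAIT."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, unix sockets and anything without all seven columns.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        peer, state, owner = fields[4], fields[5], fields[6]
        pid, _, program = owner.partition("/")
        if state == "CLOSE_WAIT" and program == "smbd" and pid.isdigit():
            hits.append((int(pid), peer))
    return hits


# The two TCP lines from this mail, one socket per line.
SAMPLE = (
    "tcp 1 0 192.168.102.3:445 192.168.103.42:56400 CLOSE_WAIT 1700382/smbd\n"
    "tcp 0 0 127.0.0.1:33010 127.0.0.1:389 ESTABLISHED 1700382/smbd\n"
)
print(close_wait_smbd(SAMPLE))
```

Comparing this list with the "(auth in progress)" pids from smbstatus would show whether every stuck smbd is holding a connection the client has already given up on.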
What could cause these hangs?
Best regards,
Valentijn
--
http://www.openoffice.nl/ Open Office - Linux Office Solutions
Valentijn Sessink v.sessink at openoffice.nl +31(0)20-4214059