[Samba] How to debug a hard freeze?

Valentijn Sessink v.sessink at openoffice.nl
Wed Dec 29 12:50:14 UTC 2021


Hello list,

Samba hard-freezes on me once in a while, and I don't know how best to 
debug it. I hope this list can help - or should I ask my question on the 
devel list?

See below for the issue I'm experiencing. I'm aware that December 24 
probably isn't the best day to send a message, but I had still hoped for 
more than 0 replies ;-) (Sorry for reposting; if that is considered 
impolite, please feel free to tell me, but please do so off-list.)

I know I could "strace" smbd to find out which function stalls, and I 
could also use Wireshark; but I am afraid that this would produce tons 
and tons of traffic and logs. The problem shows up a couple of times per 
year and I don't have unlimited storage. Moreover, I don't know how to 
reproduce the problem reliably, which is probably the biggest obstacle.

Also, *if* I'm going to packet-dump/log/strace everything, I'd rather 
know the best way to proceed: instead of just maxing out the log level, 
are there better options?

Any clues? A pointer in the right direction is also appreciated. I looked 
in the Samba wiki but did not find anything, and using the information 
from "Troubleshooting Samba", chapter 9 of a 22-year-old O'Reilly book, 
seems rather futile.

Best regards,

Valentijn

-------- Forwarded Message --------
Subject: smbd linux freeze, not responding to (TERM) signals
Date: Fri, 24 Dec 2021 12:09:26 +0100

Hi,

For a couple of years now, my smbd has been hanging a couple of times per 
year: the smbd processes do not respond to SIGTERM, and I have to use SIGKILL.

This is a small network with mostly Apple and a few Linux clients; the 
server runs Ubuntu Linux (it used to be 18.04, now it is 20.04).

The users complain "I cannot connect to the server", and the only fix is 
to restart smbd. However, the smbd processes do not respond to SIGTERM, 
so I have to KILL them ("systemctl restart smbd.service" waits for 90 s, 
then kills all smbd processes).
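The "wait, then kill" behaviour described above can also be applied by hand to individual PIDs. Below is a minimal Python sketch, assuming a Linux /proc filesystem and root privileges; `stop_process`, `is_gone` and the `grace`/`poll` parameters are hypothetical names chosen for illustration, and the 90 s default simply mirrors the systemctl timeout mentioned above. A zombie counts as gone, because a killed smbd stays a zombie until its parent reaps it.

```python
import os
import signal
import time


def is_gone(pid: int) -> bool:
    """True if the PID no longer exists or is a zombie/dead task (Linux only)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return True
    # The state letter follows the command name, which is wrapped in
    # parentheses and may itself contain spaces, so split after the last ')'.
    state = stat.rsplit(")", 1)[1].split()[0]
    return state in ("Z", "X")


def stop_process(pid: int, grace: float = 90.0, poll: float = 0.5) -> str:
    """Send SIGTERM, wait up to `grace` seconds, then escalate to SIGKILL.

    Returns "gone" if the process had already exited, "TERM" if SIGTERM was
    enough, "KILL" if SIGKILL was needed, or "stuck" if it survived both.
    """
    if is_gone(pid):
        return "gone"
    for sig, label in ((signal.SIGTERM, "TERM"), (signal.SIGKILL, "KILL")):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return "gone"  # exited and was reaped between our check and the signal
        deadline = time.monotonic() + grace
        while time.monotonic() < deadline:
            if is_gone(pid):
                return label
            time.sleep(poll)
    return "stuck"
```

A result of "stuck" would point to uninterruptible sleep (state D), where even SIGKILL only takes effect once the kernel call returns. Since SIGKILL does work in the case described here, the hung processes are presumably in an ordinary interruptible wait.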

I'll try to give more information below, but I'm sure there is more to 
add (log level or anything else). Suggestions welcome.

Whenever the problem occurs, smbstatus shows several "(auth in 
progress)" lines, and these particular smbd processes do not respond to 
any signals:

Samba version 4.13.14-Ubuntu
PID     Username            Group   Machine                                        Protocol Version  Encryption  Signing
----------------------------------------------------------------------------------------------------------------------------------------
1696515 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56390)     SMB3_11           -           -
1293711 userie              userie  192.168.102.119 (ipv4:192.168.102.119:51048)   SMB3_11           -           partial(AES-128-CMAC)
4165094 userne              userne  192.168.102.153 (ipv4:192.168.102.153:39456)   SMB3_11           -           partial(AES-128-CMAC)
259670  userne              userne  192.168.102.153 (ipv4:192.168.102.153:39936)   SMB3_11           -           partial(AES-128-CMAC)
1700382 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56400)     SMB3_11           -           -
1711963 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:53136)     SMB3_11           -           -
1708107 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:53134)     SMB3_11           -           -
1700371 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56396)     SMB3_11           -           -
1657745 userlo              userlo  192.168.103.18 (ipv4:192.168.103.18:53924)     SMB3_11           -           partial(AES-128-CMAC)
1696496 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56384)     SMB3_11           -           -
1696495 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56386)     SMB3_11           -           -
1696516 (auth in progress)          192.168.103.42 (ipv4:192.168.103.42:56392)     SMB3_11           -           -

Service          pid     Machine         Connected at                 Encryption   Signing
---------------------------------------------------------------------------------------------
IPC$             1293711 192.168.102.119 vr dec 24 09:07:11 2021 CET  -            -
shar             1293711 192.168.102.119 vr dec 24 09:07:11 2021 CET  -            -
IPC$             1657745 192.168.103.18  vr dec 24 10:41:34 2021 CET  -            -
IPC$             1293711 192.168.102.119 vr dec 24 09:07:22 2021 CET  -            -
shar             1657745 192.168.103.18  vr dec 24 10:41:33 2021 CET  -            -
userie           1293711 192.168.102.119 vr dec 24 09:07:11 2021 CET  -            -
shar-shararaties 259670  192.168.102.153 do dec 23 10:37:13 2021 CET  -            -
shar             4165094 192.168.102.153 do dec 23 09:23:24 2021 CET  -            -

No locked files
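Because the problem only shows up a few times a year, one way to avoid logging everything is to detect the symptom first. Below is a minimal Python sketch, assuming `smbstatus -p` (processes only) prints one process per line (the listing above was wrapped by the mail client); the function names are hypothetical. A PID that is still "(auth in progress)" in two snapshots taken a few minutes apart is a candidate for strace or a kill, so only those few processes need closer inspection.

```python
from __future__ import annotations

import re
import subprocess

# A PID at the start of the line followed by the literal "(auth in progress)",
# which smbstatus prints instead of a user name while a session is being set up.
AUTH_RE = re.compile(r"^\s*(\d+)\s+\(auth in progress\)")


def auth_in_progress_pids(smbstatus_output: str) -> set[int]:
    """Return the PIDs listed as '(auth in progress)' in one snapshot."""
    pids: set[int] = set()
    for line in smbstatus_output.splitlines():
        match = AUTH_RE.match(line)
        if match:
            pids.add(int(match.group(1)))
    return pids


def stuck_pids(earlier: str, later: str) -> set[int]:
    """PIDs that are still '(auth in progress)' in the later snapshot."""
    return auth_in_progress_pids(earlier) & auth_in_progress_pids(later)


def take_snapshot() -> str:
    """Run `smbstatus -p` on the Samba server (typically needs root)."""
    result = subprocess.run(["smbstatus", "-p"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, call `take_snapshot()`, wait five minutes, call it again and pass both results to `stuck_pids()`; the interval is an arbitrary choice, on the assumption that a legitimate authentication should not take minutes.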

In the above example, "kill 1696516" doesn't seem to do anything: 1696516 
stays where it is. However, running "kill -KILL" on all PIDs whose status 
is "auth in progress" makes smbd behave correctly again (users: "yes, I 
can connect now").

The same thing happened under Ubuntu 18.04, but as that shipped a rather 
old smbd, I hoped an upgrade would fix things. (Yes, I am aware that 
4.13.14-Ubuntu is not recent either.)

The only difference from a more straightforward setup is probably that 
we run a separate LDAP server for authentication, with passdb backend = 
ldapsam:ldap://127.0.0.1/
Also, since this setup has been carried along from upgrade to upgrade, 
I suspect that smb.conf contains a few outdated options:

[global]
         log level = 1
         workgroup = shar
         passdb backend  = ldapsam:ldap://127.0.0.1/
         ldap admin dn   = cn=admin,dc=kantoor,dc=shar,dc=nl
         ldap ssl        = off
         ldap suffix     = dc=kantoor,dc=shar,dc=nl
         ldap user suffix        = ou=Users
         ldap group suffix       = ou=Groups
         ldap machine suffix     = ou=Computers
         unix extensions = yes
         delete readonly = yes
         ea support      = yes
         ldap password sync      = yes
         interfaces = 127.0.0.0/8 ens3
         bind interfaces only = true
         load printers = no
         printing = bsd
         printcap name = /dev/null
         disable spoolss = Yes
         disable netbios = yes
         smb ports = 445
         dns proxy = no
         vfs objects = fruit streams_xattr
         security = user

Shares are pretty simple:
[name]
   force group = users
   force directory mode    = 2770
   force create mode       = 0660
   directory mask  = 2770
   create mode     = 0660
   comment         = Comment
   writable        = yes
   path            = /home/somewhere
   mangled names = no
   mangling char = _
   valid users = @users


To find out what the daemon is doing, I tried strace -p 1700382 (but 
maybe I'm totally mistaken here and strace isn't the right tool):
strace: Process 1700382 attached
restart_syscall(<... resuming interrupted read ...>
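strace only shows that the process is sleeping in a system call. The kernel's view of the same process can also be read directly from /proc: its scheduler state, the kernel function it is waiting in (wchan), and what each open file descriptor points to (a socket appears as "socket:[inode]", which can be matched against the inode column of /proc/net/tcp). Below is a minimal Python sketch, Linux only; `describe_process` is a hypothetical helper, and reading another process's file descriptors requires root or the same uid.

```python
import os


def describe_process(pid: int) -> dict:
    """Collect what /proc says about a (possibly hung) process."""
    base = f"/proc/{pid}"
    info = {"pid": pid, "state": None, "wchan": None, "fds": {}}
    with open(f"{base}/status") as f:
        for line in f:
            if line.startswith("State:"):
                # e.g. "S (sleeping)" or "D (disk sleep)"; in state D even
                # SIGKILL is only acted on after the kernel call returns.
                info["state"] = line.split(":", 1)[1].strip()
                break
    try:
        with open(f"{base}/wchan") as f:
            # Kernel function the task sleeps in; may read "0" if the task
            # is running or the kernel hides the value.
            info["wchan"] = f.read().strip()
    except OSError:
        pass
    for name in os.listdir(f"{base}/fd"):
        try:
            # Targets look like "/path/to/file", "pipe:[123]" or "socket:[456]".
            info["fds"][int(name)] = os.readlink(f"{base}/fd/{name}")
        except OSError:
            continue  # the fd was closed while we were looking
    return info
```

For a hung smbd such as 1700382, the socket inodes in `fds` can then be compared with the netstat output to see which connection it is blocked on.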


netstat shows:
tcp        1      0 192.168.102.3:445       192.168.103.42:56400    CLOSE_WAIT  1700382/smbd
tcp        0      0 127.0.0.1:33010         127.0.0.1:389           ESTABLISHED 1700382/smbd
unix  2      [ ]         DGRAM                    72953128 1700382/smbd         /var/lib/samba/private/msg.sock/1700382
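The netstat snapshot shows a pattern that can be checked mechanically: the stuck process holds its client connection in CLOSE_WAIT (the client has already closed its side) while still holding an ESTABLISHED connection to the LDAP server on 127.0.0.1:389. That is consistent with, though it does not prove, smbd waiting for an answer from the ldapsam backend during authentication. Below is a minimal Python sketch, assuming unwrapped `netstat -tnp` output taken as root (so the PID/Program name column is filled in); `smbd_waiting_on_ldap` and `_port` are hypothetical names.

```python
from __future__ import annotations


def _port(address: str) -> int:
    """Port number from 'host:port'; returns -1 for '*' or anything non-numeric."""
    port = address.rsplit(":", 1)[-1]
    return int(port) if port.isdigit() else -1


def smbd_waiting_on_ldap(netstat_output: str, smb_port: int = 445,
                         ldap_port: int = 389) -> set[int]:
    """smbd PIDs with a client socket in CLOSE_WAIT *and* an open LDAP connection."""
    close_wait: set[int] = set()
    ldap_open: set[int] = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # proto recv-q send-q local-address foreign-address state pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state, program = fields[3:7]
        pid, _, name = program.partition("/")
        if name != "smbd" or not pid.isdigit():
            continue
        if state == "CLOSE_WAIT" and _port(local) == smb_port:
            close_wait.add(int(pid))
        elif state == "ESTABLISHED" and _port(foreign) == ldap_port:
            ldap_open.add(int(pid))
    return close_wait & ldap_open
```

Healthy smbd processes may also hold an LDAP connection, which is why the sketch requires both conditions rather than either one.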

What could cause these hangs?

Best regards,

Valentijn
-- 
http://www.openoffice.nl/   Open Office - Linux Office Solutions
Valentijn Sessink  v.sessink at openoffice.nl  +31(0)20-4214059