[Samba] Samba 4.7.6-ubuntu taking 100% CPU

Olaf Marzocchi lists at marzocchi.net
Fri Apr 2 16:56:47 UTC 2021

Dear all,
I recently ran into an issue with Samba 4.7.6-ubuntu (Ubuntu 18.04.5
LTS, with the latest updates).
I'm not sure what caused it. Around the same time I updated Ubuntu
(apt upgrade) and, separately, updated the ZFS kernel module to 2.0.4
from whatever version was available in the middle of last year. Also,
some time passed before I noticed the problem.

The issue shows up as follows. When I try to open some files (not all
of them) from any SMB share (I tried two different Windows 10 machines
and one macOS 10.14 machine), the smbd3 process goes to 100% CPU. It
doesn't seem to retrieve the file, and I have to kill it the hard way,
with "kill -9". I found that sometimes, after I kill smbd3, Windows does
receive the file I tried to open, but this doesn't always work:
sometimes the file is corrupted (truncated).
I tested the same files locally over ssh and they have no issues
whatsoever. I can read and copy them without problems, so it's not a
hardware issue.

I run InfluxDB on the same server to track disk and system load. I can
see that whenever smbd goes haywire, the ZFS cache stats jump from about
90 read ops/s to about 250K read ops/s. Those are requested reads.
Thanks to caching, less than 1 op/s actually goes to disk.

You can find the ZFS cache stats graph here: https://postimg.cc/ns315Kkn
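Anyone who wants to look at the same counters without InfluxDB can use the rough shell sketch below. It samples the ARC hit and miss counters one second apart. It assumes OpenZFS on Linux, which exposes these counters as "name type data" rows in /proc/spl/kstat/zfs/arcstats. I am also assuming that the graphed requested reads roughly correspond to ARC hits plus misses. The function names are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Rough sketch: sample the ZFS ARC hit/miss counters one second apart.
# Assumption: OpenZFS on Linux, exposing "name type data" rows in
# /proc/spl/kstat/zfs/arcstats. arc_counter/arc_rate are hypothetical names.

# Print the value (third column) of counter $2 in arcstats file $1.
arc_counter() {
    awk -v k="$2" '$1 == k { print $3 }' "$1"
}

# Print ARC hits/s and misses/s measured over one second.
arc_rate() {
    f=${1:-/proc/spl/kstat/zfs/arcstats}
    if [ ! -r "$f" ]; then
        echo "cannot read $f (is the zfs module loaded?)" >&2
        return 1
    fi
    h1=$(arc_counter "$f" hits); m1=$(arc_counter "$f" misses)
    sleep 1
    h2=$(arc_counter "$f" hits); m2=$(arc_counter "$f" misses)
    # Hits are served from RAM; misses have to be read from L2ARC or disk.
    echo "hits/s=$((h2 - h1)) misses/s=$((m2 - m1))"
}
```

Running `arc_rate` while a problematic file is being opened should show whether the hit rate jumps the same way it does in the graph.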

I set the log level in smb.conf to 3, deleted all the logs, and then
restarted smbd and nmbd. I opened a file I know triggers the lockup.
After confirming the issue, I killed smbd3 and copied the logs.
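The logging change amounts to a [global] fragment like the one below. "log level" and "log file" are standard smb.conf parameters. The /var/log/samba/log.%m path is the Ubuntu default. It is assumed here, not copied from my configuration. Because %m expands to the client's machine name, Samba writes one log file per client.

```ini
[global]
   # Debug verbosity: Samba's built-in default is 0; 3 is much more detailed
   log level = 3
   # Assumed Ubuntu default: one log file per client machine name (%m)
   log file = /var/log/samba/log.%m
```

To apply the change, run `systemctl restart smbd nmbd` afterwards.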

You can find the documents here:
smb.conf            https://pastebin.com/SsVBhsLY
log.                https://pastebin.com/uFqLEhVH
log.desktop-eiu2a2q https://pastebin.com/ReKn0qx6
log.nmbd            https://pastebin.com/gMsvyqnB
log.smbd            https://pastebin.com/e3PVjnZu
log.                https://pastebin.com/BjutvvQv

I'm also pasting the output of "ls -l" and "getfacl" for a file which I
know causes the issue. I'm not posting the same information for other
files, for the simple reason that the attributes are the same. There is
no apparent difference between the properties of unaffected and
problematic files.

root@ml110g7# ls -l CI\ 2017.pdf
-rwxrw---- 1 olaf olaf 1570923 Feb 15 21:49 'CI 2017.pdf'

root@ml110g7# getfacl CI\ 2017.pdf
# file: CI 2017.pdf
# owner: olaf
# group: olaf

The ZFS dataset properties that may be relevant are listed here. The
remaining ones are basically at their defaults:

NAME            PROPERTY                    VALUE
tank/home/olaf  atime                       off
tank/home/olaf  aclmode                     discard
tank/home/olaf  aclinherit                  restricted
tank/home/olaf  xattr                       sa
tank/home/olaf  sharesmb                    off
tank/home/olaf  acltype                     posix
tank/home/olaf  relatime                    off
tank/home/olaf  redundant_metadata          all

Can someone please help me debug the issue?

I hope I have provided all the needed information, but I can post more
if I missed anything.

Thanks in advance
Olaf Marzocchi

