[Samba] Samba46 Listen queue overflow in FreeBSD 11.1
peter at ifm.liu.se
Tue Jan 16 21:56:32 UTC 2018
There are a number of rather low listen() queue limits in Samba which we also ran into on our pretty busy (around 300-500 users/server) Samba servers, also on FreeBSD 11.1 (six servers with 256GB RAM, 2x10Gbit ethernet, 140TB of storage).
Please find enclosed a patch we use to raise the limits (quite a bit). The patch makes it possible to control the queue limit via the config file using a "socket listen backlog" parameter. We set it to 1024 on our servers. With the default (10) things go wrong too often, for example when the Microsoft antivirus software decides to scan the profile folders on all the student computers at the same time and triggers new SMB connections to the servers (probably due to links pointing to shared folders or something). Talk about DoS attacks.. :-)
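To illustrate what the backlog number means: it is simply the second argument to listen(). Below is a minimal Python sketch of a Unix stream listener (like the winbindd pipe) with a configurable backlog. This is not Samba code, and the function name open_listener is made up for illustration. Note that the kernel caps whatever value is requested, on FreeBSD at kern.ipc.soacceptqueue.

```python
import os
import socket


def open_listener(path: str, backlog: int) -> socket.socket:
    """Bind a Unix stream socket at `path` and listen with `backlog`.

    Hypothetical helper for illustration only; Samba does this in C.
    """
    # Remove a stale socket file left over from a previous run.
    if os.path.exists(path):
        os.unlink(path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    # `backlog` is the number of not-yet-accepted connections the kernel
    # will queue. The kernel silently caps it (kern.ipc.soacceptqueue on
    # FreeBSD), so raising only one of the two limits is not enough.
    srv.listen(backlog)
    return srv
```

With a backlog of 10, an eleventh client connecting before the server gets around to calling accept() is what produces an overflow; with 1024 there is far more headroom during a burst.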
With the default listen() backlog limits we saw that the internal connections between the smbd and winbindd processes sometimes overran the queue and then things started behaving really badly.
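Overruns like this show up in /var/log/messages as the sonewconn lines quoted below. A rough Python sketch for totalling them per socket follows. It assumes each message sits on a single line in the log file (it is only wrapped in the mail), and the name count_overflows is made up for illustration.

```python
import re
from collections import Counter

# Matches the FreeBSD kernel message quoted in the original mail.
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: "
    r"(\d+) already in queue awaiting acceptance \((\d+) occurrences\)"
)


def count_overflows(lines):
    """Sum the reported occurrence counts per pcb address."""
    totals = Counter()
    for line in lines:
        match = OVERFLOW_RE.search(line)
        if match:
            pcb, _queued, occurrences = match.groups()
            totals[pcb] += int(occurrences)
    return totals
```

The pcb address can then be looked up with "netstat -anA" to see which socket it belongs to, the same way it was done in the original mail.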
We've also seen some other things that cause problems, like the gencache.tdb file growing without bounds. When it has somewhere around 500,000 records, things slow down quite a bit too.
So we restart our smbd servers every morning at 7am and delete that file at the same time.
Another thing you definitely want to disable (if you have many users and/or many files/folders) is the inotify support: add "kernel change notify = false" to smb.conf. On huge folders it runs out of kernel kqueue resources.
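If you have several servers and want to verify that the setting really is in place, something like the following Python sketch could be used. It is only an approximation of how Samba reads smb.conf: it does not handle backslash line continuations, includes, or parameter names written without spaces. The function name is made up, and it assumes the default for this parameter is "yes" as documented in the smb.conf man page.

```python
import configparser

# Boolean spellings assumed to mean "enabled" in smb.conf.
TRUE_VALUES = {"yes", "true", "on", "1"}


def kernel_change_notify_enabled(smb_conf_text: str) -> bool:
    """Return True unless [global] turns off 'kernel change notify'."""
    # Strip indentation so configparser does not mistake indented
    # parameters for continuation lines of the previous value.
    text = "\n".join(line.strip() for line in smb_conf_text.splitlines())
    parser = configparser.ConfigParser(
        strict=False,          # tolerate repeated sections
        interpolation=None,    # smb.conf uses % for its own variables
        delimiters=("=",),     # ':' may appear inside parameter names
    )
    parser.read_string(text)
    for section in parser.sections():
        if section.strip().lower() == "global":
            value = parser.get(section, "kernel change notify", fallback="yes")
            return value.strip().lower() in TRUE_VALUES
    # No [global] section at all: assume the documented default (yes).
    return True
```

Running "testparm -s" on the server remains the authoritative way to see what Samba itself thinks the effective configuration is.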
[Lı.U] System Administrator ITI-NET IT.LiU.SE +46-13-28 2786
> On 16 Jan 2018, at 20:08, Wallace Barrow via samba <samba at lists.samba.org> wrote:
> Hello everyone,
> We are trying to track down some samba issues and wondering if there are some
> settings we can tweak.
> We have a new Supermicro server running the following with 192GB of RAM, 32
> active CPUs and 54TB of usable zfs mirrors (raid10).
> uname -a
> FreeBSD hostname 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017
> root at amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
> [root at hostname /var/log]# freebsd-version -k
> [root at hostname /var/log]# freebsd-version -u
> Samba version = samba46-4.6.8
> This server is hosting all data in ZFS pools. This server gives out NFS shares
> to other servers and holds the end user shares for Linux and Windows.
> When the server is up for about 24 hours we start to see these errors in
> /var/log/messages. The occurrence number could be more.
> kernel: sonewconn: pcb 0xfffff8018a4ba3c0: Listen queue overflow: 8 already in
> queue awaiting acceptance (1 occurrences)
> The load increases on the server by ~3 and more CPU is also used right after the
> first error. After more time has passed we slowly see more queue errors, maybe a
> few every hour. After a few days the load reaches 20 and each CPU is running at
> 50%. It never goes above that.
> Some info about that error:
> [root at hostname /var/log]# netstat -anA | grep -i fffff8018a4ba3c0
> fffff8018a4ba3c0 stream 0 0 fffff8016bc27938 0 0 0 /var/run/samba4/winbindd/pipe
> [root at hostname /var/log]# sockstat -l | grep -i /var/run/samba4/winbindd/pipe
> root winbindd 1101 22 stream /var/run/samba4/winbindd/pipe
> root winbindd 1100 22 stream /var/run/samba4/winbindd/pipe
> root winbindd 1099 22 stream /var/run/samba4/winbindd/pipe
> root winbindd 962 22 stream /var/run/samba4/winbindd/pipe
> root winbindd 954 22 stream /var/run/samba4/winbindd/pipe
> We have increased this kernel tuneable to --- kern.ipc.soacceptqueue=4096
> The server is using the on board 10gig NIC which is known for a bug and a
> different error. We turned off lso and tso to fix that error -
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221919#c9 (my post in comment
> There have been other posts on the web about the "Listen queue overflow" errors
> and having the NIC be the problem so for fun we installed a 1GB Intel PCI NIC
> card just as a test with no luck, the errors came back after a day or so.
> With all these errors in the logs you still can browse the shares in Windows or
> Linux by \\sharename but when you login after a reboot or a log out the network
> scripts time out mapping the drives so they don't appear to the end user.
> Sometimes, if you are lucky, you get the script to map the drives. Before there are
> any errors about the queue, no issues.
> Are there more kernel limits we should be adding? Can we adjust Samba to accept
> more connections or tune something there? After a 'service samba_server restart'
> the load of the server goes back to normal, around 5, shares can be mapped by
> scripts and the server is good for another 24 hours.