[Samba] The "10 hour problem"

Peter Eriksson peter at ifm.liu.se
Thu Nov 9 17:35:27 UTC 2017


We have a strange bug that happens at 10 hour intervals (counted from when the Samba processes were restarted) on our fileservers (serving around 200 concurrent users).
After exactly 10 hours we often see a “hiccup” of about 1 minute during which the Samba daemons refuse new client connections. After that, things run smoothly again.

My guess is that this is due to the 10 hour lifespan of the Kerberos service tickets from the AD servers, and that things take time when Samba has to get a new one. Can something be done to fix this? Or at least to get a much shorter “downtime”?

(https://technet.microsoft.com/en-us/library/jj852188(v=ws.11).aspx - 600 minutes = 10 hours)
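
One way to check whether the stalls really line up with ticket expiry is to compare log timestamps against multiples of the 600 minute lifetime, counted from the restart. Below is a minimal Python sketch of that arithmetic. It assumes a ticket is obtained at restart and renewed exactly at expiry, which may not match what Samba actually does. The function names and the timestamps are hypothetical, not taken from our logs.

```python
from datetime import datetime, timedelta

# Assumption: 600 minutes is the service-ticket lifetime in effect on the AD side.
TICKET_LIFETIME = timedelta(minutes=600)


def expected_expiries(restart, count=3, lifetime=TICKET_LIFETIME):
    """Return the first `count` expiry times for a ticket obtained at `restart`,
    assuming each renewal happens exactly when the previous ticket expires."""
    return [restart + lifetime * n for n in range(1, count + 1)]


def offset_from_nearest_expiry(restart, incident, lifetime=TICKET_LIFETIME):
    """Return how far `incident` is from the nearest expected expiry
    (positive means the incident happened after the expiry)."""
    elapsed = incident - restart
    n = max(1, round(elapsed / lifetime))  # index of the closest expiry
    return incident - (restart + lifetime * n)


if __name__ == "__main__":
    restart = datetime(2017, 11, 9, 7, 30)          # hypothetical restart time
    incident = datetime(2017, 11, 9, 17, 30, 20)    # hypothetical stall seen in the logs
    for t in expected_expiries(restart):
        print("expected expiry:", t.isoformat(sep=" "))
    print("offset of incident:", offset_from_nearest_expiry(restart, incident))
```

If the offsets for the observed stalls are consistently within a minute or so of zero, that would support the ticket-lifetime theory. Large or scattered offsets would point elsewhere.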

System:

6 Microsoft Windows AD servers (latest version I think)
6 FreeBSD 11.1 servers running Samba 4.7.0 that each bind to one of the AD servers
Around 200 (at peak hour) concurrent SMB users.

I’ve been trying to debug this issue for some time now, and have been trying to tweak the smb.conf settings in order to find a combination that works better, but I’m running out of ideas.

smb.conf (parts of it)

security = ADS
kerberos method = system keytab
winbind use default domain = yes
winbind max clients = 1000
winbind max domain connections = 5
min protocol = SMB2
smb encrypt = auto
client ldap sasl wrapping = seal


I’m also wondering a bit about “winbind max domain connections”. I found a Microsoft TechNet article that talks about “MaxConnectionsPerUser”, which by default is limited to 5. Are these settings related, and if so, should one avoid going much above 5?

(We have around 100k users in the AD systems and many groups, so Winbind takes some time at startup...)

--
[Lı.U] System Administrator ITI-NET IT.LiU.SE +46-13-28 2786
