[Samba] Samba/Winbind slow with Active Directory

Hoogstraten, Ton Ton.Hoogstraten at ingram.nl
Sun Jun 3 14:08:06 GMT 2007


Hi all,
 
I'm looking for answers regarding a problem I'm having with Samba. For
about a year our Samba file server has been part of our worldwide
corporate Active Directory. Before that, Samba was part of our local NT4
domain. Since the change to Active Directory the Samba server has become
slower and sometimes does not respond at all to share requests.
 
I need to find a way to reduce the time required to access the Samba
server on first connect. Even more important for me is to stop the Samba
file server from sometimes being unavailable for more than 3 hours
(currently it looks like a problem with the total number of user access
requests to winbind at a given time). I'm based in the Netherlands,
working for a company with a worldwide Active Directory setup. I cannot
change Active Directory settings, so I need to approach this from the
Samba side to find the problem(s).
 
I have 3 Samba file servers. All servers respond slowly on first
connect. The server in the main office is the one that sometimes does
not respond at all. All servers use an LDAP backend for uid/gid
mappings, installed on the main office server (with replicas on the
others as backups). When we migrated to Active Directory, the migration
team told us that SID history would avoid problems while the servers
were still in the old domain and the users already in Active Directory.
That did not work for Samba. To work around this I set up the LDAP
backend config and mapped all Active Directory SIDs for the Netherlands
users to exactly the same UID/GID as their NT4 SIDs. This solved my SID
history problem back then. After all users had been migrated to Active
Directory, the server was migrated as well to avoid further problems.
The old SID entries for NT4 are still in the LDAP database. (If this may
be part of the problem I can remove them.) Roughly estimated, user
access is as follows:
 
Main office: 250 Users (including some from the other 2 sites)
Warehouse: 150 Users
Small Office: 15 Users
 
All sites have a local AD server. Samba is configured to use only the
server on that site. All servers run Red Hat ES 4 with Samba version
3.0.23d. All Samba upgrades so far helped in certain areas, but the
general problems remained (slow response on first connect, and long
outages for the main office server).
 
A recent unexpected event revealed something about the problem. Last
Thursday a minor power glitch in the main office caused a lot of
workstations and workfloor switches to restart. The server room was not
affected, being on a UPS. When clients started to log on again the Samba
server was not responding to share access requests. Some users managed
to access a share, only to find that 4 minutes later they could not
access it again. The Samba server itself is an HP ProLiant DL380 with
two 3 GHz HT CPUs and 8 GB of RAM. The server had no load problems. Very
few smbd daemons claimed CPU time (few users could get on). The winbind
daemon claimed the most CPU time, but was not putting any load on the
system. The server remained in this state for about 3 hours. After that
it returned to what currently counts as "normal": working, but with
slower access to the shares than under our NT4 domain. Our best test of
whether the server is working normally again is whether the Nagios
monitor plugin can access shares again. It uses the Linux smbclient and
is less tolerant in the time it waits to access a share.
 
I can stop and start all services during the outage; it makes no real
difference. A reboot of the server or the workstations makes no
difference either. During the outage I added a share with guest access,
which smbclient can access very quickly when connecting anonymously. If
I supply an AD user and password instead, the same share times out as
well. Windows XP systems show similar behaviour: during the outage most
of them cannot access the guest share, and during normal (but slow)
operation they can.
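
The guest versus authenticated comparison described above could be timed
with a small script. This is only a minimal sketch and not the Nagios
plugin itself: it assumes smbclient is on the PATH, and the server,
share and account names a caller would pass in are placeholders. It uses
only the standard smbclient options -N (no password), -U (user) and -c
(command string).

```python
import subprocess
import time


def smbclient_argv(server, share, user=None, password=None):
    """Build an smbclient command line that lists the share root."""
    argv = ["smbclient", f"//{server}/{share}", "-c", "ls"]
    if user is None:
        argv.append("-N")  # anonymous/guest: do not prompt for a password
    else:
        argv += ["-U", f"{user}%{password}"]
    return argv


def timed_connect(argv, timeout=30.0):
    """Run the command; return (ok, seconds). A timeout counts as failure."""
    start = time.monotonic()
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
        ok = result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        ok = False
    return ok, time.monotonic() - start
```

Calling timed_connect(smbclient_argv("server", "guestshare")) and then
the same with a test account gives two timings that can be compared
directly. Note that a password passed with -U is visible in the process
list, so this is suitable for a throwaway test account only.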
 
Last Friday I did some tests with a test server while monitoring the
network traffic. My test pattern was as follows:
 
-start Samba
-start the network tcpdump on the Samba server
-connect from my Windows XP box by using: net use \\server\share
-about 1 minute later the XP box reports success.
 
When I then checked the tcpdump capture with Wireshark I noticed the
following. All packets to and from the server are answered quickly. I
can see the client connecting to \\server\IPC$ at normal speed. At a
certain stage I see the client sending a 'Session Setup AndX Request'
packet. In this packet I see (what I think is the purpose of this
packet) a SPNEGO section and a Kerberos ticket AP-REQ. Then the client
sends no packets for a long time. Eventually the Samba server answers
the 'Session Setup AndX Request' packet. The SMB header of this packet
shows that it is the answer to the client packet and that the server
took 52.48116500 seconds to respond! I noticed no specific packets
indicating that the server was talking to the AD server during those 52
seconds.
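
Wireshark shows the request/response delay per packet, but for a longer
capture it may be easier to compute it in bulk. The sketch below pairs
each SMB request with the reply carrying the same Multiplex ID (MID) and
reports the slow ones. The SmbPacket record layout is hypothetical (a
capture export would have to be mapped onto it), the sample values are
illustrative only, and it assumes all packets belong to one connection.

```python
from typing import NamedTuple


class SmbPacket(NamedTuple):
    time: float        # capture timestamp in seconds
    mid: int           # SMB Multiplex ID, echoed by the server in its reply
    is_response: bool  # False for a client request, True for a server reply
    command: str       # e.g. "Session Setup AndX"


def response_times(packets):
    """Pair each request with the next response carrying the same MID."""
    pending = {}   # mid -> request packet still waiting for an answer
    results = []   # (command, seconds) per completed exchange
    for pkt in sorted(packets, key=lambda p: p.time):
        if not pkt.is_response:
            pending[pkt.mid] = pkt
        elif pkt.mid in pending:
            req = pending.pop(pkt.mid)
            results.append((req.command, pkt.time - req.time))
    return results


def slow_exchanges(packets, threshold=1.0):
    """Return only the exchanges that took longer than `threshold` seconds."""
    return [(cmd, dt) for cmd, dt in response_times(packets) if dt > threshold]


# Illustrative values only, shaped like the capture described above.
sample = [
    SmbPacket(0.000, 1, False, "Negotiate Protocol"),
    SmbPacket(0.002, 1, True, "Negotiate Protocol"),
    SmbPacket(0.010, 2, False, "Session Setup AndX"),
    SmbPacket(52.491, 2, True, "Session Setup AndX"),
]
for cmd, dt in slow_exchanges(sample):
    print(f"{cmd}: {dt:.3f}s")
```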
 
The response packet includes a SPNEGO section which reports that the
negResult is accept-completed. The supportedMech is returned as 'MS KRB5
- Microsoft Kerberos 5'. After this packet the rest of the communication
is handled quickly and the Windows XP box reports success on the share.
The whole process of getting to the share took 1 minute, which is not
acceptable for LAN access. Waiting a few minutes and trying again is
fast. Only the first connect is slow.
 
I managed to get some more test results this weekend. I started the
winbind daemon in interactive mode with a debug level of 3. When I then
connect my Windows XP system I see that the winbind daemon is doing a
lot of SID to GID lookups. I counted lookups for 85 different SIDs. My
user is a member of 59 groups in Active Directory (I believe some of the
SIDs are SID history SIDs taken from the old domain). It takes the
winbind daemon a long time to go through all 85 SIDs (this should
account for the delay before the session setup response). When winbind
finishes the lookups, the net use command reports success. A second run
of net use succeeds right away. I think I figured that one out: it is
down to the smbd daemon assigned to my session. If I kill that smbd
process and connect again from my Windows XP box, winbind again goes
through all SIDs and success is reported after a minute or more.
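
Counting SIDs by hand in debug output is error-prone, so a short script
may help. The sketch below assumes only that SIDs appear in the log in
their standard string form (S-1-...). The sample lines are invented for
illustration and do not reproduce real winbindd output.

```python
import re
from collections import Counter

# Standard string form of a SID: S-<revision>-<authority>-<subauth>-...
SID_RE = re.compile(r"\bS-1-\d+(?:-\d+)+\b")


def count_sids(log_text):
    """Return a Counter mapping each SID found in the text to its hit count."""
    return Counter(SID_RE.findall(log_text))


# Made-up lines; real winbindd debug output is formatted differently.
sample_log = """\
lookup sid S-1-5-21-1111-2222-3333-1001
lookup sid S-1-5-21-1111-2222-3333-1002
lookup sid S-1-5-21-9999-8888-7777-513
lookup sid S-1-5-21-1111-2222-3333-1001
"""

sids = count_sids(sample_log)
print(len(sids), "distinct SIDs")
for sid, hits in sids.most_common():
    print(hits, sid)
```

Feeding the saved output of an interactive winbindd run into
count_sids() gives the number of distinct SIDs and shows whether any SID
is looked up more than once per connect.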
 
Can it be a problem with the LDAP backend I'm using? When I wipe the
database on the test server, clearing about 8000 to 9000 entries,
winbind responds much faster to the SID lookups. I don't want to lose
all my mappings, but I could try to clear some old entries from the old
NT domain.
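
Before deleting anything it may be useful to see how many idmap entries
belong to the old NT4 domain and which Unix ids they share with AD SIDs.
The following is a rough sketch under stated assumptions: the domain
SIDs and ids are hypothetical, and the mappings are given as an
in-memory list. Reading them from the LDAP idmap subtree is left out.

```python
def domain_sid(sid):
    """Domain part of a user or group SID: everything before the final RID."""
    return sid.rsplit("-", 1)[0]


def split_by_domain(mappings, old_domain_sid):
    """Split (sid, unix_id) pairs into (old-domain, other) lists."""
    old, other = [], []
    for sid, unix_id in mappings:
        target = old if domain_sid(sid) == old_domain_sid else other
        target.append((sid, unix_id))
    return old, other


def shared_ids(old, other):
    """Unix ids that an old-domain SID shares with a SID from another domain."""
    return sorted({uid for _, uid in old} & {uid for _, uid in other})


# Hypothetical domain SIDs and ids, for illustration only.
OLD_NT4 = "S-1-5-21-1111-2222-3333"
mappings = [
    ("S-1-5-21-1111-2222-3333-1001", 10001),  # NT4 SID
    ("S-1-5-21-4444-5555-6666-2101", 10001),  # AD SID mapped to the same id
    ("S-1-5-21-4444-5555-6666-2102", 10002),
]

old, other = split_by_domain(mappings, OLD_NT4)
print(len(old), "old-domain entries,", len(other), "others")
print("ids shared across domains:", shared_ids(old, other))
```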
 
It is also worth mentioning that during a system performance consultancy
a couple of months ago the Red Hat ES 4 configuration was changed by
increasing the number of open file descriptors in limits.conf. By
default this was set to 1024; it is now configured as 32768. It turned
out that the Samba processes (especially winbind) opened a lot of file
sockets. Reports revealed that the system had an iowait % which was too
high before the change was made. This solved a large part of the
recurring outages we experienced. But now the problem appears to be back
in a different form.
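
To check whether a process is getting close to its descriptor limit, the
limit and the current count can be compared. This is a minimal sketch
that assumes a Linux /proc filesystem. It inspects the process it runs
in by default; looking at another process such as winbindd means passing
its pid and having permission to read its /proc entry.

```python
import os
import resource


def nofile_limits():
    """Soft and hard open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count open descriptors via /proc (Linux only); None if unreadable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def usage_ratio(open_count, soft_limit):
    """Fraction of the soft limit in use, e.g. 0.5 for 512 of 1024."""
    return open_count / soft_limit


soft, hard = nofile_limits()
print("soft limit:", soft, "hard limit:", hard)
count = open_fd_count()
# An unlimited soft limit is reported as a non-positive value; skip the ratio.
if count is not None and soft > 0:
    print(f"this process uses {count} descriptors ({usage_ratio(count, soft):.1%})")
```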
 
Any help and pointers in the right direction to resolve this is
appreciated.
 
I'm currently testing with Samba 3.0.25a. I'm having a problem with the
winbind daemon crashing a lot. Doing an 'ls -la' in a directory whose
subdirectories have Active Directory groups assigned is enough to crash
it. I filed a bug report in Bugzilla.
 
url: https://bugzilla.samba.org/show_bug.cgi?id=4667
 
Kind regards,
 
Ton Hoogstraten
 
The Main site server config (note: company specific values changed
between <>)
 
[global]
        workgroup = <workgroup>
        realm = <company realm>
        server string = Samba Fileserver
        security = ADS
        client schannel = No
        password server = <AD server1> <AD server2>
        restrict anonymous = 2
        log file = /var/log/samba/samba.log
        max log size = 150
        large readwrite = No
        name resolve order = host wins bcast
        time server = Yes
        server signing = auto
        client use spnego = No
        socket options = TCP_NODELAY IPTOS_LOWDELAY SO_KEEPALIVE SO_RCVBUF=8192 SO_SNDBUF=8192
        printcap name = /etc/printcap
        preferred master = No
        local master = No
        domain master = No
        dns proxy = No
        wins server = <wins server1> <wins server 2>
        ldap admin dn = cn=<y>,dc=<x>,dc=<z>
        ldap idmap suffix = ou=Idmap
        ldap suffix = dc=<x>,dc=<z>
        ldap ssl = no
        idmap backend = ldap:ldap://127.0.0.1
        idmap uid = 10000-2000000
        idmap gid = 10000-2000000
        template homedir = /home/%U
        winbind use default domain = Yes

 

