[Samba] winbind fails

Majeed mabuqu at ilstu.edu
Tue May 18 21:48:11 GMT 2004


I have been having the same problem with winbind for quite a while now
and have researched it up and down, but I can't get it resolved. I
have been dealing with this since 3.0.2. I then moved to 3.0.2a, then to
3.0.3pre2 because the release notes stated a crash fix in ADS mode,
then to 3.0.3 because it was a production release, and then to 3.0.4 because
some memory leaks and socket handling issues were fixed in winbind. I
will now illustrate my problem.

Info:

- 4 Windows 2000 domain controllers
- Linux box joins the domain and uses Kerberos (Active Directory)
authentication for shares
- distribution: Gentoo 1.4
- kernel 2.4.26 (stock sources)
- current version of Samba: 3.0.4
- If anything else is needed, please let me know
- configure command to compile:
 ./configure --prefix=/usr --sysconfdir=/etc/samba --localstatedir=/var \
   --libdir=/usr/lib/samba \
   --with-privatedir=/etc/samba/private --with-lockdir=/var/cache/samba \
   --with-piddir=/var/run/samba \
   --with-swatdir=/usr/share/swat --with-configdir=/etc/samba \
   --with-logfilebase=/var/log/samba \
   --enable-static --enable-shared --with-manpages-langs=en \
   --without-spinlocks --with-libsmbclient \
   --with-automount --with-smbmount --with-winbind --with-syslog \
   --with-idmap --with-ldap \
   --with-ads --with-krb5 --with-pam

Problem:

After compiling and installing Samba and copying the pam_winbind.so,
libnss_winbind.so, and libnss_wins.so files to the appropriate
directories, I start Samba and winbind using a startup script. It
takes about 30 seconds to a minute for authentication to start working
(probably winbind talking to the DCs). Once it starts authenticating, it
works GREAT and continues to do so for 3 days to a week. After that,
winbind no longer authenticates.

Since I have been having this problem for a while now, I have been
monitoring winbindd. About 3 hours after I start winbindd, sockets in
the CLOSE_WAIT state start accumulating in the output of
netstat -antupo. All the sockets in this state are owned by the winbindd
process, and they never close unless I kill the winbindd process. Once
the number of CLOSE_WAITs reaches about 1000, winbindd stops
authenticating, Samba crashes, and I cannot ssh in (I can connect and
authenticate, but after I successfully authenticate, ssh reports a
signal 11 (SIGSEGV) error and drops the connection). I believe winbind
causes the ssh problem because of all the sockets and port numbers it
has tied up in the CLOSE_WAIT state. Once I restart winbindd and sshd,
everything works fine again until the same point is reached.

After much research, I found that this is usually caused by an
application bug: the application is not closing its sockets correctly.
At first I thought it might be the kernel, so I upgraded from 2.4.25 to
2.4.26, but the same symptoms came back. Later I was reading a
developers' forum where someone said that if you kill the process that
owns the sockets in the CLOSE_WAIT state and the sockets disappear, then
it is not a kernel issue. Also, while monitoring winbindd, I noticed
that its memory consumption steadily increases (maybe a leak?).

I wanted to be able to show the developers and everyone else what I was
seeing, so I wrote a script and added a cron job to run it every hour,
10 minutes after the hour. The script runs the following commands and
writes the output to a text file. This isn't the entire script, but it
is the meat of it.

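# Hourly snapshot of winbindd state (run from cron at 10 minutes past the hour)
LOG_FILE=$(date +%F_%H.%M%P_winbind_info.log)
PREFIX=/var/log/winbind/
OUT="$PREFIX$LOG_FILE"
ps aux | grep PID >> "$OUT"    # ps header line, for the column names
ps aux | grep winb >> "$OUT"   # winbindd processes
ps aux | grep mbd >> "$OUT"    # smbd and nmbd processes
cat "/proc/$(cat /var/run/samba/winbindd.pid)/status" >> "$OUT"   # winbindd memory (VmSize, VmRSS)
netstat -antupo >> "$OUT"      # all sockets, including CLOSE_WAIT, with owning PID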
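The hourly snapshots above record the raw netstat and /proc output. A one-line summary per run may make the trend (CLOSE_WAIT count and memory) easier to compare across days. The following is a minimal sketch, assuming Linux net-tools netstat, where `-p` puts the "PID/Program name" column in the seventh field of TCP lines, and the same pid file path as above. The helpers count_close_wait and vm_field are hypothetical names, not part of Samba.

```bash
#!/bin/bash
# Sketch: print a one-line summary of winbindd health.
# Assumption: net-tools netstat output, field 6 = state, field 7 = PID/Program.

# count_close_wait PROG: read `netstat -antp` output on stdin and print the
# number of TCP sockets in CLOSE_WAIT owned by a process named PROG.
count_close_wait() {
    awk -v prog="$1" '
        $1 ~ /^tcp/ && $6 == "CLOSE_WAIT" && $7 ~ ("/" prog "$") { n++ }
        END { print n + 0 }'
}

# vm_field KEY FILE: print the numeric value (in kB) of a /proc/PID/status
# line such as "VmRSS:   3456 kB".
vm_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "$2"
}

PIDFILE=/var/run/samba/winbindd.pid   # same pid file as the script above
if [ -r "$PIDFILE" ]; then
    PID=$(cat "$PIDFILE")
    CW=$(netstat -antp 2>/dev/null | count_close_wait winbindd)
    echo "$(date '+%F %T') pid=$PID close_wait=$CW" \
         "vmsize_kb=$(vm_field VmSize "/proc/$PID/status")" \
         "vmrss_kb=$(vm_field VmRSS "/proc/$PID/status")"
fi
```

Appending this line to one file from the same cron job would give one row per hour, which can be lined up against the time authentication stops.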

I put all the logs, from the minute I started winbindd up until now, on
a webpage for people to see. They are ordered by date and time, so you
can see how things progress: memory usage and the CLOSE_WAIT problem.
Hopefully the developers can use this information. If not, it would be
great if anyone has an idea why I have all these CLOSE_WAITs. I am
replying to a previous post that I created, but back then I was just
going to upgrade to see if I still had the same problems. And I did, as
you can see. Any insight would be great. I would be glad to answer any
questions or run any tests that people would like me to try. I have a
test server and a production server, and this problem happens on both.

Go to http://www.analoglove.com/winbind
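A socket in CLOSE_WAIT is one the remote side (here the DC) has already closed but the local process has not yet closed, so each one still occupies a file descriptor in winbindd. That would fit the "Too many open files" errors in the earlier report quoted below. A minimal sketch for comparing winbindd's open descriptor count with its descriptor limit follows. fd_usage is a hypothetical helper, and it assumes the current shell's `ulimit -n` matches the limit winbindd was started with, since 2.4 kernels do not provide /proc/PID/limits.

```bash
#!/bin/bash
# fd_usage PID: print "<open fds> <limit> <percent used>" for a process.
# Assumption: winbindd runs with the same soft limit as this shell's `ulimit -n`.
fd_usage() {
    local pid=$1 open limit
    # Each entry in /proc/PID/fd is one open descriptor (root is needed
    # to list the descriptors of processes owned by other users).
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(ulimit -n)
    if [ "$limit" = "unlimited" ]; then
        echo "$open unlimited -"
    else
        echo "$open $limit $(( open * 100 / limit ))%"
    fi
}

PIDFILE=/var/run/samba/winbindd.pid   # same pid file as the script above
if [ -r "$PIDFILE" ]; then
    echo "winbindd fds (open limit used): $(fd_usage "$(cat "$PIDFILE")")"
fi
```

If the open count climbs toward the limit at roughly the same rate as the CLOSE_WAIT count, that would support the idea that the unclosed sockets, rather than the kernel, are what eventually exhausts winbindd.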

Below is how the thread ended the last time I posted about this.

Thank you very much for your time,
Majeed Qulbain











Majeed wrote:

> I'm going to install the new version and report back in a week or so.
> Thanks for the reply!
>
> Majeed
>
> Tim Jordan wrote:
>
>> I see there is a fix for winbind crashing in the latest release
>> notes:
>> http://download.samba.org/samba/ftp/pre/
>> TJ
>> On Mon, 2004-04-05 at 10:25, Majeed wrote:
>>
>>> I have also been seeing this over the last few weeks. For me it
>>> also happens randomly, as you stated. I am trying to pinpoint when
>>> it started, and I believe it started right after I upgraded the
>>> kernel from 2.4.24 to 2.4.25 (vanilla sources on Gentoo 1.4) (mremap
>>> problems), but I can't be too sure. Samba 3.0.2 was compiled with the
>>> following options:
>>> ./configure --prefix=/usr --sysconfdir=/etc/samba 
>>> --localstatedir=/var --libdir=/usr/lib/samba 
>>> --with-privatedir=/etc/samba/private --with-lockdir=/var/cache/samba 
>>> --with-piddir=/var/run/samba --with-swatdir=/usr/share/swat 
>>> --with-configdir=/etc/samba --with-logfilebase=/var/log/samba 
>>> --enable-static --enable-shared --with-manpages-langs=en 
>>> --without-spinlocks --with-libsmbclient --with-automount 
>>> --with-smbmount --with-winbind --with-syslog --with-idmap 
>>> --with-ldap --with-ads --with-krb5 --with-pam
>>>
>>> Here are some symptoms I am seeing when the problem occurs.
>>> Symptom 1) I cannot log in through ssh. It's weird because I can
>>> connect and enter my username and password, and it authenticates, but
>>> then the connection gets reset. There is even a line in the ssh log
>>> file that says access was granted. I then go to the console and log in.
>>>
>>> Symptom 2) While logged into the console, I run "netstat -antu" and
>>> get some interesting results:
>>> tcp        0      0 sambaserv_ip:44134       win2000dc_ip:139          CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44072       win2000dc_ip:139          CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44075       win2000dc_ip:139          CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44076       win2000dc_ip:139          CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44078       win2000dc_ip:139          CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44079       win2000dc_ip:139          CLOSE_WAIT
>>>
>>> There are HUNDREDS of these CLOSE_WAIT lines, all with different
>>> ascending port numbers.
>>> After restarting Samba and winbind, netstat looked normal and
>>> everything worked as it should have.
>>>
>>> Symptom 3) While logged into the console, I checked the Samba log
>>> files, and log.winbind showed the following problems:
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>>  open_socket_in(): socket() call failed: Too many open files
>>>
>>> Again there were HUNDREDS of these lines.
>>>
>>> So I think winbind might be the cause of the problems. This happens 
>>> on both my production and my test server. Test server is mirrored to 
>>> production for testing.
>>>
>>> Today I am going to download the newest version of Samba 3 and
>>> see if that helps; if it doesn't, I might try a different kernel
>>> version. As mentioned before, all I do is restart Samba and winbind,
>>> and things work perfectly for a random amount of time, usually
>>> 3 or more days before it happens again.
>>>
>>> Does anyone have any suggestions? Maybe some different things I 
>>> could look for? Maybe different compile options?
>>>
>>> Thanks
>>> Majeed Qulbain
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Hoskinson, David P wrote:
>>>
>>>> We have a Windows 2003 DC here at the university, and I have
>>>> successfully set up samba-3.0.2-6.3E on a RHEL WS3 machine. The
>>>> problem is that after several hours, or several days, winbind stops
>>>> running and connections fail. I have seen instances of this on other
>>>> sites, but no firm answers. I can provide files and logs if helpful.
>>>
>>>
>


