winbindd not closing sockets correctly?
Majeed
mabuqu at ilstu.edu
Wed May 19 18:40:05 GMT 2004
Hey developers,
I apologize for sending this message to the technical list, but I
posted the following message to the general list, and now it is
completely buried in the mass of samba mail so I highly doubt anyone
will look at it (maybe I'm wrong).
I am trying to track down a problem with winbindd not closing
network sockets correctly. I am running a Gentoo 1.4 system. I believe
that winbind is the problem, and I haven't found anyone else who has had
a similar problem. I included much of my research and a link to an
organized log file listing in the original message sent (below).
If anyone could point me to a source file that handles connections
(mainly closing the connections to the Windows 2000 AD controller on TCP 139)
that would be cool. I'm not sure what I will be able to do with it but I
would like to check it out.
Thanks,
Majeed
original message
*******************************************************************************************************
I have been having the same problem with winbind for quite a while now
and have researched up and down, but I can’t get the problem resolved. I
have been dealing with this since 3.0.2. I then moved to 3.0.2a, then to
3.0.3pre2 since the release notes stated a crash fix when in ads mode,
then to 3.0.3 since it was a production release and then to 3.0.4 since
some memory leaks and socket handling issues were fixed in winbind. I
will now illustrate my problem.
Info:
- 4 windows 2000 domain controllers
- linux box joins the domain and uses Kerberos active directory
authentication to shares
- distribution: Gentoo 1.4
- kernel 2.4.26 (stock sources)
- current version of samba: 3.0.4
- If anything else is needed please let me know
- configure command to compile:
./configure --prefix=/usr --sysconfdir=/etc/samba --localstatedir=/var \
  --libdir=/usr/lib/samba \
  --with-privatedir=/etc/samba/private --with-lockdir=/var/cache/samba \
  --with-piddir=/var/run/samba \
  --with-swatdir=/usr/share/swat --with-configdir=/etc/samba \
  --with-logfilebase=/var/log/samba \
  --enable-static --enable-shared --with-manpages-langs=en \
  --without-spinlocks --with-libsmbclient \
  --with-automount --with-smbmount --with-winbind --with-syslog \
  --with-idmap --with-ldap \
  --with-ads --with-krb5 --with-pam
Problem:
After compiling and installing samba and copying the pam_winbind.so,
libnss_winbind.so, and libnss_wins.so files to the appropriate
directories I then start samba and winbind using a startup script. It
takes about 30sec to a minute for authentication to start working
(probably winbind talking to the DCs). Once it starts authenticating it
works GREAT and will continue to do so for a period of 3 days to a week.
Once it hits a certain point winbind will no longer authenticate. Since
I have been having this problem for a while now, I have been monitoring
winbindd. It seems that around 3 hours after I start winbindd sockets in
the CLOSE_WAIT state will start accumulating when I run the netstat
-antupo command. All the sockets in this state are owned by the winbindd
process. They will never close unless I kill the winbindd process. Once
the number of CLOSE_WAITs climbs to around 1000 it will cause
winbindd to stop authenticating, samba to crash, and I will not be able
to ssh in (I can connect, I can authenticate, but after I successfully
authenticate ssh shoots back a signal 11 error and drops the
connection). I believe the ssh problem is caused by winbind because of
all the sockets and port numbers it has tied up in the CLOSE_WAIT state.
Once I restart winbindd and sshd everything works fine again until the same
amount of time has passed. After doing much research I found that it is
usually the application that is not closing the socket correctly, due to
a bug. At first I thought it might be the kernel so I upgraded from
2.4.25 to 2.4.26 but the same symptoms came about. After that I was
reading a developers forum and someone said that if you kill the process
that owns the sockets in the close_wait state and they disappear then it
is not a kernel issue. Also during the monitoring of winbindd I noticed
that its memory consumption steadily increases (maybe a leak?). I
wanted to be able to show the developers and everyone else what I was
seeing so I wrote a script and set up a cron job to run it every hour at 10
minutes past the hour. The script runs the following commands and writes
the output to a text file. This isn't the entire script but it is the
meat of it.
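# Timestamped log file name, one file per run
LOG_FILE=$(date +%F_%H.%M%P_winbind_info.log)
PREFIX=/var/log/winbind/
# ps header line, then the winbindd and smbd/nmbd processes
ps aux | grep PID >> "$PREFIX$LOG_FILE"
ps aux | grep winb >> "$PREFIX$LOG_FILE"
ps aux | grep mbd >> "$PREFIX$LOG_FILE"
# Status (including memory usage) of the running winbindd
cat "/proc/$(cat /var/run/samba/winbindd.pid)/status" >> "$PREFIX$LOG_FILE"
# All TCP/UDP sockets with owning process and timers
netstat -antupo >> "$PREFIX$LOG_FILE"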
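To make the growth easier to follow from one hourly log to the next, the netstat output collected above could be reduced to a count of CLOSE_WAIT sockets per remote endpoint. This is only a minimal sketch: the function name count_close_wait is made up for illustration, and it assumes the usual Linux net-tools column layout for "netstat -antupo" (field 5 is the foreign address, field 6 the state, field 7 the PID/program name, which is only filled in when netstat runs as root).

```shell
#!/bin/sh
# Hypothetical helper: count CLOSE_WAIT sockets per remote endpoint.
# Reads `netstat -antupo` style output on stdin.
# Assumed columns: $5 = foreign address, $6 = state, $7 = PID/Program name.
count_close_wait() {
    # $1: program name to match; defaults to winbindd
    prog="${1:-winbindd}"
    awk -v prog="$prog" '
        # Keep only TCP rows in CLOSE_WAIT that belong to the given program
        $1 ~ /^tcp/ && $6 == "CLOSE_WAIT" && $7 ~ ("/" prog "$") { n[$5]++ }
        # Print "<count> <remote address>" for every remote endpoint seen
        END { for (r in n) print n[r], r }
    ' | sort -rn
}
```

It could be appended to the script as: netstat -antupo | count_close_wait winbindd >> "$PREFIX$LOG_FILE"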
I put all the logs starting from the minute I started winbindd up
until now on a webpage for people to see. They are in order by date and
time and you will be able to see how things progress, memory usage, and
the close_wait problem. Hopefully the developers can use this
information. If not it would be great if anyone has any idea on why I
have all these CLOSE_WAITs. I am replying to a previous post that I
created, but back then I was just going to upgrade to see if I still had
the same problems. And I did, as you can see. Any insight would be
great. I would be glad to entertain any questions or tests that people
would like me to try. I have a test server and a production server and
this problem happens on both.
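The stall at around 1000 CLOSE_WAIT sockets would be consistent with winbindd running into its per-process open file limit, which is commonly 1024 by default on Linux; the "Too many open files" errors in log.winbind quoted further down point the same way. A rough way to watch for this, assuming /proc is mounted and the check runs as root, is to count the entries in /proc/<pid>/fd. The function name fd_usage is made up for illustration, and "ulimit -n" reports the limit of the calling shell, which is only an assumption about the limit the daemon actually runs with.

```shell
#!/bin/sh
# Hypothetical helper: report how many file descriptors a process holds.
# $1: PID to inspect
fd_usage() {
    pid="$1"
    # /proc/<pid>/fd has one entry per open descriptor (sockets included)
    used=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # Soft limit of the *current* shell; assumed, not verified, to match the daemon
    limit=$(ulimit -n)
    echo "pid=$pid open_fds=$used soft_limit=$limit"
}
```

It could be added to the hourly script as: fd_usage "$(cat /var/run/samba/winbindd.pid)" >> "$PREFIX$LOG_FILE"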
Go to www.analoglove.com/winbind
Below is how the message ended the last time I posted about this.
Thank you very much for your time,
Majeed Qulbain
********************************************************************************
Majeed wrote:
> I'm going to install the new version, and report back in a week or so.
> Thanks for the reply!
>
> Majeed
>
> Tim Jordan wrote:
>
>> I see there is a fix for winbind crashing in the latest release
>> notes.
>> http://download.samba.org/samba/ftp/pre/
>> TJ
>> On Mon, 2004-04-05 at 10:25, Majeed wrote:
>>
>>> /I have also been seeing this over the last few weeks. For me it
>>> also happens randomly as you stated. I am trying to pin point when
>>> it started, and I believe it started right after I upgraded the
>>> kernel from 2.4.24 to 2.4.25 (vanilla sources on gentoo 1.4) (mremap
>>> problems), but I can't be too sure. Samba 3.0.2 compiled with the
>>> following options:
>>> ./configure --prefix=/usr --sysconfdir=/etc/samba
>>> --localstatedir=/var --libdir=/usr/lib/samba
>>> --with-privatedir=/etc/samba/private --with-lockdir=/var/cache/samba
>>> --with-piddir=/var/run/samba --with-swatdir=/usr/share/swat
>>> --with-configdir=/etc/samba --with-logfilebase=/var/log/samba
>>> --enable-static --enable-shared --with-manpages-langs=en
>>> --without-spinlocks --with-libsmbclient --with-automount
>>> --with-smbmount --with-winbind --with-syslog --with-idmap
>>> --with-ldap --with-ads --with-krb5 --with-pam
>>>
>>> Here are some symptoms I am seeing when the problem occurs.
>>> Symptom 1) I cannot log in through ssh: It's weird because I can
>>> connect, put in my username and password, it authenticates, but then
>>> the connection gets reset. There is even a line in the ssh log file
>>> that says access was granted. I then go to the console and log in.
>>>
>>> Symptom 2) While logged into the console I run a "netstat -antu" and
>>> get some interesting results
>>> tcp        0      0 sambaserv_ip:44134      win2000dc_ip:139      CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44072      win2000dc_ip:139      CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44075      win2000dc_ip:139      CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44076      win2000dc_ip:139      CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44078      win2000dc_ip:139      CLOSE_WAIT
>>> tcp        0      0 sambaserv_ip:44079      win2000dc_ip:139      CLOSE_WAIT
>>>
>>> There are HUNDREDS of these CLOSE_WAIT lines all with different
>>> ascending port numbers
>>> After restarting samba and winbind netstat looked normal and
>>> everything worked as it should have.
>>>
>>> Symptom 3) While logged into the console I check the samba log files
>>> and log.winbind showed the following problems.
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
>>> open_socket_in(): socket() call failed: Too many open files
>>>
>>> Again there were HUNDREDS of these lines.
>>>
>>> So I think winbind might be the cause of the problems. This happens
>>> on both my production and my test server. Test server is mirrored to
>>> production for testing.
>>>
>>> Today I am going to download the newest version of the samba 3 and
>>> see if that helps, if it doesn't then I might try a different kernel
>>> version. As mentioned before all I do is restart samba and winbind
>>> and things will work perfectly for a random amount of time. Usually
>>> 3 or more days before it happens again.
>>>
>>> Does anyone have any suggestions? Maybe some different things I
>>> could look for? Maybe different compile options?
>>>
>>> Thanks
>>> Majeed Qulbain
>>>
>>> Hoskinson, David P wrote:
>>>
>>>> We have a windows 2003 dc here at the university and I have successfully
>>>> set up samba-3.0.2-6.3E on a RHEL WS3 machine. The problem is that after
>>>> several hours, or several days, winbind stops running and connections
>>>> fail. I have seen instances of this on other sites, but no firm
>>>> answers. I can provide files and logs if helpful
>>>>
>>>>
>>>> /
>>>
>>>
>