winbindd not closing sockets correctly?

Majeed mabuqu at ilstu.edu
Wed May 19 18:40:05 GMT 2004


Hey developers,

	I apologize for sending this message to the technical list, but I 
posted the following message to the general list, and now it is 
completely buried in the mass of Samba mail, so I highly doubt anyone
will look at it (maybe I'm wrong).

I am trying to track down a problem with winbindd not closing network
sockets correctly. I am running a Gentoo 1.4 system. I believe that
winbind is the problem, and I haven't found anyone else who has had a
similar problem. I included much of my research and a link to an
organized listing of log files in the original message (below).

If anyone could point me to a source file that handles connections
(mainly closing the connections to the Windows 2000 AD controller on TCP
port 139), that would be cool. I'm not sure what I will be able to do
with it, but I would like to check it out.

Thanks,
Majeed



original message
*******************************************************************************************************

I have been having the same problem with winbind for quite a while now
and have researched up and down, but I can’t get the problem resolved. I
have been dealing with this since 3.0.2. I then moved to 3.0.2a, then to
3.0.3pre2 since the release notes mentioned a crash fix in ADS mode,
then to 3.0.3 since it was a production release, and then to 3.0.4 since
some memory leaks and socket handling issues were fixed in winbind. I
will now illustrate my problem.

Info:

- 4 Windows 2000 domain controllers
- Linux box joins the domain and uses Kerberos Active Directory
authentication for shares
- distribution: Gentoo 1.4
- kernel 2.4.26 (stock sources)
- current version of samba: 3.0.4
- If anything else is needed please let me know
- configure command to compile:
./configure --prefix=/usr --sysconfdir=/etc/samba --localstatedir=/var \
--libdir=/usr/lib/samba \
--with-privatedir=/etc/samba/private --with-lockdir=/var/cache/samba \
--with-piddir=/var/run/samba \
--with-swatdir=/usr/share/swat --with-configdir=/etc/samba \
--with-logfilebase=/var/log/samba \
--enable-static --enable-shared --with-manpages-langs=en \
--without-spinlocks --with-libsmbclient \
--with-automount --with-smbmount --with-winbind --with-syslog \
--with-idmap --with-ldap \
--with-ads --with-krb5 --with-pam

Problem:

After compiling and installing samba and copying the pam_winbind.so,
libnss_winbind.so, and libnss_wins.so files to the appropriate
directories I then start samba and winbind using a startup script. It
takes about 30sec to a minute for authentication to start working
(probably winbind talking to the DCs). Once it starts authenticating it
works GREAT and will continue to do so for a period of 3 days to a week.
Once it hits a certain point, winbind will no longer authenticate.

Since I have been having this problem for a while now, I have been
monitoring winbindd. It seems that around 3 hours after I start
winbindd, sockets in the CLOSE_WAIT state start accumulating in the
output of netstat -antupo. All the sockets in this state are owned by
the winbindd process, and they never close unless I kill winbindd. Once
the number of CLOSE_WAITs reaches around 1000, winbindd stops
authenticating, samba crashes, and I cannot ssh in (I can connect and I
can authenticate, but right after I successfully authenticate, ssh
returns a signal 11 error and drops the connection). I believe the ssh
problem is caused by winbind because of all the sockets and port
numbers it has tied up in the CLOSE_WAIT state. Once I restart winbindd
and sshd, everything works fine again until the same point is reached.

After doing much research I found that this is usually caused by a bug
in the application, which does not close the socket correctly. At first
I thought it might be the kernel, so I upgraded from 2.4.25 to 2.4.26,
but the same symptoms came back. After that I read on a developers'
forum that if you kill the process that owns the sockets in the
CLOSE_WAIT state and they disappear, then it is not a kernel issue.
Also, while monitoring winbindd I noticed that its memory consumption
steadily increases (maybe a leak?).

I wanted to be able to show the developers and everyone else what I was
seeing, so I wrote a script and put it in a cron job that runs every
hour at 10 minutes past the hour. The script runs the following commands
and appends the output to a text file. This isn't the entire script,
but it is the meat of it.

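#!/bin/sh
# Hourly winbindd/smbd snapshot; each run writes its own timestamped file.
LOG_FILE=$(date +%F_%H.%M%P_winbind_info.log)
PREFIX=/var/log/winbind/
# "grep PID" keeps the ps header line so the columns stay labelled
ps aux | grep PID >> "$PREFIX$LOG_FILE"
ps aux | grep winb >> "$PREFIX$LOG_FILE"
ps aux | grep mbd >> "$PREFIX$LOG_FILE"
# memory figures (VmSize, VmRSS, ...) of the running winbindd
cat "/proc/$(cat /var/run/samba/winbindd.pid)/status" >> "$PREFIX$LOG_FILE"
netstat -antupo >> "$PREFIX$LOG_FILE"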
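The hourly snapshot records the full netstat output; to track the number directly, a small helper could tally CLOSE_WAIT sockets per owning program. This is a minimal sketch, not part of the original script: `count_close_wait` is a hypothetical name, and it assumes the column layout printed by `netstat -antupo` on Linux (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name, Timer). A TCP socket stays in CLOSE_WAIT after the remote end has closed its side and until the local process calls close() on it, so a count that only ever grows points at sockets winbindd never releases.

```shell
#!/bin/sh
# Count sockets in CLOSE_WAIT owned by one program, reading
# "netstat -antupo" style output on stdin. Assumed layout:
#   Proto Recv-Q Send-Q Local Foreign State PID/Program [Timer]
count_close_wait() {
    prog=${1:-winbindd}             # program name to match (default: winbindd)
    awk -v prog="$prog" '
        $1 ~ /^tcp/ && $6 == "CLOSE_WAIT" {
            split($7, owner, "/")   # "1234/winbindd" -> owner[1]=pid, owner[2]=name
            if (owner[2] == prog) n++
        }
        END { print n + 0 }         # prints 0 when nothing matched
    '
}
```

For example, `netstat -antupo | count_close_wait winbindd >> "$PREFIX$LOG_FILE"` could be added to the hourly script; run it as root, since netstat only fills in the PID/Program column for other users' processes when it has permission. The hourly schedule itself would be a crontab entry such as `10 * * * * /path/to/snapshot.sh`, where the script path is a placeholder.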

I put all the logs, from the minute I started winbindd up until now, on
a webpage for people to see. They are ordered by date and time, so you
can see how things progress: memory usage and the CLOSE_WAIT problem.
Hopefully the developers can use this information. If not, it would be
great if anyone has any idea why I have all these CLOSE_WAITs. I am
replying to a previous post that I created, but back then I was just
going to upgrade to see if I still had the same problems. And I did, as
you can see. Any insight would be great. I would be glad to answer any
questions or run any tests that people would like me to try. I have a
test server and a production server, and this problem happens on both.
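Since each hourly log includes a copy of winbindd's /proc/<pid>/status, the memory growth can be pulled out as one line per snapshot. A minimal sketch, assuming the log layout produced by the script above; `vm_trend` is a hypothetical helper name, and VmSize and VmRSS are the virtual size and resident set size (in kB) that Linux reports in that file.

```shell
#!/bin/sh
# Print "<file> <VmSize kB> <VmRSS kB>" for each snapshot file given.
# Assumes every file contains a copy of /proc/<pid>/status.
vm_trend() {
    for f in "$@"; do
        awk -v file="$f" '
            $1 == "VmSize:" { size = $2 }   # virtual memory size
            $1 == "VmRSS:"  { rss  = $2 }   # resident set size
            END { print file, size, rss }
        ' "$f"
    done
}
```

Running `vm_trend /var/log/winbind/*_winbind_info.log` should list the snapshots in chronological order, since the file names begin with the date and the 24-hour time; a VmRSS column that climbs steadily between restarts would support the leak suspicion.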

Go to www.analoglove.com/winbind
Below is how the message ended the last time I posted about this.

Thank you very much for your time,
Majeed Qulbain



********************************************************************************







Majeed wrote:

 > I'm going to install the new version and report back in a week or so.
 > Thanks for the reply!
 >
 > Majeed
 >
 > Tim Jordan wrote:
 >
 >> I see there is a fix for winbind crashing in the latest release
 >> notes.
 >> http://download.samba.org/samba/ftp/pre/
 >> TJ
 >> On Mon, 2004-04-05 at 10:25, Majeed wrote:
 >>
 >>> I have also been seeing this over the last few weeks. For me it
 >>> also happens randomly, as you stated. I am trying to pinpoint when
 >>> it started, and I believe it started right after I upgraded the
 >>> kernel from 2.4.24 to 2.4.25 (vanilla sources on gentoo 1.4) (mremap
 >>> problems), but I can't be too sure. Samba 3.0.2 compiled with the
 >>> following options:
 >>> ./configure --prefix=/usr --sysconfdir=/etc/samba
 >>> --localstatedir=/var --libdir=/usr/lib/samba
 >>> --with-privatedir=/etc/samba/private --with-lockdir=/var/cache/samba
 >>> --with-piddir=/var/run/samba --with-swatdir=/usr/share/swat
 >>> --with-configdir=/etc/samba --with-logfilebase=/var/log/samba
 >>> --enable-static --enable-shared --with-manpages-langs=en
 >>> --without-spinlocks --with-libsmbclient --with-automount
 >>> --with-smbmount --with-winbind --with-syslog --with-idmap
 >>> --with-ldap --with-ads --with-krb5 --with-pam
 >>>
 >>> Here are some symptoms I am seeing when the problem occurs.
 >>> Symptom 1) I cannot log in through ssh: it's weird because I can
 >>> connect, put in my username and password, and it authenticates, but
 >>> then the connection gets reset. There is even a line in the ssh log
 >>> file that says access was granted. I then go to the console and log in.
 >>>
 >>> Symptom 2) While logged into the console I run a "netstat -antu" and
 >>> get some interesting results
 >>> tcp        0      0 sambaserv_ip:44134      win2000dc_ip:139        CLOSE_WAIT
 >>> tcp        0      0 sambaserv_ip:44072      win2000dc_ip:139        CLOSE_WAIT
 >>> tcp        0      0 sambaserv_ip:44075      win2000dc_ip:139        CLOSE_WAIT
 >>> tcp        0      0 sambaserv_ip:44076      win2000dc_ip:139        CLOSE_WAIT
 >>> tcp        0      0 sambaserv_ip:44078      win2000dc_ip:139        CLOSE_WAIT
 >>> tcp        0      0 sambaserv_ip:44079      win2000dc_ip:139        CLOSE_WAIT
 >>>
 >>> There are HUNDREDS of these CLOSE_WAIT lines, all with different,
 >>> ascending local port numbers.
 >>> After restarting samba and winbind netstat looked normal and
 >>> everything worked as it should have.
 >>>
 >>> Symptom 3) While logged into the console I check the samba log files
 >>> and log.winbind showed the following problems.
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>> [2004/04/05 10:11:05, 0] lib/util_sock.c:open_socket_in(634)
 >>>  open_socket_in(): socket() call failed: Too many open files
 >>>
 >>> Again there were HUNDREDS of these lines.
 >>>
 >>> So I think winbind might be the cause of the problems. This happens
 >>> on both my production and my test server. Test server is mirrored to
 >>> production for testing.
 >>>
 >>> Today I am going to download the newest version of Samba 3 and
 >>> see if that helps; if it doesn't, I might try a different kernel
 >>> version. As mentioned before, all I do is restart samba and winbind
 >>> and things will work perfectly for a random amount of time, usually
 >>> 3 or more days before it happens again.
 >>>
 >>> Does anyone have any suggestions? Maybe some different things I
 >>> could look for? Maybe different compile options?
 >>>
 >>> Thanks
 >>> Majeed Qulbain
 >>>
 >>>
 >>>
 >>>
 >>>
 >>>
 >>>
 >>>
 >>> Hoskinson, David P wrote:
 >>>
 >>>> We have a Windows 2003 DC here at the university, and I have
 >>>> successfully set up samba-3.0.2-6.3E on a RHEL WS3 machine. The
 >>>> problem is that after several hours, or several days, winbind stops
 >>>> running and connections fail. I have seen instances of this on other
 >>>> sites, but no firm answers. I can provide files and logs if helpful.
 >>>>
 >>>
 >>>
 >
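The "Too many open files" errors quoted above are what a process gets once it reaches its per-process file descriptor limit, and every socket left in CLOSE_WAIT still occupies one of winbindd's descriptors. A minimal sketch for watching that directly on Linux: `fd_usage` is a hypothetical helper that prints the total number of open descriptors of a process and how many of them are sockets, by reading the /proc/<pid>/fd symlinks (this needs permission to read the target's fd directory, normally root).

```shell
#!/bin/sh
# Print "<total fds> <socket fds>" for the given PID using /proc/<pid>/fd.
# Prints "0 0" if the process does not exist or its fd directory is unreadable.
fd_usage() {
    pid=$1
    total=0
    sockets=0
    for fd in /proc/"$pid"/fd/*; do
        # an unmatched glob leaves the literal pattern; skip it
        [ -e "$fd" ] || [ -L "$fd" ] || continue
        total=$((total + 1))
        case $(readlink "$fd") in
            socket:*) sockets=$((sockets + 1)) ;;   # e.g. "socket:[12345]"
        esac
    done
    echo "$total $sockets"
}
```

For example, `fd_usage "$(cat /var/run/samba/winbindd.pid)"` could be logged every hour next to the CLOSE_WAIT count and compared with the descriptor limit; `ulimit -n` shows the limit of the current shell, which matches winbindd's only if the daemon was started from an equivalent environment.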




More information about the samba-technical mailing list