Hi all:

I've got a couple of Ubuntu 9.10 machines that are suffering from a
recurring winbind failure that essentially crashes the machine.  When
the system is in the "crashed state", one can ping the system, but all
forms of login fail.  It will not even respond to tftpd requests; ssh
connections "time out", although the port itself still opens (the
connection just never completes).  Rebooting does NOT recover from
this.  In order to recover, I need to:

1) reboot into single user mode
2) edit /etc/nsswitch.conf and remove winbind
3) remove winbind from all pam.d/*
4) boot normally
5) stop samba and winbind
6) delete /var/lib/samba/* and /var/cache/samba/*
7) start samba
8) rejoin domain
9) start winbind
10) undo steps 2 and 3 above
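
Step 2 above could be scripted instead of edited by hand.  What follows
is only a rough Python sketch under some assumptions: strip_winbind is
a made-up helper name, the file is assumed to use the usual
"database: source source ..." layout, and action items such as
[NOTFOUND=return] placed directly after winbind are not handled.

```python
def strip_winbind(nsswitch_text):
    """Return nsswitch.conf content with the 'winbind' source removed.

    Hypothetical helper for illustration; blank lines, comment lines and
    lines without a colon are passed through unchanged.
    """
    result = []
    for line in nsswitch_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or ":" not in stripped:
            result.append(line)
            continue
        # Split "passwd:   compat winbind" into database name and sources.
        database, _, sources = line.partition(":")
        kept = [s for s in sources.split() if s != "winbind"]
        result.append(database + ": " + " ".join(kept))
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = "passwd:         compat winbind\ngroup:          compat winbind\n"
    print(strip_winbind(sample), end="")
```

To apply it for real, one would read /etc/nsswitch.conf as root, keep a
backup copy for step 10, and write the returned text back.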

After this procedure, winbind will work for a week or two.  If I stop
after step 4 above, the system is usable, but domain users cannot log
in.

My diagnostics show that net ads users (and all other "samba"
commands) work just fine and find all users.  All winbind-specific
commands (wbinfo -u, etc.) fail.  Also, if I leave the system up in the
crashed state, the logs grow to about 32 GB within a few days.

I have to repeat the above procedure approximately once every 5 days
on our main production system.  I have a second workstation that sees
very little use, and it has suffered the same crash, but far less
frequently.  I have also tried inserting a step 6.5 where I delete the
machine account on the DC, but that doesn't change anything.  Also,
our Ubuntu 9.04 system running the same configuration files has no
issues.  We have not tried 10.04.

This problem has been plaguing our operations for over two months now,
so any assistance would be greatly appreciated.

Some log file snippets:

(from some point "in the middle" of the crash):
May  7 15:32:45 casas-lin winbindd[20677]:   sys_select: pipe failed (Too many open files)
May  7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45,  0]
May  7 15:32:45 casas-lin winbindd[20677]:   s3_event: sys_select() failed: 24:Too many open f
May  7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45,  0]
May  7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45,  0]
May  7 15:32:45 casas-lin winbindd[20677]:   Unable to open new log file /var/log/samba/log.wb-CASAS: Too many open files
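
The "24:Too many open files" errors mean winbindd has hit its
per-process file descriptor limit (errno 24, EMFILE).  For anyone who
wants to watch for this before the machine locks up, here is a rough
Python sketch that compares each winbindd process's open descriptor
count against its soft limit.  It assumes a Linux /proc filesystem and
needs root to inspect another user's process; the function names are
made up for illustration.

```python
import os


def pids_by_name(name):
    """Return PIDs whose /proc/<pid>/comm matches name (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were scanning
    return pids


def open_fd_count(pid):
    """Count the descriptors currently open in the given process."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit, or None if unlimited/unknown."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                value = line.split()[3]  # fields: Max open files <soft> <hard> files
                return None if value == "unlimited" else int(value)
    return None


if __name__ == "__main__":
    for pid in pids_by_name("winbindd"):
        try:
            print(pid, open_fd_count(pid), "of", soft_nofile_limit(pid))
        except OSError as exc:
            print(pid, "could not be inspected:", exc)
```

Run from cron every few minutes, a count that climbs steadily toward
the limit would point at a descriptor leak rather than a sudden burst
of clients.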

