Runaway number of smbd children

Thu Oct 11 16:25:29 GMT 2007

Kevin:

I know what you mean... and although that problem is serious, I am not
convinced it's your only problem.  Were your users actually able to
log on?

One of our customers ran into a problem that, on the surface, behaved
like Bug 3204 with our customized build of 3.0.23b.  In that case, we
were seeing millions of SID -> GID lookups being performed by winbind,
and the limit of 200 SMBD -> WINBIND connections was being exceeded
because each user authentication, with the necessary group membership
lookups, was taking too long.

We were able to work around this problem by not loading winbind and
letting our own NSS module resolve the group memberships, but of course
then you run into the Solaris NSS group membership limit (only 16
groups allowed).

I was looking for this problem in your log, and although there was a
fair number of IDMAP-related references, it did not appear to be the
case that your users belonged to many groups; mostly the same SIDs were
being resolved over and over.
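
One rough way to check whether the same SIDs are being resolved over
and over is to tally every SID string that appears in a log.  The
sketch below is only an illustration: it assumes nothing about the
winbind log layout beyond SIDs appearing in their usual S-1-... text
form, and the names count_sids and top_sids, as well as the log path
given on the command line, are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the textual form of a SID, e.g. S-1-5-21-...-513.
# Assumption: SIDs appear in the log in this plain-text form.
SID_RE = re.compile(r"\bS-1-\d+(?:-\d+)+\b")


def count_sids(lines):
    """Return a Counter of how often each SID string appears in lines."""
    counts = Counter()
    for line in lines:
        counts.update(SID_RE.findall(line))
    return counts


def top_sids(lines, n=10):
    """Return the n most frequently seen SIDs as (sid, hits) pairs."""
    return count_sids(lines).most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log file path is supplied by the caller (hypothetical example).
    with open(sys.argv[1], errors="replace") as fh:
        for sid, hits in top_sids(fh):
            print(f"{hits:8d}  {sid}")
```

A handful of SIDs with very high counts would point to repeated
resolution of the same entries rather than users with many groups.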

We are currently targeting 3.0.26a for a new release with the hope that
the improved IDMAP caching and bulk resolving functions will help, so we
have a vested interest in making sure this problem is resolved :)

I wonder if we can use the Samba 4 SMB torture to simulate the
situation.

Dave Daugherty
Centrify

> On Behalf Of Kevin Robinson
> Sent: Thursday, October 11, 2007 6:19 AM

> That is a clue as the older version isn't doing that; however, I did
> rejoin the domain as part of the reversion process...  I'll need to
> find a way to test this theory out without affecting the whole campus
> first lol

> Thanks

> Dave Daugherty wrote:
>> From log.wb-GACL
>>
>>   winbindd_pam_auth: sam_logon returned ACCESS_DENIED.  Maybe the
>>   trust account password was changed and we didn't know it. Killing
>>   connections to domain GACL
>>
>> occurs every few seconds.  Did you try to rejoin the domain to fix
>> this?
>> 
>> Dave Daugherty
> 
> 
> On Behalf Of Kevin Robinson
> Sent: Wednesday, October 10, 2007 12:52 PM
> 
>> Hey,
> 
>> This is the same problem I was having with version 3.0.25b -- which is
>> why I upgraded to 3.0.26a.  Yesterday it got so bad that I was having
>> to restart it every few minutes, or the machine would get hosed.  I
>> finally reverted to 3.0.20b and haven't had a problem since; however,
>> the server hasn't hit the load it was at either so who knows...
> 
>> Kevin Robinson wrote:
>>> Thanks for the help,
>>>
>>> Seems that this could be related to
>>> https://bugzilla.samba.org/show_bug.cgi?id=3204 ,
>>> so I've put up a page with all the logging data plus some
>>> ( http://comp.uark.edu/~kevinr/samba/ ).  The server's patch level
>>> is behind the current one, which is 13.  As I was watching the
>>> server via prstat, I noticed that the smbd processes were using a
>>> large share of the CPU, but that could be because there were 5000+
>>> processes running.
>>> I was not able to get the snoop data though :(
>>>
> 

-- 
Kevin Robinson, B.Sc.
SysAdmin for University of Arkansas IT Services
(479)575-2901-office, (479)575-4753-fax
Never take life seriously.  Nobody gets out alive anyway.

01101011 01100101 01110110 01111001 00100000 01100100