Antwort: Re: [Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4

Dragan.Krnic at bahn.de Dragan.Krnic at bahn.de
Wed Jul 7 21:34:38 GMT 2004


Dragan Krnic
DB Fernverkehr AG

P
955 - 7166
____________________________________________________
Deutsche Bahn AG on the web >> http://www.bahn.de



                                                                                                                             
Dragan Krnic
07.07.2004 20:50

To:      Hansjoerg.Maurer, jra
Cc:      samba
Bcc:     dkrnic at t-online.de, dkrnic at lycos.com
Subject: Re: [Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4
---------------------------------------------------------------------------



As an epilogue, here's a little script to find out if there are
any usernames in /etc/group which don't correspond to an
existing user:

   ungroup.awk < /etc/group | sort -r | \
   regroup.awk | while read i;\
   do id $i > /dev/null;\
   done 2>&1 | \
   grep -v :x: | sort

Whereby "ungroup.awk" is

   #!/bin/awk -f
   {  partsNo = split ( $0, partString, ":" );
      namesNo = split ( partString[4], userName, "," );
      printf ( "%s:%s:%s:\n", partString[1], partString[2], partString[3] );
      for ( i = 1;i <= namesNo; i++ )
            printf ( "%s:%s\n", partString[1], userName[i]);
   }

and "regroup.awk" is

   #!/bin/awk -f
   {  partsNo = split ( $0, partString, ":" );
      if ( partsNo == 2 )
            printf ( "%s\n", partString[2] );
      else
            printf ( "%s\n", $0 );
   }
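
The same check can also be folded into a single awk pass. What follows is only a sketch, not a replacement for the scripts above: the function name "orphan_members" is made up for illustration, and it assumes that all accounts can be listed in passwd format in a file. With NIS or LDAP accounts, the output of "getent passwd" saved to a file could stand in for /etc/passwd, provided enumeration is enabled.

```shell
#!/bin/sh
# orphan_members GROUP_FILE PASSWD_FILE   (hypothetical helper name)
# Print "group:member" for every member listed in GROUP_FILE
# that has no entry in PASSWD_FILE.
orphan_members() {
    awk -F: '
        # First file (passwd format): remember every known login name.
        # Assumes the passwd file is not empty.
        NR == FNR { known[$1] = 1; next }
        # Second file (group format): field 4 is the comma-separated
        # member list.
        {
            n = split($4, member, ",")
            for (i = 1; i <= n; i++)
                if (member[i] != "" && !(member[i] in known))
                    print $1 ":" member[i]
        }
    ' "$2" "$1"
}
```

Called as "orphan_members /etc/group /etc/passwd", it prints one line per suspicious entry. Unlike the "id" loop it never asks nscd, so it only sees what is in the two files.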

Feel free to improve it, but it did find 5 more users
who are either no longer on the books
or whose names are spelled ever so slightly differently.

This is of course a temporary cure for bad bookkeeping,
until Jeremy finds out why such innocuous causes
have such drastic consequences and fixes it. Or doesn't.

Cheers

>>>>>>> the new 3.0.4 Samba installation seems to work fine
>>>>>>> except that from time to time but at least a couple
>>>>>>> of times a day one or more smbd processes start
>>>>>>> running at 20%-40% CPU each and 6 nscd processes
>>>>>>> then share the remaining CPU power. System 70%-80%,
>>>>>>> user the rest, 20%-30%. Load rises fast to over 4.
>>>>>>>
>>>>>>> I'm sure that each such process is just idling,
>>>>>>> but why does it engage so much nscd processing?
>>>>>>>
>>>>>>> As soon as I kill the excessive smbd process(es)
>>>>>>> the situation drops to normal, i.e. load < 0.1,
>>>>>>> no perceptible CPU%.
>>>>>>>
>>>>>>> Does anyone know what's happening?
>>>>>>>
>>>>>> What does strace say ? Can you attach with gdb to
>>>>>> a CPU bound process and give a backtrace ?
>>>>>>
>>>>> Ah backtrace, to see the steps to the black hole?
>>>>> OK. Will do, as soon as I get back to the office (tomorrow).
>>>>>
>>>> A pot watched never boils.
>>>> But as soon as it happens again, I'll consult
>>>> strace and gdb to see how and why it happens.
>>>>
>>>
>>> It does, when you stop watching.
>>> I was away yesterday and what do I see
>>> this morning:
>>>
>>>    top - 09:55:35 up 6 days, 17:28,  6 users,  load average: 3.59, 3.82, 3.15
>>>    Tasks: 182 total,   3 running, 179 sleeping,   0 stopped,   0 zombie
>>>    Cpu(s):  21.6% user,  78.4% system,   0.0% nice,   0.0% idle
>>>    Mem:   2060704k total,  2004324k used,    56380k free,   185272k buffers
>>>    Swap:  2402296k total,     5236k used,  2397060k free,  1068752k cached
>>>
>>>      PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM     TIME+   Command
>>>    12808 robf  25  0  2996 2596 2244 R 23.6  0.1   4:46.98   smbd
>>>    12741 robf  22  0  3028 2628 2280 R 20.6  0.1   5:44.55   smbd
>>>     2354 root  15  0   724  716  536 S 15.3  0.0 150:31.23   nscd
>>>     2356 root  15  0   724  716  536 S 14.9  0.0 150:35.87   nscd
>>>     2352 root  15  0   724  716  536 S  9.0  0.0 151:14.08   nscd
>>>     2353 root  15  0   724  716  536 S  6.3  0.0 150:39.89   nscd
>>>     2355 root  15  0   724  716  536 S  6.3  0.0 150:31.82   nscd
>>>     2350 root  15  0   724  716  536 S  3.7  0.0 150:49.78   nscd
>>>
>>> I attached both of the smbd processes to gdb and
>>> backtrace was always:
>>>
>>>    #0  0x402e5328 in read () from /lib/libc.so.6
>>>    #1  0x40343b90 in __DTOR_END__ () from /lib/libc.so.6
>>>    #2  0x4031d58b in __nscd_getpwnam_r () from /lib/libc.so.6
>>>    #3  0x402c130d in getpwnam_r@@GLIBC_2.1.2 () from /lib/libc.so.6
>>>    #4  0x402c0e6f in getpwnam () from /lib/libc.so.6
>>>    #5  0x081298fd in get_memberuids ()
>>>    #6  0x08129b09 in _samr_query_groupmem ()
>>>    #7  0x081215d1 in api_samr_query_groupmem ()
>>>    #8  0x081358b2 in api_rpcTNP ()
>>>    #9  0x08135632 in api_pipe_request ()
>>>    #10 0x0812fc10 in process_request_pdu ()
>>>    #11 0x0812fdec in process_complete_pdu ()
>>>    #12 0x08130069 in process_incoming_data ()
>>>    #13 0x08130220 in write_to_internal_pipe ()
>>>    #14 0x081301a4 in write_to_pipe ()
>>>    #15 0x080883e0 in api_fd_reply ()
>>>    #16 0x080885b9 in named_pipe ()
>>>    #17 0x080891bc in reply_trans ()
>>>    #18 0x080c80e0 in switch_message ()
>>>    #19 0x080c8172 in construct_reply ()
>>>    #20 0x080c8491 in process_smb ()
>>>    #21 0x080c9004 in smbd_process ()
>>>    #22 0x081f812e in main ()
>>>    #23 0x402268ae in __libc_start_main () from /lib/libc.so.6
>>>
>>> After killing both smbd processes with -9 the top soon
>>> stabilizes at:
>>>
>>>    top - 10:12:01 up 6 days, 17:45,  6 users,  load average: 0.03, 0.19, 1.17
>>>    Tasks: 175 total,   2 running, 173 sleeping,   0 stopped,   0 zombie
>>>    Cpu(s):   0.0% user,   0.7% system,   0.0% nice,  99.3% idle
>>>    Mem:   2060704k total,  2034272k used,    26432k free,   185272k buffers
>>>    Swap:  2402296k total,     5236k used,  2397060k free,  1100020k cached
>>>
>>>      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ Command
>>>    15182 root      15   0   956  956  700 R  0.3  0.0   0:03.67   top
>>>
>>> Unfortunately I didn't trace the nscd processes.
>>> What a shame! I'll do it next time.
>>>
>>> Nobody complained yet about reduced performance.
>>>
>>> It's hard to tell when this behaviour started.
>>> The upper bound seems to be 9 hours,
>>> the combined run times of the nscd processes,
>>> some time during the night when the computers
>>> were totally quiet. The lower bound based on
>>> the run times of the smbd processes is more
>>> like half an hour ago.
>>>
>>> This is the fourth out of 5 times that the same user,
>>> "robf", is involved as the effective UID of the smbd process.
>>> The other one time was root's own smbd.
>>>
>>> Jeremy, can I provide more information?
>>
>> I had a similar problem, and a log level of 4 showed
>> that samba was trying to look up a user nobody and a
>> user Administrator all the time.
>> If I killed nscd, the load on the ldap server became high...
>>
>> I added these users to my ldap backend, and the problem disappeared.
>
> It's a valuable pointer but I'm not sure it really
> applies here, Hans. For one, your problem seems to
> have been persistent up until you added those users
> to your ldap backend. In my case, it happens very
> intermittently. Besides, my passdb backend is the
> default "smbpasswd".
>
> I'll give it a more thorough check next time it
> happens, including user name lookups.

Bingo! It is exactly the same case.
Two user names were spelled out slightly wrong
in the /etc/group. As a consequence,
under certain circumstances the "smbd" process
keeps trying to resolve the name and doesn't
take "no" from "nscd" for an answer.
Each "smbd" process is looping around
these 5 system calls:
1) create a socket,
2) connect to nscd's socket,
3) write the mis-spelled name,
4) read the negative answer,
5) close the socket:

   socket(PF_UNIX,SOCK_STREAM,0)=26
   connect(26,{sa_family=AF_UNIX,path="/var/run/.nscd_socket"},110)=0
   writev(26,[{"\2\0...\0\22\0...",12},{"GeorgeDubbyaBusch\0",18}],2)=30
   read(26,"\2\0\0\...\0\377\377\377\377\377\377"...,36)=36
   close(26)=0

and the nscds spin like this

   poll([{fd=3,events=POLLRDNORM,revents=POLLRDNORM}],1,-1)=1
   accept(3,,NULL)=9
   read(9,"\2\0\0\0\0\0\0\0\22\0\0\0",12)=12
   read(9,"GeorgeDubbyaBusch\0",18)=18
   write(9,"\2\0\...\0\377\377\377\377\377\377"...,36)=36
   close(9)=0
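
The 12 bytes that nscd reads first can be decoded by hand. The sketch below assumes they are three little-endian 32-bit integers (the traced machine is i386) and borrows the field names version, type and key_len from what appears to be glibc's nscd client request header; both points are assumptions, not something taken from the trace itself. The function name "decode_nscd_header" is made up for illustration.

```shell
#!/bin/sh
# decode_nscd_header   (hypothetical helper name)
# Read a 12-byte nscd request header on stdin and print its three
# 32-bit fields, decoded explicitly as little-endian integers.
decode_nscd_header() {
    od -An -v -tu1 | awk '
        # Collect every byte value printed by od.
        { for (i = 1; i <= NF; i++) b[++n] = $i }
        END {
            for (f = 0; f < 3; f++) {
                o = 4 * f
                v[f] = b[o+1] + 256 * b[o+2] + 65536 * b[o+3] + 16777216 * b[o+4]
            }
            printf "version=%d type=%d key_len=%d\n", v[0], v[1], v[2]
        }'
}

# The header from the nscd trace above (\22 is octal for 18).
printf '\002\000\000\000\000\000\000\000\022\000\000\000' | decode_nscd_header
```

The key length of 18 agrees with the second read: the 17 characters of the name plus the terminating NUL.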

Since both mis-spelled names are among the
earliest user names in the 2 most frequently used
groups (one is "users"), it's hard to tell
why the smbd processes spin out of control so
infrequently. Jeremy will know more about that.
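
To see which names a spinning smbd keeps asking for, the writev lines of a saved strace log can be tallied. This is a rough sketch that assumes the log looks like the trace above, with the name as the last NUL-terminated string of the writev vector; the function name "count_nscd_lookups" and the log path in the usage note are made up for illustration.

```shell
#!/bin/sh
# count_nscd_lookups STRACE_LOG   (hypothetical helper name)
# Tally the names written to nscd, most frequent first.
count_nscd_lookups() {
    # Pick NAME out of the last {"NAME\0",LEN} element of each writev
    # line; names containing quotes or backslashes are not handled.
    sed -n 's/^.*writev(.*{"\([^"\\]*\)\\0", *[0-9]*}.*$/\1/p' "$1" |
        sort | uniq -c | sort -rn
}
```

A log could be collected with something like "strace -p PID -e trace=writev -o /tmp/smbd.trace" and then passed to the function. A name that shows up thousands of times within a few seconds is the one to look for in /etc/group.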

I'll write a script to check such discrepancies
in the file.

Thanks Hans. Thanks Jeremy.







