Antwort: Re: [Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4
Dragan.Krnic at bahn.de
Dragan.Krnic at bahn.de
Wed Jul 7 21:34:38 GMT 2004
Dragan Krnic
DB Fernverkehr AG
P
955 - 7166
____________________________________________________
Internetauftritt der Deutschen Bahn AG >> http://www.bahn.de
Dragan Krnic
07.07.2004 20:50 An:
Hansjoerg.Maurer, jra
Kopie:
samba
Blindkopie:
dkrnic at t-online.de, dkrnic at lycos.com
Thema:
Re: [Samba] 3.0.4: smbd's + nscd's = 100% CPU; load > 4
---------------------------------------------------------------------------------------------------------------------------
As an epilogue, here's a little script to find out if there are
any usernames in /etc/group which don't correspond to an
existing user:
ungroup.awk < /etc/group | sort -r | \
regroup.awk | while read i;\
do id $i > /dev/null;\
done 2>&1 | \
grep -v :x: | sort
Whereby "ungroup.awk" is
#!/bin/awk -f
{ partsNo = split ( $0, partString, ":" );
namesNo = split ( partString[4], userName, "," );
printf ( "%s:%s:%s:\n", partString[1], partString[2], partString[3]
);
for ( i = 1;i <= namesNo; i++ )
printf ( "%s:%s\n", partString[1], userName[i]);
}
and "regroup.awk" is
#!/bin/awk -f
{ partsNo = split ( $0, partString, ":" );
if ( partsNo == 2 )
printf ( "%s\n", partString[2] );
else
printf ( "%s\n", $0 );
}
Feel free to make it better but it did find 5 more users
which are either no more on the books
or their names are ever so slightly different.
This is of course a temporary cure for bad bookkeeping,
until Jeremy finds out why such innocuous causes
have such drastic consequences and fixes it. Or doesn't.
Cheers
>>>>>>> the new 3.0.4 Samba installation seems to work fine
>>>>>>> except that from time to time but at least a couple
>>>>>>> of times a day one or more smbd processes start
>>>>>>> running at 20%-40% CPU each and 6 nscd processes
>>>>>>> then share the remaining CPU power. System 70%-80%
>>>>>>> users the rest 20%-30%. Load rises fast to over 4.
>>>>>>>
>>>>>>> I'm sure that each such process is just idling,
>>>>>>> but why does it engage so much nscd processing?
>>>>>>>
>>>>>>> As soon as I kill the excessive smbd process(es)
>>>>>>> the situation drops to normal, i.e. load < 0,1
>>>>>>> no perceptible CPU%.
>>>>>>>
>>>>>>> Does anyone know what's happening?
>>>>>>>
>>>>>> What does strace say ? Can you attach with gdb to
>>>>>> a CPU bound process and give a backtrace ?
>>>>>>
>>>>> Ah backtrace, to see the steps to the black hole?
>>>>> OK. Will do, sooen as I get back to office (tomorrow).
>>>>>
>>>> A pot watched never boils.
>>>> But as soon as it happens again, I'll consult
>>>> strace and gdb to see how and why it happens.
>>>>
>>>
>>> It does, when you stop watching.
>>> I was away yesterday and what do I see
>>> this morning:
>>>
>>> top - 09:55:35 up 6 days, 17:28, 6 users, load average: 3.59,
3.82, 3.15
>>> Tasks: 182 total, 3 running, 179 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 21.6% user, 78.4% system, 0.0% nice, 0.0% idle
>>> Mem: 2060704k total, 2004324k used, 56380k free, 185272k
buffers
>>> Swap: 2402296k total, 5236k used, 2397060k free, 1068752k
cached
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command
>>> 12808 robf 25 0 2996 2596 2244 R 23.6 0.1 4:46.98 smbd
>>> 12741 robf 22 0 3028 2628 2280 R 20.6 0.1 5:44.55 smbd
>>> 2354 root 15 0 724 716 536 S 15.3 0.0 150:31.23 nscd
>>> 2356 root 15 0 724 716 536 S 14.9 0.0 150:35.87 nscd
>>> 2352 root 15 0 724 716 536 S 9.0 0.0 151:14.08 nscd
>>> 2353 root 15 0 724 716 536 S 6.3 0.0 150:39.89 nscd
>>> 2355 root 15 0 724 716 536 S 6.3 0.0 150:31.82 nscd
>>> 2350 root 15 0 724 716 536 S 3.7 0.0 150:49.78 nscd
>>>
>>> I attached both of the smbd processes to gdb and
>>> backtrace was always:
>>>
>>> #0 0x402e5328 in read () from /lib/libc.so.6
>>> #1 0x40343b90 in __DTOR_END__ () from /lib/libc.so.6
>>> #2 0x4031d58b in __nscd_getpwnam_r () from /lib/libc.so.6
>>> #3 0x402c130d in getpwnam_r@@GLIBC_2.1.2 () from /lib/libc.so.6
>>> #4 0x402c0e6f in getpwnam () from /lib/libc.so.6
>>> #5 0x081298fd in get_memberuids ()
>>> #6 0x08129b09 in _samr_query_groupmem ()
>>> #7 0x081215d1 in api_samr_query_groupmem ()
>>> #8 0x081358b2 in api_rpcTNP ()
>>> #9 0x08135632 in api_pipe_request ()
>>> #10 0x0812fc10 in process_request_pdu ()
>>> #11 0x0812fdec in process_complete_pdu ()
>>> #12 0x08130069 in process_incoming_data ()
>>> #13 0x08130220 in write_to_internal_pipe ()
>>> #14 0x081301a4 in write_to_pipe ()
>>> #15 0x080883e0 in api_fd_reply ()
>>> #16 0x080885b9 in named_pipe ()
>>> #17 0x080891bc in reply_trans ()
>>> #18 0x080c80e0 in switch_message ()
>>> #19 0x080c8172 in construct_reply ()
>>> #20 0x080c8491 in process_smb ()
>>> #21 0x080c9004 in smbd_process ()
>>> #22 0x081f812e in main ()
>>> #23 0x402268ae in __libc_start_main () from /lib/libc.so.6
>>>
>>> After killing both smbd processes with -9 the top soon
>>> stabilizes at:
>>>
>>> top - 10:12:01 up 6 days, 17:45, 6 users, load average: 0.03,
0.19, 1.17
>>> Tasks: 175 total, 2 running, 173 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 0.0% user, 0.7% system, 0.0% nice, 99.3% idle
>>> Mem: 2060704k total, 2034272k used, 26432k free, 185272k
buffers
>>> Swap: 2402296k total, 5236k used, 2397060k free, 1100020k
cached
>>>
>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ Command
>>> 15182 root 15 0 956 956 700 R 0.3 0.0 0:03.67 top
>>>
>>> Unfortunately I didn't trace the nscd processes.
>>> What a shame! I'll do it next time.
>>>
>>> Nobody complained yet about reduced performance.
>>>
>>> It's hard to tell when this behaviour started.
>>> The upper bound seems to be 9 hours,
>>> the combined run times of the nscd processe,
>>> some time during the night when the computers
>>> were totally quiet. The lower bound based on
>>> the run times of the smbd processes is more
>>> like half an hour ago.
>>>
>>> This is the fourth out of 5 times that the same user,
>>> "robf", is involved as the effective UID of the smbd process.
>>> The other one time was root's own smbd.
>>>
>>> Jeremy, can I provide more information?
>>
>> I had a similiar Problem , and a loglevel of 4 shows ,
>> that samba was trying to look up a user nobody and a
>> user Administrator, all the time.
>> If I killed nscd the load of the ldap server becomes high...
>>
>> I added these user to my ldap backend, and the problem disappears.
>
> It's a valuable pointer but I'm not sure it really
> applies here, Hans. For one, your problem seems to
> have been persistent up until you added those users
> to your ldap backend. In my case, it happens very
> intermittently. Besides, my passdb backend is the
> default "smbpasswd".
>
> I'll give it a more thorough check next time it
> happens, including user name lookups.
Bingo! It is exactly the same case.
Two user names were spelled out slightly wrong
in the /etc/group. As a consequence,
under certain circumstances the "smbd" process
keeps trying to resolve the name and doesn't
take "no" from "nscd" for an answer.
Each "smbd" process is looping around
these 5 system calls:
1) create a socket,
2) connect to nscd's socket,
3) write the mis-spelled name,
4) read negative answer
5) close socket:
socket(PF_UNIX,SOCK_STREAM,0)=26
connect(26,{sa_family=AF_UNIX,path="/var/run/.nscd_socket"},110)=0
writev(26,[{"\2\0...\0\22\0...",12},{"GeorgeDubbyaBusch\0",18}],2)=30
read(26,"\2\0\0\...\0\377\377\377\377\377\377"...,36)=36
close(26)=0
and the nscds spin like this
poll({fd=3,events=POLLRDNORM,revents=POLLRDNORM}],1,-1)=1
accept(3,,NULL)=9
read(9,"\2\0\0\0\0\0\0\0\22\0\0\0",12)=12
read(9,"GeorgeDubbyaBusch\0",18)=18
write(9,"\2\0\...\0\377\377\377\377\377\377"...,36=36
close(9)=0
Since both mis-spelled names are among the
earliest user names in 2 most frequently used
groups (one is "users"), it's hard to tell
why the smbd processes spin out of control so
infrequently. Jeremy will know more about that.
I'll write a script to check such discrepancies
in the file.
Thanks Hans. Thanks Jeremy.
---------
Diese E-Mail könnte vertrauliche und/oder rechtlich geschützte
Informationen enthalten. Wenn Sie nicht der richtige Adressat sind oder
diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den
Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die
unbefugte Weitergabe dieser Mail sind nicht gestattet.
This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and destroy this e-mail. Any
unauthorised copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
----------
More information about the samba
mailing list