Winbindd in HEAD: cannot deal with 15,000 users+

Mike Papper mike at digitalpipe.net
Tue Apr 2 12:25:06 GMT 2002


I was running winbind in the debugger while it accessed the 15,000 users from the 
PDC. Here's what I got (note that the smb_panic function was not called):

Program received signal SIGSEGV, Segmentation fault.
0x0806bf4b in centry_put_string () at eval.c:41
41      eval.c: No such file or directory.
        in eval.c
(gdb) bt
#0  0x0806bf4b in centry_put_string () at eval.c:41
#1  0x0806cd42 in lookup_groupmem () at eval.c:41
#2  0x08068695 in fill_grent_mem () at eval.c:41
#3  0x080699ac in winbindd_getgrent () at eval.c:41
#4  0x080663dd in process_request () at eval.c:41
#5  0x0806664e in process_packet () at eval.c:41
#6  0x08066c36 in process_loop () at eval.c:41
#7  0x08067151 in main () at eval.c:41
#8  0x40167507 in __libc_start_main (main=0x8066d10 <main>, argc=4,
    ubp_av=0xbffff854, init=0x8064e00 <_init>, fini=0x80fb980 <_fini>,
    rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbffff84c)
    at ../sysdeps/generic/libc-start.c:129
(gdb)

What I suspect is this: the PDC I am using is rapidly adding and removing 
users/groups, so the set of users in a group disappears between the call that 
gets the users and the call that gets the groups.

Said another way: I think winbind gets a list of the users in a group. Then it 
does "something" that reveals that a certain user is no longer a valid user, and 
it crashes instead of dealing with this race condition.
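A minimal C sketch of the kind of defensive handling meant above. It assumes, without confirmation from the backtrace alone, that the crash comes from a member name that could not be resolved (left NULL because the user was deleted mid-enumeration) being passed to a strlen()-style call inside centry_put_string(). The struct layout and the names centry_put_bytes() and centry_put_string_safe() are hypothetical, not Samba's actual cache code, and the 0xFF "missing string" marker is an illustrative choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a winbindd cache entry (NOT Samba's real struct):
 * a growable byte buffer with a write offset. */
struct cache_entry {
	unsigned char *data;
	size_t len;   /* allocated size */
	size_t ofs;   /* bytes written so far */
};

/* Append n raw bytes, doubling the buffer as needed. Returns 0 on success. */
static int centry_put_bytes(struct cache_entry *c, const void *p, size_t n)
{
	assert(c != NULL);
	if (c->ofs + n > c->len) {
		size_t newlen = c->len ? c->len : 16;
		unsigned char *nd;

		while (newlen < c->ofs + n)
			newlen *= 2;
		nd = realloc(c->data, newlen);
		if (nd == NULL)
			return -1;
		c->data = nd;
		c->len = newlen;
	}
	if (n > 0)
		memcpy(c->data + c->ofs, p, n);
	c->ofs += n;
	return 0;
}

/* Write a string as a one-byte length prefix followed by its bytes.
 * A NULL name (e.g. a group member deleted on the PDC between listing the
 * group and resolving the member) is recorded as a 0xFF marker byte instead
 * of being passed to strlen(), which would dereference NULL and crash. */
static int centry_put_string_safe(struct cache_entry *c, const char *s)
{
	unsigned char len;
	size_t n;

	if (s == NULL) {
		len = 0xFF; /* hypothetical "missing string" marker */
		return centry_put_bytes(c, &len, 1);
	}
	n = strlen(s);
	if (n >= 0xFF)
		return -1; /* too long for the one-byte prefix; 0xFF is reserved */
	len = (unsigned char)n;
	if (centry_put_bytes(c, &len, 1) != 0)
		return -1;
	return centry_put_bytes(c, s, n);
}
```

Encoding a member list of "u1" followed by a vanished member (NULL) would produce the bytes 2, 'u', '1', 0xFF. Whatever reads the entry back would then need to treat the marker as "member no longer exists" and skip it, rather than the daemon segfaulting.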

--------------
On the PDC I run a script that adds a group, adds 2 users and adds the users 
to the group. Then I wait 9 seconds, remove the users and then remove the 
group. Then I wait 9 more seconds and start all over. Here is the script, 
written in bash and run from Cygwin on Windows NT 4:
------------------------------
while true; do
net group "g1" /ADD /COMMENT:"G 1"
net user u1 u1 /ADD /FULLNAME:"U 1" /COMMENT:"Test 1"
net group "g1" u1 /ADD

net user u2 u2 /ADD /FULLNAME:"U 2" /COMMENT:"Test 2"
net group "g1" u2 /ADD

sleep 9;

net user u1 /DELETE
net user u2 /DELETE
net group "g1" /DELETE

sleep 9;
done
------------------------------
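I am wondering whether these lines:
net user u1 /DELETE
net user u2 /DELETE
net group "g1" /DELETE

are causing the problem with winbind, since they delete the users while they 
are still members of the group. So I changed my script to remove the users 
from the group first: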
------------------------------
while true; do
net group "g1" /ADD /COMMENT:"G 1"
net user u1 u1 /ADD /FULLNAME:"U 1" /COMMENT:"Test 1"
net group "g1" u1 /ADD

net user u2 u2 /ADD /FULLNAME:"U 2" /COMMENT:"Test 2"
net group "g1" u2 /ADD

sleep 9;

net group "g1" u2 /DELETE
net group "g1" u1 /DELETE

net user u1 /DELETE
net user u2 /DELETE
net group "g1" /DELETE

sleep 9;
done
------------------------------
Now, I get more or less the same backtrace:
---------------
Program received signal SIGSEGV, Segmentation fault.
0x0806bf4b in centry_put_string () at eval.c:41
41      eval.c: No such file or directory.
        in eval.c
(gdb)
(gdb) bt
#0  0x0806bf4b in centry_put_string () at eval.c:41
#1  0x0806cd42 in lookup_groupmem () at eval.c:41
#2  0x08068695 in fill_grent_mem () at eval.c:41
#3  0x080699ac in winbindd_getgrent () at eval.c:41
#4  0x080663dd in process_request () at eval.c:41
#5  0x0806664e in process_packet () at eval.c:41
#6  0x08066c36 in process_loop () at eval.c:41
#7  0x08067151 in main () at eval.c:41
#8  0x40167507 in __libc_start_main (main=0x8066d10 <main>, argc=4,
    ubp_av=0xbffff854, init=0x8064e00 <_init>, fini=0x80fb980 <_fini>,
    rtld_fini=0x4000dc14 <_dl_fini>, stack_end=0xbffff84c)
    at ../sysdeps/generic/libc-start.c:129
----------------------------------

So, any ideas on this? The stack trace is from the alpha 17 version of the code.

Mike Papper

I don't think this is relevant, but here are some IP debug messages (I do not 
set a WINS server in my smb.conf):
---------------------
IPC$ connections done anonymously
Connecting to host=MIKEPDC share=IPC$
resolve_lmhosts: Attempting lmhosts lookup for name MIKEPDC<0x20>
resolve_wins: Attempting wins lookup for name MIKEPDC<0x20>
resolve_wins: WINS server resolution selected and no WINS servers listed.
resolve_hosts: Attempting host lookup for name MIKEPDC<0x20>
name_resolve_bcast: Attempting broadcast lookup for name MIKEPDC<0x20>
bind succeeded on port 0
Got a positive name query response from 207.214.146.249 ( 207.214.146.249 )
Connecting to 207.214.146.249 at port 445
error connecting to 207.214.146.249:445 (Connection refused)
Connecting to 207.214.146.249 at port 139
--------------------------------




More information about the samba-technical mailing list