Domain logon problems with 10.Mar.99 CVS source

Thu Mar 11 15:16:00 GMT 1999

> 
> I hope the change I asked Luke to make is not causing your problem. I
> believe it is a memory corruption problem, in order to find it do the
> following. This will depend on the kind of system you are using but try
> this, assuming you are the only one trying to log onto the PDC. There should
> be only 1 or 2 smbd processes running, if there is 1 then hitting
> Ctrl-Alt-Del on your NT machine should start another one. For the process
> that is the child of the other smbd run dbx -p <pid> and try to log in. Hit
> return in dbx as it will stop when it receives the SIGSEGV. At this point
> do a where and post the results here.

Ok, did this:

Program received signal SIGSEGV, Segmentation fault.
0xc4bb4 in Get_Pwnam ()
(gdb) where
#0  0xc4bb4 in Get_Pwnam ()
#1  0xc29f0 in nametouid ()
#2  0xad7f0 in lookupsmbpwnam ()
#3  0xb1dc0 in get_unixgroup_members ()
#4  0xb2124 in getgrpunixpwent ()
#5  0xafddc in getgroupent ()
#6  0xafad0 in iterate_getusergroupsnam ()
#7  0xaffcc in getusergroupsntnam ()
#8  0x74878 in api_net_sam_logon ()
#9  0x7e714 in api_rpc_command ()
#10 0x7e810 in api_rpcTNP ()
#11 0x74c10 in api_netlog_rpc ()
#12 0x7e40c in api_pipe_request ()
#13 0x7e510 in rpc_command ()
#14 0x3c180 in api_fd_reply ()
#15 0x3c8f0 in named_pipe ()
#16 0x3d034 in reply_trans ()
#17 0x5aaf4 in switch_message ()
#18 0x5ab80 in construct_reply ()
#19 0x5ad3c in process_smb ()
#20 0x5b6c0 in smbd_process ()
#21 0x2c9bc in main ()

smbd is running on a Sparc 5 and was compiled with GCC 2.8.1. The disassembly
looks like this:

(gdb) disassemble 0xc4b9c
Dump of assembler code for function Get_Pwnam:
0xc4b9c <Get_Pwnam>:    save  %sp, -240, %sp
0xc4ba0 <Get_Pwnam+4>:  call  0x6018c <lp_usernamelevel>
0xc4ba4 <Get_Pwnam+8>:  nop 
0xc4ba8 <Get_Pwnam+12>: cmp  %i0, 0
0xc4bac <Get_Pwnam+16>: be  0xc4cdc <Get_Pwnam+320>
0xc4bb0 <Get_Pwnam+20>: mov  %o0, %l1
0xc4bb4 <Get_Pwnam+24>: ldsb  [ %i0 ], %o0
0xc4bb8 <Get_Pwnam+28>: cmp  %o0, 0
0xc4bbc <Get_Pwnam+32>: be  0xc4cdc <Get_Pwnam+320>
0xc4bc0 <Get_Pwnam+36>: add  %fp, -144, %l0
0xc4bc4 <Get_Pwnam+40>: mov  %l0, %o0
0xc4bc8 <Get_Pwnam+44>: mov  %i0, %o1
0xc4bcc <Get_Pwnam+48>: call  0xc77d8 <StrnCpy>
...
(gdb) disassemble 0xc29f0
Dump of assembler code for function nametouid:
0xc29e0 <nametouid>:    save  %sp, -112, %sp
0xc29e4 <nametouid+4>:  mov  %i0, %o0
0xc29e8 <nametouid+8>:  call  0xc4b9c <Get_Pwnam>
0xc29ec <nametouid+12>: clr  %o1
0xc29f0 <nametouid+16>: cmp  %o0, 0
...

Some registers:

o0             0x0      0
l0             0x0      0
l1             0x0      0
i0             0x6270727a       1651536506

Looks like i0 is not a valid pointer, so the load from [%i0] at Get_Pwnam+24
causes the SIGSEGV:

(gdb) x 0x6270727a
0x6270727a <_end+1650381810>:   Cannot access memory at address 0x6270727a.

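Seems like memory corruption to me too. The four bytes in i0 are printable
ASCII, and they match the start of a member name in one of our groups: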

0> perl -e 'print "\x62\x70\x72\x7a\n";'
bprz
0> ypcat group | fgrep bprz
cocoon:*:10014:rys,wunderli,norrie,bprzydat,richwood,roehm
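
For the record, the same check as the perl one-liner can be written in C.
This is only a sketch: word_as_text is a name made up for illustration and
is not part of Samba. It reads the register value most significant byte
first, which matches the big-endian SPARC.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Hypothetical helper (not Samba code): split a 32-bit register value into
 * its four bytes, most significant first, and store them as a string in out.
 * Returns 1 if all four bytes are printable ASCII, which hints that the
 * "pointer" is really string data, and 0 otherwise.
 */
int word_as_text(uint32_t word, char out[5])
{
    int printable = 1;

    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(word >> (24 - 8 * i));

        out[i] = (char)c;
        if (!isprint(c))
            printable = 0;
    }
    out[4] = '\0';
    return printable;
}
```

Feeding it the i0 value 0x6270727a gives the string "bprz", while an
ordinary code address such as 0xc4bb4 is reported as not printable.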

Remember my logfile snippet?

>   4156    lookupsmbgrpnam: unix user group cocoon
>   4157  [1999/03/10 18:14:49, 10] lib/domain_namemap.c:lookupsmbgrpgid(1270)
>   4158    lookupsmbgrpgid: unix gid 10014
>   4159  [1999/03/10 18:14:49, 10] lib/domain_namemap.c:lookupsmbpwnam(886)
>   4160  [1999/03/10 18:14:49, 0] lib/fault.c:fault_report(40)

After some analysis I found out the wrong 'unix_name' originates from here:

...
BOOL get_unixgroup_members(struct group *grp,
                                int *num_mem, DOMAIN_GRP_MEMBER **members)
{
 ...
        for (i = 0; (unix_name = grp->gr_mem[i]) != NULL; i++)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^

I'll try to locate the place where the memory gets corrupted, but this will
take some time as I'm not familiar with the code yet.
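
One thing worth ruling out first, and this is only an assumption, not
something I have confirmed: getgrnam() and getgrgid() may return a pointer
to static storage that a later getgr*() call overwrites. The log shows
lookupsmbgrpgid just before the crash, so if that call ends up in
getgrgid() while the loop is still walking grp->gr_mem, the member pointers
could end up holding string data. A minimal sketch of taking a private copy
of the member list before doing any further lookups follows. The names
copy_group_members and free_group_members are hypothetical and are not
Samba functions.

```c
#include <grp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a list returned by copy_group_members(). */
void free_group_members(char **list)
{
    if (list == NULL)
        return;
    for (size_t i = 0; list[i] != NULL; i++)
        free(list[i]);
    free(list);
}

/*
 * Hypothetical helper: deep-copy grp->gr_mem into heap memory owned by the
 * caller, so later getgr*() calls cannot change it. Returns a
 * NULL-terminated array, or NULL if an allocation fails. If count is not
 * NULL it receives the number of members.
 */
char **copy_group_members(const struct group *grp, size_t *count)
{
    size_t n = 0;
    char **list;

    while (grp->gr_mem != NULL && grp->gr_mem[n] != NULL)
        n++;

    /* calloc zero-fills, so the list is NULL-terminated from the start. */
    list = calloc(n + 1, sizeof *list);
    if (list == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(grp->gr_mem[i]) + 1;

        list[i] = malloc(len);
        if (list[i] == NULL) {
            free_group_members(list);
            return NULL;
        }
        memcpy(list[i], grp->gr_mem[i], len);
    }

    if (count != NULL)
        *count = n;
    return list;
}
```

With something like this, the loop would iterate over the copy instead of
grp->gr_mem and free it afterwards. If the crash goes away, the static
buffer is the culprit; if not, the corruption comes from somewhere else.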

Thanks so far!

- Stefan

--
Stefan Walter - SysAdmin at D-INFK (StabSoft), ETH Zurich, Switzerland