100% CPU usage and no login

Geza Makay makayg at math.u-szeged.hu
Tue Dec 22 09:18:03 GMT 1998


At 08:01 AM 12/21/98 -0500, you wrote:
>> I do see 100% CPU usage, and I cannot login from a Windows NT 4.0
>> WorkStation. I did everything according to the documentation, and even
>> joining to the domain worked wonderfully. I did not create any "domain
>> group map" or "domain user map", if I do not want to manage users using NT,
>> then I do not need these (according to the documentation). If I try to
>> login with an incorrect user name or password, then I receive the message
>> that the user name or password is not correct within a second. But trying
>> to login with a correct username/password always stops the NT for 2-5
>> minutes, then I got the message that the machine account does not have the
>> correct password. Setting the log level to 99, I see the following in the
>> log file (only an extract):
>> 
>> [1998/12/21 07:16:41, 4] rpc_server/srv_pipe.c:api_rpc_command(670)
>>   api_rpc_command: api_netlog_rpc op 0x4 - api_rpc_command: NET_REQCHAL
>> ..
>> [1998/12/21 07:16:42, 4] rpc_server/srv_pipe.c:api_rpc_command(670)
>>   api_rpc_command: api_netlog_rpc op 0xf - api_rpc_command: NET_AUTH2
>> ..
>> [1998/12/21 07:16:42, 4] rpc_server/srv_pipe.c:api_rpc_command(670)
>>   api_rpc_command: api_netlog_rpc op 0x2 - api_rpc_command: NET_SAMLOGON
>> ..
>> [1998/12/21 07:16:42, 3] rpc_server/srv_netlog.c:api_net_sam_logon(653)
>>   SAM Logon (Interactive). Domain:[BOLYAI].  User:[makay]
>> ..
>> [1998/12/21 07:16:42, 10] passdb/sampassdb.c:pwdb_sam_map_names(443)
>>   pwdb_sam_map_name: found unix user makay nt makay uid 202 rid 0x710
>> [1998/12/21 07:16:42, 10] groupdb/groupdb.c:iterate_getusergroupsnam(217)
>>   search for usergroups by name: makay
>> 
>> Although the RPC commands differ from those in the documentation (in
>> NTDOMAIN.txt it says they start with LSA_, and that I should also have an
>> LSA_NET_SRV_PWSET after LSA_AUTH2, which I do not have), this seems OK to
>> me so far; everything happened within 1 second of actually entering the
>> password at the login screen on the NT machine. Now the interesting part:
>> the following messages repeat until I kill the appropriate smbd process:
>> 
>> [1998/12/21 07:16:42, 10] lib/domain_namemap.c:lookupsmbgrpgid(1171)
>>   lookupsmbgrpgid: unix gid 0
>> [1998/12/21 07:16:42, 10] groupdb/groupunix.c:getgrpunixpwent(210)
>>   line: 'root::1001:'
>> [1998/12/21 07:16:42, 5] groupdb/groupdb.c:iterate_getusergroupsnam(239)
>>   group name root members: 0
>> [1998/12/21 07:16:42, 10] groupdb/groupunix.c:getgrpunixpwent(169)
>>   getgrpunixpwent: enum unix group entry root
>> 
>> Note that I do not get an LSA_SAM_LOGOFF (or NET_SAM_LOGOFF) anywhere in
>> the log file. Could someone, please, check what goes wrong?
>
>  This seems to be a known problem on several platforms. I had this problem on
>IRIX 6.2 and switched to a 6.5 machine and it was gone. I believe it has also
>been reported on DEC UNIX. I  spent a few days trying to figure out what was
>going on then gave up. I believe Luke is working on a workaround. What is
>happening is that samba is trying to find all the groups on your machine but
>the system is always returning the first group and smbd gets stuck in an
>infinite (ish) loop.
>
>So not a good answer for sure, but change your OS.

Dear Greg, Luke, and others writing the code,

OK, I checked this out by going really deep into the code, and I found the
following:

   groupdb/groupunix.c:getgrpunixpwent() calls lib/util.c:gidtoname()
      (through several steps) inside a while loop that is driven by the
      return value of getgrent().
   lib/util.c:gidtoname() calls getgrgid(), which resets the file
      pointer of the group file (at least under my OS).
   Therefore the while loop in groupdb/groupunix.c:getgrpunixpwent()
      would go into an infinite loop if it did not have a break in it.

Most likely the author(s) did not expect/know that getgrgid() resets the
file pointer of the group file (at least on some OSes, like mine), and as a
result the groupdb/groupunix.c:getgrpunixpwent() function checks the very
same group over and over again.
Here is a short piece of test code with an infinite loop, which shows exactly
what happens in these files (I put this in
groupdb/groupunix.c:getgrpunixpwent() just before the original while loop,
and it made the program go into an infinite loop):

...
	struct group *unix_grp;
...
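	/* on my OS each getgrgid() call rewinds the group file, so
	   getgrent() keeps returning the first entry and this never ends */
	while ((unix_grp = getgrent()) != NULL)
	{
		DEBUG(1,("getgrent->name: %s\n",unix_grp->gr_name));
		getgrgid(unix_grp->gr_gid);
	}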

        /* cycle through unix groups */
...

Could one of the authors of this code confirm this (I have SCO
OpenServer Enterprise System 5.0.2)? And is it possible to do something
about it?

Thanks in advance.

With best regards,

Geza Makay


*************************************************************************
*           Name: Geza Makay (Mr., Dr., Prof.)                          *
*      Institute: Jozsef Attila University of Szeged                    *
*           Mail: Bolyai Institute, Aradi vertanuk tere 1.              *
*                 H-6720, Szeged, Hungary                               *
*            Tel: (62) 454-091 (Hungary's code: 36)                     *
*    Fax/Message: (62) 426-246 (Hungary's code: 36)                     *
*         E-mail: makayg at math.u-szeged.hu                               *
* World Wide Web: http://www.math.u-szeged.hu/                          *
*************************************************************************
* "To err is human, but to really mess things up you need a computer."  *
*************************************************************************


