nmblookup problems [Includes patch for Samba bug]

David Collier-Brown davecb at canada.sun.com
Wed May 10 17:49:35 GMT 2000


The patch is at the very end...

Ron Alexander wrote:
> > socket 7 bound to addr:port 0.0.0.0:0
> > Socket 7 opened for port 0.
> > querying * on 255.255.255.255
> > FD 7 is set before select
> > Num FDS ready=1, errno=0 ()
> > FD 7 is set after select
> > Got a positive name query response from 134.111.57.12 ( 134.111.220.160 )
> > FD 7 is set before select
> > Num FDS ready=0, errno=1081 (Timeout period has expired.)
> > FD 7 is set after select
> >
> > At this point it waits forever!
 
and then:
> Here is what I see in my debugger.
> 
> 1: # 10:  read_udp_socket (line 179 in module util_sock)
> 1: #  9:  read_packet (line 693 in module nmblib)
> 1: #  8:  receive_packet (line 947 in module nmblib)
> 1: #  6:  name_query (line 277 in module namequery)
> 1: #  5:  query_one (line 100 in module nmblookup)
> 1: #  3:  main (line 271 in module nmblookup)
> 
> It seems simple, the select has indicated that a socket is ready for reading
> and when we go to read it we hang.


	What I claim is supposed to happen is:

Time   Command and Args             Returns
=====  ========================     =======
0.0334 so_socket(2, 1, 0, "", 1)  = 4	-- socket calls this on solaris
0.0340 bind(4, 0xFFBEEA88, 16, 3) = 0   -- bind fd 4
0.0375 sendto(4, "07040110\001\0\0\0\0\0\0".., 50, 0, 0xFFBEE638, 16) = 50
					-- send the packet out
0.0545 poll(0xFFBEE918, 1, 90) = 1	-- select calls this on Solaris
0.0548 recvfrom(4, "070485\0\0\0\001\0\0\0\0".., 576, 0, 0xFFBEE704,
		 0xFFBEE714) = 62	-- receive first response					
0.0553 poll(0xFFBEE918, 1, 90) = 1	-- select again
0.0555 recvfrom(4, "070485\0\0\0\001\0\0\0\0".., 576, 0, 0xFFBEE704,
					-- and so on, getting 62 bytes
					-- each time, for 9 times on
					-- my network ...
0.1491 poll(0xFFBEE918, 1, 90) = 0	-- time out
0.2391 poll(0xFFBEE918, 1, 90) = 0	-- again
0.3292 poll(0xFFBEE918, 1, 90) = 0	-- and again
					-- then start printing the messages
doing parameter guest account = guest
doing parameter map to guest = bad user
pm_process() returned Yes
added interface ip=129.155.8.39 bcast=129.155.8.255 nmask=255.255.255.0
added interface ip=127.0.0.1 bcast=127.255.255.255 nmask=255.0.0.0
querying * on 255.255.255.255
Got a positive name query response from 129.155.8.113 ( 129.155.8.113 )
Got a positive name query response from 129.155.8.24 ( 129.155.8.24 )
Got a positive name query response from 129.155.8.213 ( 129.155.8.213 )
Got a positive name query response from 129.155.8.55 ( 129.155.8.55 )
Got a positive name query response from 129.155.8.38 ( 129.155.8.38 )
Got a positive name query response from 129.155.8.1 ( 129.155.8.1 )
Got a positive name query response from 129.155.8.34 ( 129.155.8.34 )
Got a positive name query response from 129.155.8.71 ( 129.155.8.71 )
Got a positive name query response from 129.155.8.79 ( 129.155.8.79 )
129.155.8.113 *<00>
129.155.8.24 *<00>
129.155.8.213 *<00>
129.155.8.55 *<00>
129.155.8.38 *<00>
129.155.8.1 *<00>
129.155.8.34 *<00>
129.155.8.71 *<00>
129.155.8.79 *<00>
0.3300 _exit(0)					-- and quit

	I suspect a problem either in getting the data or in the
	select. The critical code is in libsmb/nmblib.c, at line 935:
---
struct packet_struct *receive_packet(int fd,enum packet_type type,int t)
{
  fd_set fds;
  struct timeval timeout;

  FD_ZERO(&fds);
  FD_SET(fd,&fds);
  timeout.tv_sec = t/1000;
  timeout.tv_usec = 1000*(t%1000);

  sys_select(fd+1,&fds,&timeout);

  if (FD_ISSET(fd,&fds))
    return(read_packet(fd,type));

  return(NULL);
}
---
	It's not a failed timeout, as the bit corresponding to fd
	is being returned set, and FD_ISSET succeeds...

	Looking at sys_select, and assuming you have select and not poll,
	we get:  
---
int sys_select(int maxfd, fd_set *fds, struct timeval *tval)
{
  struct timeval t2;
  int selrtn;

  do {
    if (tval) memcpy((void *)&t2,(void *)tval,sizeof(t2));
    errno = 0;
    selrtn = select(maxfd,SELECT_CAST fds,NULL,NULL,tval?&t2:NULL);
  } while (selrtn<0 && errno == EINTR);

  return(selrtn);
}
---
	Which basically retries if we get EINTR, which is quite sane.
	HOWEVER, we should not call read_packet unless sys_select
	returns successfully.

	This looks like a Samba (portability?) bug, because select is
	defined to fail with any of EINTR, EBADF or EINVAL, and we only
	handle EINTR. We should check whether the return is -1, and if
	so return NULL. Without that check, we could be getting -1 back,
	and since we set the bit in fds ourselves just before the call,
	FD_ISSET will still succeed.
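
	To illustrate the failure mode described above, here is a
	minimal sketch. It is not Samba code: it assumes a POSIX-style
	select that leaves the fd_set untouched when it fails, and the
	helper name bit_survives_failed_select is made up for this
	demonstration. It hands select a closed descriptor and reports
	whether FD_ISSET is still true after the -1 return.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the bit for the descriptor is still set after select()
 * fails, 0 if it was cleared, and -1 if the setup failed or select()
 * unexpectedly did not fail.
 */
int bit_survives_failed_select(void)
{
    int p[2];
    fd_set fds;
    struct timeval timeout;
    int rc;

    if (pipe(p) != 0)
        return -1;
    close(p[1]);
    close(p[0]);            /* p[0] is now an invalid descriptor */

    FD_ZERO(&fds);
    FD_SET(p[0], &fds);     /* same setup as receive_packet */
    timeout.tv_sec = 0;
    timeout.tv_usec = 100000;

    errno = 0;
    rc = select(p[0] + 1, &fds, NULL, NULL, &timeout);
    if (rc != -1)
        return -1;          /* select did not fail; nothing to show */

    fprintf(stderr, "select returned -1, errno = %s (%d)\n",
            strerror(errno), errno);

    /* If this is still true, an unchecked caller would go on to read. */
    return FD_ISSET(p[0], &fds) ? 1 : 0;
}
```

	On systems that behave this way the helper should return 1,
	which is the case where receive_packet would go on to call
	read_packet despite the error.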

	I rather strongly recommend:

	if (sys_select(fd+1,&fds,&timeout) == -1) {
		/* errno should be EBADF or EINVAL. */
		DEBUG(0,("select returned -1, errno = %s (%d)\n",
			strerror(errno), errno));
		return NULL;
	}
	else if (FD_ISSET(fd,&fds)) {
		/* Get the data. */
		return(read_packet(fd,type));
	}
	else {
		/* Nothing waiting. */
		return NULL;
	}

	For the VOS port, I'd print the return from sys_select, and 
	see if we're passing it something bogus. 

--dave 
=========== PATCH ===========
*** nmblib.c.old        Wed May 10 13:43:03 2000
--- nmblib.c    Wed May 10 13:38:09 2000
***************
--- 942,961 ----
    timeout.tv_sec = t/1000;
    timeout.tv_usec = 1000*(t%1000);
  
!   if (sys_select(fd+1,&fds,&timeout) == -1) {
!     /* errno should be EBADF or EINVAL. */
!     DEBUG(0,("select returned -1, errno = %s (%d)\n",
!       strerror(errno), errno));
!     return NULL;
!   }
!   else if (FD_ISSET(fd,&fds)) {
!     /* Get the data. */
      return(read_packet(fd,type));
!   }
!   else {
!     /* Nothing waiting. */
!     return NULL;
!   }
  }
============ END PATCH ===================
David Collier-Brown,  | Always do right. This will gratify some people
185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
Willowdale, Ontario   | //www.oreilly.com/catalog/samba/author.html
Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb at canada.sun.com

