FreeBSD + samba 2.2.2 problems; semi-solution

Mike Silbersack silby at silby.com
Wed Jan 16 21:34:02 GMT 2002


As many of you may have noticed, there are reports on the various samba
and FreeBSD lists of serious problems with oplocks.  Specifically,
oplock error messages begin to appear in host-specific logfiles, similar
to the following:

[2002/01/02 17:19:09, 0] smbd/oplock.c:request_oplock_break(981)
  request_oplock_break: no response received to oplock break request to
pid 51973 on port 4633 for dev = 27400, inode = 957708, file_id = 18

One then finds that the smbd process (pid 51973 in this case) must be
manually killed for the problem to go away.

I've spent many hours poking at this problem, and I've come to the
conclusion that oplocks are not (directly) to blame in this situation.

We seem to have two problems occurring:

1.  Samba blocks indefinitely in some read() calls when a client "goes
quiet."

2.  There is some win98se <-> samba 2.2.2 interaction causing clients to
"go quiet."

1.  Samba issues some blocking calls with no timeout, causing the smbd
process to indefinitely hang if a client suddenly "goes quiet."  The
guilty path in the case I can recreate is this:

#0  0x281d3863 in read () from /usr/lib/libc.so.5
#1  0x811118b in read_socket_data (fd=12, buffer=0x8227005 "<FF>SMB\013", N=57904) at lib/util_sock.c:465
#2  0x8111699 in receive_smb (fd=12, buffer=0x8227001 "", timeout=0) at lib/util_sock.c:669
#3  0x807ef20 in receive_message_or_smb (buffer=0x8227001 "", buffer_len=65600, timeout=60000) at smbd/process.c:246
#4  0x80800c2 in smbd_process () at smbd/process.c:1252
#5  0x804c34d in main (argc=4, argv=0xbfbffbe4) at smbd/server.c:827
#6  0x804ae19 in _start ()

I've worked around this with the attached patch to util_sock.c; it
replaces the above call to read_socket_data with a call to
read_socket_with_timeout, specifying a 10 second timeout.  As a result,
smbd processes in this state will detect that the client suddenly went
quiet, and exit after 10 seconds, dropping all held oplocks.  This is only
a temporary workaround, albeit a very effective one.  A better fix would
be to have real timeouts passed into receive_smb, then to have these
timeouts propagated down.  Additionally, select()ion on the oplock socket
could be added to these inner calls.  (That change might take a large
rearchitecting, however.)

In short, I think that it would be very wise to provide _some_ timeout
whenever reading, just so that sysadmins do not have to go in and manually
kill processes in cases like this.
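
For illustration, here is a minimal sketch of a read that gives up after a
timeout instead of blocking forever.  It is not Samba's
read_socket_with_timeout; the helper name read_exact_with_timeout and its
return convention are hypothetical, and it assumes a POSIX system with
select().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Samba): read exactly `len` bytes from
 * `fd`, waiting at most `timeout_ms` milliseconds for each chunk of data.
 * Returns `len` on success, or -1 on error, end of file, or timeout
 * (errno is set to ETIMEDOUT in the timeout case).
 */
ssize_t read_exact_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
    size_t total = 0;

    assert(buf != NULL);

    while (total < len) {
        fd_set readfds;
        struct timeval tv;
        int ready;
        ssize_t n;

        /* select() may modify both arguments, so rebuild them each pass. */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (ready == 0) {
            errno = ETIMEDOUT;  /* the peer "went quiet" */
            return -1;
        }

        n = read(fd, buf + total, len - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;          /* peer closed the connection */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

A caller that gets -1 with errno set to ETIMEDOUT can then close the
connection and exit, rather than sitting in read() until someone kills it.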

2.  There is some win98se <-> samba 2.2.2 interaction causing clients to
"go quiet."

That this problem is occurring I can attest to; when copying the game
"Serious Sam" to a samba share on my FreeBSD box, I can cause this
condition to occur > 90% of the time.  This is not a simple problem with
high load; I can copy a directory many times larger full of mpg / mov
files without problem.  Hence, I suspect that there is some data dependent
situation occurring.

I've tried comparing network parameters to Linux boxes and changing my
settings to match, with mixed results.  By changing send / receive socket
buffer sizes, I am able to change the file in which the problem will occur, but
it still occurs.  (Note that at the time of the hang, both send and
receive socket buffers are empty; this is not a problem of data simply not
being read.)
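
As a rough sketch of how these two things can be done from inside a
process: the helpers below request per-socket buffer sizes and report how
many bytes are sitting unread in the receive buffer.  The helper names are
hypothetical, and this only assumes the standard SO_SNDBUF / SO_RCVBUF
socket options and the FIONREAD ioctl; it is not necessarily how the
numbers above were obtained.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel for the given send and receive
 * buffer sizes (in bytes) on socket `fd`.  Returns 0 on success, -1 on
 * failure.  The kernel may round or clamp the requested values.
 */
int set_socket_buffers(int fd, int sndbuf, int rcvbuf)
{
    assert(sndbuf > 0 && rcvbuf > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: number of bytes received on `fd` but not yet read
 * by the application, or -1 on failure.
 */
int pending_receive_bytes(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

If pending_receive_bytes() reports 0 on a hung connection, the process is
not simply failing to read data that has already arrived.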

Now, to make matters stranger... once the copy hangs, I am able to go back
and copy the offending file - _if I copy it individually_.  My theory is
that some command is being piggybacked on the back of some data and being
lost inside a parser somewhere.  Is it possible that read calls are
returning too much data?
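
To make the theory concrete: on a stream socket a single read may return
the tail of one message plus the start of the next, so a parser has to keep
whatever follows the first complete message.  The sketch below uses a
deliberately simplified framing (a 4-byte big-endian length followed by the
payload), which is an assumption for illustration and not the exact SMB
header layout; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified framing: each message is a 4-byte big-endian
 * payload length followed by that many payload bytes.
 *
 * Counts the complete messages in buf[0..avail) and stores in *consumed
 * the number of bytes they occupy.  Bytes from *consumed onward belong to
 * a partial message and must be kept for the next read, not discarded.
 */
size_t count_complete_frames(const unsigned char *buf, size_t avail,
                             size_t *consumed)
{
    size_t off = 0;
    size_t frames = 0;

    assert(buf != NULL && consumed != NULL);

    while (avail - off >= 4) {
        uint32_t len = ((uint32_t)buf[off] << 24) |
                       ((uint32_t)buf[off + 1] << 16) |
                       ((uint32_t)buf[off + 2] << 8) |
                       (uint32_t)buf[off + 3];

        if ((size_t)len > avail - off - 4)
            break;              /* payload not fully received yet */

        off += 4 + (size_t)len; /* skip header and payload */
        frames++;
    }

    *consumed = off;
    return frames;
}
```

A parser that handled only the first message in the buffer and dropped the
rest would lose exactly the kind of piggybacked command described above.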

Unfortunately, I really don't know anything about how the smb protocol or
samba works.  I'd be able to produce any debug output necessary, and I'd
be glad to provide such data to anyone with samba experience who thinks
that they might be able to take a crack at this.

Whether or not issue #2 can be addressed, I'd really appreciate it
if some samba developers would take a look at addressing issue #1; it
magnifies other problems greatly.

Thanks,

Mike "Silby" Silbersack
-------------- next part --------------
Index: util_sock.c
===================================================================
RCS file: /cvsroot/samba/source/lib/util_sock.c,v
retrieving revision 1.16.4.19
diff -u -r1.16.4.19 util_sock.c
--- util_sock.c	20 Dec 2001 17:37:11 -0000	1.16.4.19
+++ util_sock.c	16 Jan 2002 04:51:24 -0000
@@ -666,7 +666,7 @@
 	}
 
 	if(len > 0) {
-		ret = read_socket_data(fd,buffer+4,len);
+		ret = read_socket_with_timeout(fd,buffer+4,len,len,10000);
 		if (ret != len) {
 			smb_read_error = READ_ERROR;
 			return False;

