FreeBSD + samba 2.2.2 problems; semi-solution
Jeremy Allison
jra at samba.org
Wed Jan 16 23:22:02 GMT 2002
On Wed, Jan 16, 2002 at 11:32:56PM +0000, Mike Silbersack wrote:
>
> As many of you may have noticed, reports on the various samba
> and freebsd lists describe serious problems with oplocks. Specifically,
> oplock error messages begin to appear in host-specific logfiles, similar
> to the following:
>
> [2002/01/02 17:19:09, 0] smbd/oplock.c:request_oplock_break(981)
> request_oplock_break: no response received to oplock break request to
> pid 51973 on port 4633 for dev = 27400, inode = 957708, file_id = 18
>
> One then finds that the smbd process (pid 51973 in this case) must be
> manually killed for the problem to go away.
>
> I've spent many hours poking at this problem, and I've come to the
> conclusion that oplocks are not (directly) to blame in this situation.
>
> We seem to have two problems occurring:
>
> 1. Samba blocks indefinitely in some read() calls when a client "goes
> quiet."
>
> 2. There is some win98se <-> samba 2.2.2 interaction causing clients to
> "go quiet."
>
> 1. Samba issues some blocking calls with no timeout, causing the smbd
> process to indefinitely hang if a client suddenly "goes quiet." The
> guilty path in the case I can recreate is this:
>
> #0 0x281d3863 in read () from /usr/lib/libc.so.5
> #1 0x811118b in read_socket_data (fd=12, buffer=0x8227005 "<FF>SMB\013", N=57904) at lib/util_sock.c:465
> #2 0x8111699 in receive_smb (fd=12, buffer=0x8227001 "", timeout=0) at lib/util_sock.c:669
> #3 0x807ef20 in receive_message_or_smb (buffer=0x8227001 "", buffer_len=65600, timeout=60000) at smbd/process.c:246
> #4 0x80800c2 in smbd_process () at smbd/process.c:1252
> #5 0x804c34d in main (argc=4, argv=0xbfbffbe4) at smbd/server.c:827
> #6 0x804ae19 in _start ()
>
> I've worked around this with the patch to util_sock.c which is attached;
> it replaces the above call from read_socket_data to
> read_socket_with_timeout, specifying a 10 second timeout. As a result,
> smbd processes in this state will detect that the client suddenly went
> quiet, and exit after 10 seconds, dropping all held oplocks. This is only
> a temporary workaround, albeit a very effective one. A better fix would
> be to have real timeouts passed into receive_smb, then to have these
> timeouts propagated down. Additionally, select()ion on the oplock socket
> could be added to these inner calls. (That change might take a large
> rearchitecting, however.)
>
> In short, I think that it would be very wise to provide _some_ timeout
> whenever reading, just so that sysadmins do not have to go in and manually
> kill processes in cases like this.
>
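A minimal sketch of the timeout idea described above, for illustration only. It is not the attached patch and not Samba's read_socket_with_timeout(); the name read_exact_with_timeout and its return codes are hypothetical. It assumes a POSIX system and an fd below FD_SETSIZE, and waits with select() before every read() so that a client that "goes quiet" produces a timeout instead of an indefinite block.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Samba): read exactly n bytes from fd,
 * waiting at most timeout_ms milliseconds for each chunk to arrive.
 * Returns n on success, 0 on EOF, -1 on error, -2 on timeout.
 */
ssize_t read_exact_with_timeout(int fd, char *buf, size_t n, int timeout_ms)
{
    size_t got = 0;

    while (got < n) {
        fd_set fds;
        struct timeval tv;
        int ready;
        ssize_t r;

        /* select() may modify both arguments, so rebuild them each pass. */
        FD_ZERO(&fds);
        FD_SET(fd, &fds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        ready = select(fd + 1, &fds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal; retry */
            return -1;
        }
        if (ready == 0)
            return -2;          /* nothing arrived in time: peer went quiet */

        r = read(fd, buf + got, n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (r == 0)
            return 0;           /* peer closed the connection */
        got += (size_t)r;
    }
    return (ssize_t)got;
}
```

A caller in the position of receive_smb could treat the -2 return as "client went quiet" and exit, which would release any held oplocks. Adding a second descriptor (such as the oplock socket) to the fd_set would be one way to approach the select()ion suggestion, though that is only a guess at the shape of such a change.
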
> 2. There is some win98se <-> samba 2.2.2 interaction causing clients to
> "go quiet."
>
> That this problem is occurring I can attest to; when copying the game
> "Serious Sam" to a samba share on my FreeBSD box, I can cause this
> condition to occur > 90% of the time. This is not a simple problem with
> high load; I can copy a directory many times larger full of mpg / mov
> files without problem. Hence, I suspect that there is some data dependent
> situation occurring.
>
> I've tried comparing network parameters to linux boxes and changing my
> settings to match with mixed results. By changing send / receive socket
> sizes, I am able to change the file in which the problem will occur, but
> it still occurs. (Note that at the time of the hang, both send and
> receive socket buffers are empty; this is not a problem of data simply not
> being read.)
Can you reproduce this problem on any other system
than FreeBSD? In particular, can you get this to occur
on a Linux box?
I'm wondering if there's a TCP problem between Win98 and
FreeBSD when transporting SMB (which would be somewhat ironic
as they took their TCP stack from your source code in the first
place :-).
Jeremy.