Solaris generating ECONNRESET causing Samba failures
Scott Moomaw
scott at bridgewater.edu
Wed Aug 29 22:07:16 GMT 2001
We have a group of servers all running Solaris (2.6 and 2.8) with samba.
To circumvent locking problems, I've turned off oplocks on some of these
that are running 2.2.x. Now, I'm seeing client problems that I think I've
traced back to the source. In the log files, I'm seeing lines that
reference "read_socket_data: recv failure for 4. Error = Connection reset
by peer" on an ongoing basis. When this happens, things go haywire. I've
included a log snippet for viewing.
[2001/08/29 17:13:31, 1, pid=17083] smbd/service.c:make_connection(610)
trainer (147.138.20.16) connect to service scollins as user scollins
(uid=1038, gid=14) (pid 17083)
[2001/08/29 17:26:45, 0, pid=17083] lib/util_sock.c:read_socket_data(478)
read_socket_data: recv failure for 4. Error = Connection reset by peer
[2001/08/29 17:26:45, 1, pid=17083] smbd/service.c:close_cnum(650)
trainer (147.138.20.16) closed connection to service scollins
[2001/08/29 17:26:45, 0, pid=17083] lib/util_sock.c:get_socket_addr(1031)
getpeername failed. Error was Transport endpoint is not connected
[2001/08/29 17:26:45, 1, pid=17083] lib/util_sock.c:get_socket_name(996)
Gethostbyaddr failed for 0.0.0.0
The strange thing is that I'm not finding a cause for the connection
resets. A packet trace, using Solaris's snoop command, doesn't reveal any
normal RST condition which in turn would cause the connection to reset.
Looking at a truss of one of these processes, I find data like what
follows:
7776: 173.2877 poll(0x08047884, 3, 60000) = 1
7776: 173.2880 read(5, "\0\0\0 3", 4) = 4
7776: 173.2882 read(5, "FF S M B1A\0\0\0\0\0\0\0".., 51) = 51
7776: 173.2883 gettimeofday(0x081DF1C4) = 0
7776: 173.2884 fstat64(25, 0x08047908) = 0
7776: 173.2885 llseek(25, 2809801, SEEK_SET) = 2809801
7776: 173.2886 read(25, 0x082222D5, -3070) Err#22 EINVAL
7776: 173.2887 write(5, "\001FF", 3) = 3
7776: poll(0x08047884, 3, 60000) (sleeping...)
7776: 219.6291 poll(0x08047884, 3, 60000) = 1
7776: 219.6298 read(5, 0x08211E89, 4) Err#131 ECONNRESET
7776: 219.6299 time() = 999108532
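To line failures in a trace like the one above up against the smbd log timestamps, it can help to pull out only the system calls that returned an error. What follows is a minimal sketch, not a finished tool: the function name `failed_calls`, the `SAMPLE` text, and the regular expression are my own, and the column layout (pid, optional relative timestamp, call, `Err#<number> <NAME>`) is assumed from the excerpt above rather than taken from the truss documentation.

```python
import re

# Assumed layout, based only on the excerpt above:
#   <pid>: [<seconds>] <call>(<args>) Err#<number> <NAME>
# The space between the number and the name is treated as optional.
ERR_LINE = re.compile(
    r"^(\d+):\s+(?:([\d.]+)\s+)?(\w+)\((.*)\)\s+Err#(\d+)\s*([A-Z]+)"
)

# A few lines copied from the trace above (raw string, no escapes interpreted).
SAMPLE = r"""
7776: 173.2885 llseek(25, 2809801, SEEK_SET) = 2809801
7776: 173.2886 read(25, 0x082222D5, -3070) Err#22 EINVAL
7776: poll(0x08047884, 3, 60000) (sleeping...)
7776: 219.6291 poll(0x08047884, 3, 60000) = 1
7776: 219.6298 read(5, 0x08211E89, 4) Err#131ECONNRESET
"""


def failed_calls(trace_text):
    """Return (timestamp, call, args, error_name) for each failing call."""
    results = []
    for line in trace_text.splitlines():
        match = ERR_LINE.match(line.strip())
        if match is None:
            continue  # successful call, sleeping marker, or blank line
        _pid, stamp, call, args, _errno, name = match.groups()
        timestamp = float(stamp) if stamp is not None else None
        results.append((timestamp, call, args, name))
    return results


if __name__ == "__main__":
    for timestamp, call, args, name in failed_calls(SAMPLE):
        print(timestamp, call, args, name)
```

Run against the sample, this reports two failing calls: the `read` on descriptor 25 with a length of -3070 (EINVAL) and the later `read` on descriptor 5 (ECONNRESET). Whether the two are related is not something the sketch can show.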
The ECONNRESET from the read call directly corresponds to the connection
reset in the logs. Does anyone have a suggestion as to what could be
causing the ECONNRESET? I can't find any evidence from snoop, interface
statistics on the switch and host, or a network sniffer that accounts for
the resets. They're appearing on the group of servers which vary in
hardware so that's not a commonality. Does Solaris have a bug that can
generate spurious ECONNRESET messages? Can anyone think of a possible
workaround if this is the case?
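For comparison, the ordinary way a read ends in ECONNRESET is that the peer aborts the connection and an RST arrives, which is exactly what I cannot find in the traces. The sketch below reproduces that ordinary case on the loopback interface so there is a known-good reset to compare against. It is an illustration under assumptions, not a reproduction of the Solaris behavior: the function name `provoke_reset` is hypothetical, and it relies on the standard behavior that closing a TCP socket with SO_LINGER enabled and a zero timeout aborts the connection instead of closing it gracefully.

```python
import errno
import socket
import struct
import time


def provoke_reset():
    """Abort a loopback TCP connection and return the errno the reader sees."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _addr = listener.accept()

    # l_onoff=1, l_linger=0: close() aborts the connection (RST, not FIN).
    client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                      struct.pack("ii", 1, 0))
    client.close()
    time.sleep(0.1)  # give the reset time to reach the other endpoint

    try:
        server_side.recv(4)  # same 4-byte header read smbd was attempting
        return None          # no error: the connection was not reset
    except OSError as exc:
        return exc.errno
    finally:
        server_side.close()
        listener.close()


if __name__ == "__main__":
    code = provoke_reset()
    print(errno.errorcode.get(code, code))
```

In this ordinary case a packet capture on the loopback interface should show the RST segment, which is the piece missing from my snoop output.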
Scott
------------------------------------------------------------------------
Scott Moomaw, Network Administrator Scott at Bridgewater.edu
Bridgewater College, IT Center
Bridgewater, VA 22812
Phone (540) 828 - 8000 x5437 FAX: (540) 828 - 5493