Severe problem with Samba
Martin Rootes
M.J.Rootes at shu.ac.uk
Thu Dec 13 10:13:02 GMT 2001
Dear All,
We are experiencing severe problems with Samba 2.2.0 (with quota support) running on a
dual-processor (400MHz) Sun E450 under Solaris 2.7. The machine is the central file server for
student disk space and is accessed by approximately 1200 PCs running NT 4. Until recently we
saw what we assumed to be loading issues, with connections becoming slow during the middle of
the day. Recently, however, the problems have become severe. Everything
seems fine until midday; then the number of smbd processes goes up
while the number of connections (as reported by smbstatus -b) drops. Students with
existing connections start getting slow responses, no new connections are made, and the load on
the system skyrockets. Stopping and restarting Samba seems to cure the problem, but it
can recur. We are in a desperate panic at the moment, as the students are all doing
assignments and this is seriously affecting their work. We have tried various tweaks to Samba
(deadtime, change notify timeout) and to the TCP stack, and have tripled system memory, all to no avail.
We also seem to have an issue with keepalives and TCP_NODELAY, neither of which seems to work
at all. We see the following messages in the log about them:
[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
Failed to set socket option SO_KEEPALIVE (Error Invalid argument)
[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
Failed to set socket option TCP_NODELAY (Error Invalid argument)
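For reference, the two options named in these messages are ordinary per-socket settings applied with setsockopt(). The following is a minimal Python sketch, not Samba's code, of setting them and recording any failure; the helper name apply_socket_options is made up for illustration. POSIX lists EINVAL both for an option that is invalid at the given level and for a socket that has already been shut down, so the error text alone does not say which case applies here.

```python
import socket

# The two options from the log messages, with the protocol level each belongs to.
OPTIONS = [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, "SO_KEEPALIVE"),
    (socket.IPPROTO_TCP, socket.TCP_NODELAY, "TCP_NODELAY"),
]


def apply_socket_options(sock, options=OPTIONS):
    """Enable each option on sock; return {label: None or the OSError raised}.

    Hypothetical helper for illustration only. A failure is recorded rather
    than raised, so one bad option does not hide the result of the other.
    """
    results = {}
    for level, name, label in options:
        try:
            sock.setsockopt(level, name, 1)
            results[label] = None
        except OSError as exc:
            # exc.errno holds the numeric error code, e.g. errno.EINVAL
            results[label] = exc
    return results


def check_fresh_socket():
    """Apply the options to a brand-new TCP socket as a baseline check."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return apply_socket_options(sock)
    finally:
        sock.close()
```

If the options succeed on a fresh socket on the same host but fail on sockets handed over by accept(), that would at least suggest the connection state, rather than the option names, is involved.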
The following is a selection of messages that appeared just before Samba was stopped:
[2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:39:51, 0] lib/util_sock.c:send_smb(753)
Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:29, 0] lib/util_sock.c:get_socket_addr(1084)
getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:get_socket_addr(1084)
getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:33, 0] lib/util_sock.c:read_socket_data(479)
read_socket_data: recv failure for 4. Error = Connection reset by peer
[2001/12/13 11:40:49, 0] smbd/server.c:open_sockets(251)
open_sockets: accept: Software caused connection abort
[2001/12/13 11:40:53, 0] lib/util_sock.c:read_socket_data(479)
read_socket_data: recv failure for 4. Error = Connection reset by peer
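An excerpt like this is easier to read once the repeated entries are counted. The following is only a sketch in Python, assuming the two-line layout shown here (a bracketed header line naming the source file and function, followed by the message text); the function name tally is made up for illustration.

```python
import re
from collections import Counter

# Header line, e.g. "[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)"
HEADER = re.compile(
    r"^\[(?P<stamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}), *(?P<level>\d+)\] "
    r"(?P<source>\S+):(?P<func>\w+)\((?P<line>\d+)\)\s*$"
)


def tally(log_text):
    """Count log entries by (function name, first message line).

    Only the first non-empty line after each header is counted; any further
    continuation lines of the same entry are ignored.
    """
    counts = Counter()
    func = None
    for raw in log_text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            func = match.group("func")
        elif func and line:
            counts[(func, line)] += 1
            func = None  # wait for the next header before counting again
    return counts
```

Calling tally() on the contents of the log file and then most_common() on the result lists the most frequent messages first, which makes it easier to see whether the broken-pipe writes or the resets dominate.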
We think we may have a loading problem; however, if so, the load does not seem to be directly
proportional to the number of connections. In fact there is a significant rise in load at, and
for 10 - 15 minutes past, the hour (all day long, not just at midday). We assume this is
because logging in places a high load on the system. It is also possible that the midday
problems are caused by a different pattern of working, as students log in for short
periods to check e-mail before going to get lunch, etc. Another oddity is that some Samba
connections are left running from the day before (or sometimes longer), so we wonder
whether connections are not being killed properly, thereby adding to the load.
So, please, any pointers as to what the problem is would be very helpful. At the moment we're
struggling: I'm considering getting a less stressful job - something like a fork lift truck driver in an
explosives factory - and people are starting to question whether we should replace the whole
system with a Novell-based one!
Thanks in advance
Martin Rootes
Systems Support
------------------------------------------------------------------------------
Martin Rootes - Senior Systems Programmer/Analyst, Sheffield Hallam University
Email : M.J.Rootes at shu.ac.uk Phone: 0114 225 3828
------------------------------------------------------------------------------