Severe problem with Samba

Martin Rootes M.J.Rootes at shu.ac.uk
Thu Dec 13 10:13:02 GMT 2001


Dear All,

	We are experiencing severe problems with Samba 2.2.0 (with quota support) running on a 
dual processor (400MHz) Sun E450 running Solaris 2.7. This is used as a central file server for 
student disk space, accessed by approx. 1200 PCs running NT 4. Until recently we experienced 
what we assumed were load issues, with connections becoming slow in the middle of the day. 
Recently, however, the problems have become severe. Everything seems fine until midday; then 
the number of smbd processes starts going up whilst the number of connections (as reported by 
smbstatus -b) drops, connected students start getting slow responses, no new connections are 
made, and the load on the system skyrockets. Stopping and restarting Samba seems to cure the 
problem, but it can recur. We are in a desperate panic at the moment as the students are all 
doing assignments and this is seriously affecting their work. We have tried various tweaks to 
Samba (deadtime, change notify timeout) and to the TCP stack, and have tripled system memory, 
all to no avail. We also seem to have an issue with keepalives and TCP_NODELAY, neither of 
which seems to work at all; we see the following messages in the log:

[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
  Failed to set socket option SO_KEEPALIVE (Error Invalid argument)
[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
  Failed to set socket option TCP_NODELAY (Error Invalid argument)
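Both lines above are smbd reporting that setsockopt() failed with EINVAL while applying its `socket options` setting. As a point of comparison, here is a minimal Python sketch (an illustration, not Samba's own code) that sets the same two options on a fresh TCP socket and reads them back. If this succeeds on the server while smbd still logs EINVAL, that would suggest, though not prove, that the failure depends on the state of the client socket when smbd applies the options (for example, a connection the client has already reset) rather than on the options being unsupported. The helper name `set_keepalive_nodelay` is hypothetical.

```python
import socket


def set_keepalive_nodelay(sock: socket.socket) -> dict:
    """Enable SO_KEEPALIVE and TCP_NODELAY on a TCP socket.

    Returns the values read back with getsockopt(); a nonzero value means the
    option is enabled. If setsockopt() fails, the OSError carries the errno
    (e.g. EINVAL, "Invalid argument").
    """
    # SO_KEEPALIVE is a socket-level option: the kernel periodically probes
    # idle connections so that dead clients are eventually detected.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_NODELAY is a TCP-level option: it disables Nagle's algorithm so
    # small replies are sent immediately instead of being coalesced.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return {
        "SO_KEEPALIVE": sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE),
        "TCP_NODELAY": sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY),
    }


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            print(set_keepalive_nodelay(s))
        except OSError as exc:
            print(f"setsockopt failed: errno={exc.errno} ({exc.strerror})")
```

The check uses "nonzero" rather than "equals 1" because some systems report an enabled boolean socket option as its internal flag value.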

The following is a selection of messages that appeared just before Samba was stopped:

[2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:39:51, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:29, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
  write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
  write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:33, 0] lib/util_sock.c:read_socket_data(479)
  read_socket_data: recv failure for 4. Error = Connection reset by peer
[2001/12/13 11:40:49, 0] smbd/server.c:open_sockets(251)
  open_sockets: accept: Software caused connection abort
[2001/12/13 11:40:53, 0] lib/util_sock.c:read_socket_data(479)
  read_socket_data: recv failure for 4. Error = Connection reset by peer
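Because the failures seem to cluster in time, it may help to see how the errors are distributed minute by minute across a whole day's log. Below is a minimal Python sketch, assuming the header format shown in the excerpt above (`[date time, level] file:function(line)`, with the message on the following indented line); the function name `summarize` is hypothetical and not part of Samba.

```python
import re
from collections import Counter

# Header lines in the excerpt above look like:
#   [2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
HEADER = re.compile(
    r"^\[(?P<date>\d{4}/\d{2}/\d{2}) (?P<hhmm>\d{2}:\d{2}):\d{2}, (?P<level>\d+)\] "
    r"(?P<file>[^:]+):(?P<func>\w+)\((?P<line>\d+)\)"
)


def summarize(log_text: str) -> Counter:
    """Count log entries per (minute, source function).

    Indented message lines do not match HEADER and are skipped, so each
    log entry is counted exactly once, by its header line.
    """
    counts = Counter()
    for line in log_text.splitlines():
        m = HEADER.match(line)
        if m:
            counts[(f"{m['date']} {m['hhmm']}", m["func"])] += 1
    return counts
```

Fed a full day's smbd log, the sorted counts would show, for example, whether the `write_socket`/`send_smb` "Broken pipe" exits bunch up at the top of the hour or around midday.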

We think we may have a load problem; however, if so, it doesn't seem to be directly 
proportional to the number of connections. In fact, there is a significant rise in load at, and 
for 10 - 15 mins past, the hour (all day long, not just at midday); we assume this is because 
logging in places a high load on the system. It's also possible that the midday problems are 
caused by different patterns of working, as students log in for short periods to check e-mail 
before going to lunch etc. Another oddity is that some Samba connections are left running from 
the day before (or sometimes longer), so we are wondering whether connections are not being 
killed properly, thereby adding to the load.
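One way to check the suspicion about connections left over from the day before is to list smbd processes by elapsed time. The sketch below parses the output of `ps -eo pid,etime,comm`, assuming the local ps supports the POSIX `-o` field list and prints `etime` as `[[dd-]hh:]mm:ss`; the helper names and the 12-hour default threshold are hypothetical choices for illustration.

```python
def etime_to_seconds(etime: str) -> int:
    """Convert a POSIX ps etime value, [[dd-]hh:]mm:ss, to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    fields = [int(f) for f in etime.split(":")]
    # Pad on the left so we always have hours, minutes, seconds.
    while len(fields) < 3:
        fields.insert(0, 0)
    hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stale_smbd(ps_output: str, max_age_hours: float = 12) -> list:
    """Return (pid, age_seconds) for smbd processes older than max_age_hours.

    Expects the text printed by `ps -eo pid,etime,comm`, header line included.
    """
    limit = max_age_hours * 3600
    stale = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, etime, comm = parts
        # comm may be a full path, such as /usr/local/samba/bin/smbd
        if comm.rsplit("/", 1)[-1] != "smbd":
            continue
        age = etime_to_seconds(etime)
        if age > limit:
            stale.append((int(pid), age))
    return stale
```

The parent smbd that listens for new connections has been running since Samba was last started, so it will also be reported; the PIDs found would need to be matched against the per-process listing from smbstatus before treating any of them as orphaned.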

So, please, any pointers as to what the problem is would be very helpful. At the moment we're 
struggling: I'm considering getting a less stressful job - something like a forklift truck driver in an 
explosives factory - and people are starting to question whether we should replace the whole 
system with a Novell-based one!

	Thanks in advance

	Martin Rootes
	Systems Support


------------------------------------------------------------------------------
Martin Rootes - Senior Systems Programmer/Analyst, Sheffield Hallam University
Email :         M.J.Rootes at shu.ac.uk                      Phone: 0114 225 3828
------------------------------------------------------------------------------



