Severe problem with Samba

Lonny Schwartz lschwartz at micromuse.com
Thu Dec 13 10:38:02 GMT 2001


You may want to check out this article from Sysadmin Mag about Solaris
performance tuning; some of it may apply to your situation.

http://www.samag.com/documents/s=1323/sam0110e/0110e.htm

Also see this site, which is referenced in the article:

http://www.sean.de/Solaris/tune.html

Cheers,

Lonny

-----Original Message-----
From: samba-admin at lists.samba.org [mailto:samba-admin at lists.samba.org]
On Behalf Of Martin Rootes
Sent: Thursday, December 13, 2001 10:08 AM
To: Samba
Subject: Severe problem with Samba


Dear All,

We are experiencing severe problems with Samba 2.2.0 (with quota
support) running on a dual-processor (400MHz) Sun E450 running Solaris
2.7. It is used as a central file server for student disk space,
accessed by approximately 1200 PCs running NT 4. Until recently we saw
what we assumed to be loading issues, with connections being slow during
the middle of the day. Recently, however, the problems have become
severe. Everything seems fine until midday; then the number of smbd
processes goes up while the number of connections (determined from
smbstatus -b) drops. Students with existing connections start getting
slow responses, no new connections are made, and the load on the system
skyrockets. Stopping and restarting Samba seems to cure the problem, but
it can recur. We are in a desperate panic at the moment, as the students
are all doing assignments and this is seriously affecting their work. We
have tried various tweaks to Samba (deadtime, change notify timeout) and
to the TCP stack, and have tripled system memory, all to no avail. We
also seem to have an issue with keepalives and tcp_nodelay, neither of
which seems to work at all. We see the following messages in the log
about these options:

[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
  Failed to set socket option SO_KEEPALIVE (Error Invalid argument)
[2001/12/13 11:55:29, 0] lib/util_sock.c:set_socket_options(165)
  Failed to set socket option TCP_NODELAY (Error Invalid argument)
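
One way to check whether the operating system itself rejects these
options, independently of Samba, is to set them on a fresh socket. The
following is a minimal Python sketch, not Samba's own code; the function
name try_socket_options is made up for illustration. If both options
succeed here, the failures in the log may have more to do with the state
of the individual connections than with the options being unsupported
(POSIX allows setsockopt to fail with EINVAL on a socket that has already
been shut down), but that is only a hypothesis to test.

```python
import socket


def try_socket_options(sock):
    """Try to enable SO_KEEPALIVE and TCP_NODELAY on sock.

    Returns a dict mapping each option name to None on success,
    or to the OS error string if setsockopt failed.
    """
    options = [
        ("SO_KEEPALIVE", socket.SOL_SOCKET, socket.SO_KEEPALIVE),
        ("TCP_NODELAY", socket.IPPROTO_TCP, socket.TCP_NODELAY),
    ]
    results = {}
    for name, level, opt in options:
        try:
            sock.setsockopt(level, opt, 1)
            results[name] = None
        except OSError as exc:
            results[name] = exc.strerror
    return results


if __name__ == "__main__":
    # A fresh, unconnected TCP socket: no peer can have closed it yet.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        for name, err in try_socket_options(s).items():
            print(name, "ok" if err is None else "failed: " + err)
```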

The following is a selection of messages that appeared just before Samba
was stopped:

[2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:39:51, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:29, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:get_socket_addr(1084)
  getpeername failed. Error was Transport endpoint is not connected
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
  write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket_data(542)
  write_socket_data: write failure. Error = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:write_socket(566)
  write_socket: Error writing 4 bytes to socket 12: ERRNO = Broken pipe
[2001/12/13 11:40:30, 0] lib/util_sock.c:send_smb(753)
  Error writing 4 bytes to client. -1. Exiting
[2001/12/13 11:40:33, 0] lib/util_sock.c:read_socket_data(479)
  read_socket_data: recv failure for 4. Error = Connection reset by peer
[2001/12/13 11:40:49, 0] smbd/server.c:open_sockets(251)
  open_sockets: accept: Software caused connection abort
[2001/12/13 11:40:53, 0] lib/util_sock.c:read_socket_data(479)
  read_socket_data: recv failure for 4. Error = Connection reset by peer
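
To see which errors dominate and when they cluster, entries like the
ones above can be tallied by reporting function and by minute. This is a
minimal Python sketch, assuming every entry begins with a header line of
the form shown above ([date time, level] file:function(line)); the names
HEADER and tally are illustrative only.

```python
import re
from collections import Counter

# Matches header lines such as:
# [2001/12/13 11:39:51, 0] lib/util_sock.c:write_socket(566)
HEADER = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2}) (\d{2}):(\d{2}):\d{2}, *\d+\] (\S+):(\w+)\(\d+\)"
)


def tally(lines):
    """Count log entries per reporting function and per minute.

    Lines that are not entry headers (the indented message text)
    are skipped.
    """
    by_func = Counter()
    by_minute = Counter()
    for line in lines:
        m = HEADER.match(line)
        if m is None:
            continue
        date, hour, minute, _source_file, func = m.groups()
        by_func[func] += 1
        by_minute[f"{date} {hour}:{minute}"] += 1
    return by_func, by_minute
```

Passing an open log file to tally() and printing by_minute.most_common(10)
would show whether the errors pile up around midday or on the hour.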

We think we may have loading problems; however, if so, the load does not
seem to be directly proportional to the number of connections. In fact,
there is a significant rise in load on the hour and for 10 - 15 minutes
afterwards (all day long, not just at midday), which we assume is
because logging in places a high load on the system. It is also possible
that the midday problems are caused by different patterns of working, as
students log in for short periods to check e-mail before going to get
lunch, etc. Another oddity is that some Samba connections are left
running from the day before (or sometimes longer), so we wonder whether
connections are not being killed properly, thereby adding to the load.

So, please, any pointers as to what the problem is would be very
helpful. At the moment we're struggling. I'm considering getting a less
stressful job - something like a fork lift truck driver in an explosives
factory - and people are starting to question whether we should replace
the whole system with a Novell-based one!

	Thanks in advance

	Martin Rootes
	Systems Support


------------------------------------------------------------------------------
Martin Rootes - Senior Systems Programmer/Analyst, Sheffield Hallam University
Email :         M.J.Rootes at shu.ac.uk                      Phone: 0114 225 3828
------------------------------------------------------------------------------





