[Samba] getpeername/server crash problem
Mathieu Legare
Mathieu_Legare at UQTR.CA
Thu Apr 8 18:31:19 GMT 2004
Hi!
Before anything else, here is my hardware and software information:
Hardware :
-IBM x345, 1 CPU, 1G RAM, IBM ServeRAID controller
-6 HD used with LVM, 2 volume groups, 12 logical volumes, all
running ext3
Software :
-RedHat Linux Enterprise AS (Academic) 3.0 update 1
-Kernel 2.4.21-4.0.2.EL
-samba-3.0.2-6.3E
-Running an apache 2 web server
-In normal use, only 10-15 computers are connected to the server through Samba
network drives
Recently (this week), I started having problems with a Samba server. I kept
getting messages like these (many times each second):
[...]
Apr 4 00:14:30 rohan smbd[3170]: write_socket_data: write failure. Error = Connection reset by peer
Apr 4 00:14:30 rohan smbd[3170]: [2004/04/04 00:14:30, 0] lib/util_sock.c:write_socket(413)
Apr 4 00:14:30 rohan smbd[3170]: write_socket: Error writing 4 bytes to socket 5: ERRNO = Connection reset by peer
Apr 4 00:14:30 rohan smbd[3170]: [2004/04/04 00:14:30, 0] lib/util_sock.c:send_smb(605)
Apr 4 00:14:30 rohan smbd[3170]: Error writing 4 bytes to client. -1. (Connection reset by peer)
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:get_peer_addr(952)
Apr 4 00:46:30 rohan smbd[4201]: getpeername failed. Error was Transport endpoint is not connected
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:get_peer_addr(952)
Apr 4 00:46:30 rohan smbd[4201]: getpeername failed. Error was Transport endpoint is not connected
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:write_socket_data(388)
Apr 4 00:46:30 rohan smbd[4201]: write_socket_data: write failure. Error = Connection reset by peer
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:write_socket(413)
Apr 4 00:46:30 rohan smbd[4201]: write_socket: Error writing 4 bytes to socket 16: ERRNO = Connection reset by peer
Apr 4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:send_smb(605)
Apr 4 00:46:30 rohan smbd[4201]: Error writing 4 bytes to client. -1. (Connection reset by peer)
[...]
At some point the Samba server goes crazy: I have seen up to 11000 "smbd -D" processes with a
whopping load average of 600! Needless to say, the server was dying and almost frozen, and I had to reboot.
I started monitoring the server more carefully, and whenever the number of processes got too high (normally there are
8-12 smbd processes), I had to "killall -9 smbd" and start over. Three seconds after restarting, I often
saw 200 processes and had to kill it again.
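The manual check described above (count the smbd processes, react when there are too many) could be scripted. What follows is only a minimal sketch in Python, not something taken from the original setup: the limit of 50 and the function names are assumptions made for illustration, and it only reports the count instead of killing anything.

```python
import subprocess

# Hypothetical limit for illustration; normal load here is 8-12 smbd processes.
MAX_SMBD = 50


def count_processes(ps_output, name="smbd"):
    """Count lines of `ps -eo comm=` output that exactly match `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def smbd_overloaded(ps_output, limit=MAX_SMBD):
    """Return True when the number of smbd processes exceeds `limit`."""
    return count_processes(ps_output) > limit


def main():
    # `ps -eo comm=` prints one command name per line, with no header.
    out = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    print("smbd processes:", count_processes(out))
    if smbd_overloaded(out):
        print("smbd process count is above the limit")
```

Calling main() from cron or from a loop would give an early warning; what to do once the limit is crossed is left to the administrator.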
I added some options to smb.conf:
deadtime = 60
debug uid = yes
debug pid = yes
oplocks = no
log level = 1
max connections = 50
max smbd processes = 50
hostname lookups = no
socket options = TCP_NODELAY SO_KEEPALIVE
With no success! I was surprised to see that "max smbd processes = 50" did not prevent Samba from growing to an
amazing number of processes (over 1000) very quickly.
I started logging with iptables to see what was happening at the IP layer (I logged incoming packets
matching udp/tcp on ports 137/138/139/445). Very quickly another storm occurred, and the server
was receiving A LOT of packets:
[...]
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54407 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54408 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54410 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK FIN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54411 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54412 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54413 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=65535 RES=0x00 SYN URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=177 TOS=0x00 PREC=0x00 TTL=127 ID=54414 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan smbd[5095]: [2004/04/08 11:19:17, 0, pid=5095, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan smbd[5095]: getpeername failed. Error was Transport endpoint is not connected
Apr 8 11:19:17 rohan smbd[5095]: [2004/04/08 11:19:17, 0, pid=5095, effective(0, 0), real(0, 0)] lib/util_sock.c:read_socket_data(342)
Apr 8 11:19:17 rohan smbd[5095]: read_socket_data: recv failure for 4. Error = Connection reset by peer
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54415 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54416 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54417 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=0 RES=0x00 RST URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=242 TOS=0x00 PREC=0x00 TTL=127 ID=54418 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65404 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=276 TOS=0x00 PREC=0x00 TTL=127 ID=54419 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65152 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=134 TOS=0x00 PREC=0x00 TTL=127 ID=54420 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65042 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=140 TOS=0x00 PREC=0x00 TTL=127 ID=54440 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64990 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=200 TOS=0x00 PREC=0x00 TTL=127 ID=54441 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64883 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=208 TOS=0x00 PREC=0x00 TTL=127 ID=54442 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64755 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=180 TOS=0x00 PREC=0x00 TTL=127 ID=54445 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64647 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan smbd[5098]: [2004/04/08 11:19:17, 0, pid=5098, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=202 TOS=0x00 PREC=0x00 TTL=127 ID=54446 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64459 RES=0x00 ACK PSH URGP=0
Apr 8 11:19:17 rohan smbd[5100]: [2004/04/08 11:19:17, 0, pid=5100, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan smbd[5102]: [2004/04/08 11:19:17, 0, pid=5102, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr 8 11:19:17 rohan smbd[5098]: getpeername failed. Error was Transport endpoint is not connected
Apr 8 11:19:17 rohan smbd[5120]: [2004/04/08 11:19:17, 0, pid=5120, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
[...]
When I saw this, I disconnected x.y.16.19 from the network (it seems to be infected by a virus) and everything returned to normal.
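A log like the one above can be tallied per source address to find the offending host faster than by reading it. The following is a minimal sketch in Python, assuming the iptables LOG line layout shown in the excerpt (a SRC= field, and the TCP flags between RES=0x00 and URGP=); the function name and the abbreviated sample lines are made up for illustration.

```python
import re
from collections import Counter

# Captures the SRC= field and the TCP flags that sit between RES=0x.. and
# URGP= in an iptables LOG line like the ones quoted in this message.
LINE_RE = re.compile(r"SRC=(?P<src>\S+) .*? RES=0x\w+ (?P<flags>[A-Z ]+?) URGP=")


def syn_counts(lines):
    """Count new-connection attempts (SYN without ACK) per source address."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # smbd messages and other non-packet lines
        flags = m.group("flags").split()
        if "SYN" in flags and "ACK" not in flags:
            counts[m.group("src")] += 1
    return counts


# Abbreviated copies of lines from the excerpt (several fields removed).
sample = [
    "kernel: IN=eth0 SRC=x.y.16.19 DST=x.y.15.3 SPT=3863 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0",
    "kernel: IN=eth0 SRC=x.y.16.19 DST=x.y.15.3 SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0",
    "kernel: IN=eth0 SRC=x.y.16.19 DST=x.y.15.3 SPT=3865 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0",
    "smbd[5095]: getpeername failed. Error was Transport endpoint is not connected",
]
print(syn_counts(sample).most_common(1))  # [('x.y.16.19', 2)]
```

Feeding the function the lines of the syslog file instead of the sample would list the hosts opening the most connections, so a host behaving like x.y.16.19 stands out at the top.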
I know the real problem is the infected client, but I don't think it is normal behaviour
for Samba to FREEZE a server because of such an event. Any clue about what is happening, and whether there
is a fix for Samba? Why isn't the "max smbd processes" directive respected? I don't really
want my server to die every time some Windows machines on the campus (more
than 2000 computers) get infected. The virus seems to be W32.spybot.worm.gen.
Thanks and have a nice day,
--
Mathieu Legaré, computer analyst (network/systems)
Service de soutien pédagogique et technologique
Université du Québec a Trois-Rivières
Email : legare at uqtr.ca
PGP : http://www.uqtr.ca/~legare/public.pgp