[Samba] getpeername/server crash problem

Mathieu Legare Mathieu_Legare at UQTR.CA
Thu Apr 8 18:31:19 GMT 2004


Hi!

Before anything else, here is my hardware/software information:

Hardware : 

-IBM x345, 1 CPU, 1G RAM, IBM ServeRAID controller
-6 HDs used with LVM, 2 volume groups, 12 logical volumes, all
running ext3

Software :

-RedHat Linux Enterprise AS (Academic) 3.0 update 1
-Kernel 2.4.21-4.0.2.EL
-samba-3.0.2-6.3E
-Running an apache 2 web server
-In normal use, only 10-15 computers are connected to the server through Samba
 with network drives


Recently (this week), I started having problems with a Samba server. I kept
getting messages like these (many times each second):

[...]

Apr  4 00:14:30 rohan smbd[3170]:   write_socket_data: write failure. Error = Connection reset by peer
Apr  4 00:14:30 rohan smbd[3170]: [2004/04/04 00:14:30, 0] lib/util_sock.c:write_socket(413)
Apr  4 00:14:30 rohan smbd[3170]:   write_socket: Error writing 4 bytes to socket 5: ERRNO = Connection reset by peer
Apr  4 00:14:30 rohan smbd[3170]: [2004/04/04 00:14:30, 0] lib/util_sock.c:send_smb(605)
Apr  4 00:14:30 rohan smbd[3170]:   Error writing 4 bytes to client. -1. (Connection reset by peer)
Apr  4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:get_peer_addr(952)
Apr  4 00:46:30 rohan smbd[4201]:   getpeername failed. Error was Transport endpoint is not connected
Apr  4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:get_peer_addr(952)
Apr  4 00:46:30 rohan smbd[4201]:   getpeername failed. Error was Transport endpoint is not connected
Apr  4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:write_socket_data(388)
Apr  4 00:46:30 rohan smbd[4201]:   write_socket_data: write failure. Error = Connection reset by peer
Apr  4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:write_socket(413)
Apr  4 00:46:30 rohan smbd[4201]:   write_socket: Error writing 4 bytes to socket 16: ERRNO = Connection reset by peer
Apr  4 00:46:30 rohan smbd[4201]: [2004/04/04 00:46:30, 0] lib/util_sock.c:send_smb(605)
Apr  4 00:46:30 rohan smbd[4201]:   Error writing 4 bytes to client. -1. (Connection reset by peer)

[...]

At some point the Samba server went crazy: I have seen up to 11000 "smbd -D" processes with a
whopping load average of 600! Needless to say, the server was dying, almost frozen, and I had to reboot.
I started monitoring the server more carefully, and whenever the number of processes got too high (normally there are
8-12 smbd processes), I had to "killall -9 smbd" and start over. Three seconds after the restart, I often
saw 200 processes and had to kill them again.
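
The kind of monitoring described above could be automated. What follows is only a minimal sketch, not something from my setup: it assumes a Linux /proc layout where /proc/<pid>/stat holds the command name in parentheses, and the function name count_processes is made up for illustration.

```python
import os

def count_processes(name, proc_root="/proc"):
    """Count running processes whose command name equals `name`.

    Assumes a Linux-style /proc where /proc/<pid>/stat looks like
    "<pid> (<command>) <state> ...".
    """
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():      # only numeric directories are PIDs
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:              # process exited while we were scanning
            continue
        # The command name sits between the first "(" and the last ")".
        start, end = stat.find("("), stat.rfind(")")
        if start != -1 and end > start and stat[start + 1:end] == name:
            count += 1
    return count
```

Calling count_processes("smbd") from a periodic job and comparing the result against a threshold (50 would match the smb.conf limit I tried) would at least raise an alarm before the load average explodes.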


I added some options to smb.conf:

deadtime = 60
debug uid = yes
debug pid = yes
oplocks = no
log level = 1
max connections = 50
max smbd processes = 50
hostname lookups = no
socket options = TCP_NODELAY SO_KEEPALIVE

With no success! I was surprised to see that "max smbd processes = 50" did not prevent Samba from growing very quickly to an
amazing number of processes (more than 1000).

I started logging with iptables what was happening at the IP layer (I logged incoming packets
matching udp/tcp on ports 137/138/139/445). Very quickly, another storm occurred and the server
was receiving A LOT of packets:

[...]

Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54407 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54408 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54410 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK FIN URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54411 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 SYN URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54412 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=48 TOS=0x00 PREC=0x00 TTL=127 ID=54413 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=65535 RES=0x00 SYN URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=177 TOS=0x00 PREC=0x00 TTL=127 ID=54414 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65535 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan smbd[5095]: [2004/04/08 11:19:17, 0, pid=5095, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr  8 11:19:17 rohan smbd[5095]:   getpeername failed. Error was Transport endpoint is not connected
Apr  8 11:19:17 rohan smbd[5095]: [2004/04/08 11:19:17, 0, pid=5095, effective(0, 0), real(0, 0)] lib/util_sock.c:read_socket_data(342)
Apr  8 11:19:17 rohan smbd[5095]:   read_socket_data: recv failure for 4. Error = Connection reset by peer
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54415 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=65535 RES=0x00 ACK URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54416 DF PROTO=TCP SPT=3863 DPT=445 WINDOW=65535 RES=0x00 ACK URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=40 TOS=0x00 PREC=0x00 TTL=127 ID=54417 DF PROTO=TCP SPT=3866 DPT=139 WINDOW=0 RES=0x00 RST URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=242 TOS=0x00 PREC=0x00 TTL=127 ID=54418 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65404 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=276 TOS=0x00 PREC=0x00 TTL=127 ID=54419 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65152 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=134 TOS=0x00 PREC=0x00 TTL=127 ID=54420 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=65042 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=140 TOS=0x00 PREC=0x00 TTL=127 ID=54440 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64990 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=200 TOS=0x00 PREC=0x00 TTL=127 ID=54441 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64883 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=208 TOS=0x00 PREC=0x00 TTL=127 ID=54442 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64755 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=180 TOS=0x00 PREC=0x00 TTL=127 ID=54445 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64647 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan smbd[5098]: [2004/04/08 11:19:17, 0, pid=5098, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr  8 11:19:17 rohan kernel: IN=eth0 OUT= MAC=00:09:6b:f1:49:1e:00:09:97:56:9a:0e:08:00 SRC=x.y.16.19 DST=x.y.15.3 LEN=202 TOS=0x00 PREC=0x00 TTL=127 ID=54446 DF PROTO=TCP SPT=3865 DPT=445 WINDOW=64459 RES=0x00 ACK PSH URGP=0
Apr  8 11:19:17 rohan smbd[5100]: [2004/04/08 11:19:17, 0, pid=5100, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr  8 11:19:17 rohan smbd[5102]: [2004/04/08 11:19:17, 0, pid=5102, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952)
Apr  8 11:19:17 rohan smbd[5098]:   getpeername failed. Error was Transport endpoint is not connected
Apr  8 11:19:17 rohan smbd[5120]: [2004/04/08 11:19:17, 0, pid=5120, effective(0, 0), real(0, 0)] lib/util_sock.c:get_peer_addr(952) 

[...]


When I saw this, I disconnected x.y.16.19 from the network (it seems to be infected by a virus) and everything returned to normal.
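
Spotting the offending host in a log like the one above can be done by counting new connection attempts (SYN without ACK) per source address. This is a rough sketch that only assumes the iptables LOG line format shown in the excerpt; the function name syn_counts and the default port list are illustrative choices.

```python
import re
from collections import Counter

SRC_RE = re.compile(r"\bSRC=(\S+)")   # source address field
DPT_RE = re.compile(r"\bDPT=(\d+)")   # destination port field

def syn_counts(lines, ports=(139, 445)):
    """Count SYN-only packets per source address for the given TCP ports."""
    counts = Counter()
    for line in lines:
        src = SRC_RE.search(line)
        dpt = DPT_RE.search(line)
        # Skip non-packet lines (e.g. smbd messages) and other ports.
        if not src or not dpt or int(dpt.group(1)) not in ports:
            continue
        # TCP flags are the bare words that follow the RES= field.
        flags = line.split("RES=", 1)[-1].split()
        if "SYN" in flags and "ACK" not in flags:
            counts[src.group(1)] += 1
    return counts
```

Feeding it the lines of the syslog file that receives the kernel messages (the exact path depends on the syslog configuration) and printing counts.most_common(5) would list the hosts opening the most connections.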


I know the real problem is the infected client, but I don't think it's normal behaviour
for Samba to FREEZE a server because of such an event. Any clue about what's happening, and whether there
is a fix for Samba? Why isn't the "max smbd processes" directive respected? I don't really
want my server to die every time some Windows machines on the campus get infected (there are more
than 2000 computers). The virus seems to be W32.spybot.worm.gen.

Thanks and have a nice day,
-- 
Mathieu Legaré, IT analyst (network/systems)
Service de soutien pédagogique et technologique
Université du Québec à Trois-Rivières
Email    : legare at uqtr.ca
PGP      : http://www.uqtr.ca/~legare/public.pgp

