Process smbd using 100% CPU and impossible to kill
David Collier-Brown
davec-b at rogers.com
Mon Feb 16 12:25:19 MST 2009
Cedric Simon wrote:
> Hello,
>
> We have recently installed a Samba server on OpenSuse 11.1 and we have
> the following problem: after some time, a smbd process starts using 100%
> of the CPU, and it is impossible to kill it, even with a kill -9 pid.
>
> The Samba service can be stopped/started, but the smbd process keeps using
> 100% CPU. Shutdown does not work either. Only a power off of the server
> can 'solve' the problem.
>
You've tripped over a low-level problem of some sort which is doing a
denial-of-service attack on Samba, and therefore on everyone else (;-))
If it's always the same user that triggers the problem, or if the first
user who logs on will trigger it, you can attach a debugger to the samba
process, induce the problem and tell the samba folks where it died, which
*may* give you a clue about what failed.
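
A minimal sketch of the debugger step, assuming gdb is installed and you
are root on the server. The helper name smbd_backtrace is made up for
illustration and is not part of Samba. The backtrace is far more useful if
the distribution's debug symbols for Samba are installed. If the process
is stuck inside the kernel, the attach itself may hang.

```shell
# Sketch: dump a backtrace from a running smbd with gdb.
# smbd_backtrace is a hypothetical helper name, not a Samba tool.
smbd_backtrace() {
    pid="$1"
    # Refuse to continue if no PID was given or the process is gone.
    if [ -z "$pid" ] || [ ! -d "/proc/$pid" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Attach, print the stack of every thread, then detach and exit.
    gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
}
```

For the process in the report below, that would be something like
"smbd_backtrace 11878 > smbd-bt.txt", with the resulting file sent to the
list.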
If not, you can run strace on it and see if it loops on a system call.
Failing that, I'd try swapping parts (:-().
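
A rough sketch of the strace step, plus a quick look at what the kernel
reports for the process. The helper names proc_state and trace_syscalls
are hypothetical. The top output quoted below shows the CPU time as system
(sy) rather than user (us), so the process may be spinning inside the
kernel. That would also fit kill -9 having no effect, since a signal is
only acted on when the process returns from kernel code.

```shell
# Sketch: inspect a process that ignores kill -9.
# proc_state and trace_syscalls are hypothetical helper names.

# Print the one-letter scheduler state from /proc:
# R (running), S (sleeping), D (uninterruptible), Z (zombie), ...
proc_state() {
    pid="$1"
    if [ ! -r "/proc/$pid/status" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    awk '/^State:/ { print $2 }' "/proc/$pid/status"
}

# Trace system calls for about 10 seconds. -c asks strace for a
# per-call count summary when it exits. If no calls show up while top
# still reports 100% CPU, the process is probably not looping on a
# system call but spinning in the kernel.
trace_syscalls() {
    pid="$1"
    timeout 10 strace -c -p "$pid"
}
```

Run as root, for example "proc_state 11878" followed by
"trace_syscalls 11878". Like the debugger, strace may be unable to attach
to a process that never leaves the kernel.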
--dave
> Please find below my findings and info.
>
> The pid is running as root instead of admon, and the running time = time
> since user (IP 192.168.1.67) actually disconnected from Samba. I assume
> something goes wrong while closing the process, and the process enters
> an unstable/phantom status, using 100% of CPU.
>
> As CPU is used 100%, it affects the whole server :-(((
>
> If you have any idea of what could be wrong/solve this problem, feel
> free to tell me. Also if you need some more info, if I can get it I'll
> send it to you.
>
> As the server is in prod at a client's site, I can't do 'what I want' with
> the server. We are investigating moving the Windows clients to NFS.
>
> Please note that most users, e.g. the one at IP 192.168.1.67, are using a
> wireless connection and in some cases can lose the network. This might be
> part of the problem.
>
> But my major concern is how can Linux have a process running (smbd) that
> is impossible to kill and prohibit shutdown of the server, as well as
> 'normal' operation, since it uses 100% CPU.
>
> Many thanks in advance for your help.
>
> Cedric Simon.
>
>
> smb.conf
>
> [global]
> workgroup = MEDLAB
> server string = Servidor de archivos de Medlab
> map to guest = Bad User
> null passwords = Yes
> guest account = samba
> printcap name = cups
> ldap ssl = no
> create mask = 0777
> force create mode = 0777
> force security mode = 0777
> directory mask = 0777
> force directory mode = 0777
> force directory security mode = 0777
> cups options = raw
>
> [users]
> comment = All users
> path = /shared
> read only = No
> inherit acls = Yes
> veto files = /aquota.user/groups/shares/
>
> [admon]
> comment = Administracion
> path = /shared/admon
> read only = No
> inherit acls = Yes
> veto files = /aquota.user/groups/shares/
>
> [clientes]
> comment = Clientes
> path = /shared/clientes
> read only = No
> inherit acls = Yes
> veto files = /aquota.user/groups/shares/
>
> [gerencia]
> comment = Gerencia
> path = /shared/gerencia
> read only = No
> inherit acls = Yes
> veto files = /aquota.user/groups/shares/
>
> [medicos]
> comment = Medicos
> path = /shared/medicos/
> inherit acls = yes
> veto files = /aquota.user/groups/shares/
> guest ok = yes
> read only = no
>
>
> [compartido]
> comment = All groups
> path = /shared/compartido/
> username = samba
> read only = No
> acl check permissions = No
> force unknown acl user = Yes
> guest ok = Yes
> hosts allow = 192.168.1.
>
>
> relih:~ # top
> top - 12:35:00 up 19:40, 1 user, load average: 3.01, 2.92, 2.33
> Tasks: 132 total, 4 running, 128 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 25.0%sy, 0.0%ni, 74.5%id, 0.3%wa, 0.0%hi, 0.2%si, 0.0%st
> Mem: 2048884k total, 1996896k used, 51988k free, 98664k buffers
> Swap: 2104504k total, 28k used, 2104476k free, 1506752k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 11878 root 20 0 16984 5188 3932 R 100 0.3 37:24.13 smbd
> 14763 root 20 0 2432 1132 848 R 1 0.1 0:00.04 top
> 1 root 20 0 1008 380 332 S 0 0.0 0:02.00 init
> 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
> 3 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/0
> 4 root 15 -5 0 0 0 S 0 0.0 0:00.84 ksoftirqd/0
> 5 root RT -5 0 0 0 S 0 0.0 0:00.00 migration/1
>
> log.smbd:
>
> [2009/02/14 03:45:15, 0] smbd/server.c:main(1208)
> smbd version 3.2.6-0.3.1-2042-SUSE-CODE11 started.
> Copyright Andrew Tridgell and the Samba Team 1992-2008
> [2009/02/14 07:25:36, 1] smbd/service.c:make_connection_snum(1194)
> nadia (::ffff:192.168.1.104) connect to service admon initially as
> user admon (uid=1002, gid=100) (pid 11622)
> [2009/02/14 07:34:17, 1] smbd/service.c:make_connection_snum(1194)
> lenovo_medicos (::ffff:192.168.1.80) connect to service medicos
> initially as user medicos (uid=1004, gid=100) (pid 11651)
> [2009/02/14 07:34:17, 1] smbd/service.c:make_connection_snum(1194)
> lenovo_medicos (::ffff:192.168.1.80) connect to service compartido
> initially as user medicos (uid=1004, gid=100) (pid 11651)
> [2009/02/14 07:42:03, 1] smbd/service.c:make_connection_snum(1194)
> recepcion (::ffff:192.168.1.112) connect to service compartido
> initially as user admon (uid=1002, gid=100) (pid 11661)
> [2009/02/14 07:49:43, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service admon initially
> as user admon (uid=1002, gid=100) (pid 11694)
> [2009/02/14 07:49:43, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service compartido
> initially as user admon (uid=1002, gid=100) (pid 11694)
> [2009/02/14 07:49:44, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> [2009/02/14 07:49:44, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 07:49:44, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 07:53:58, 1] smbd/service.c:make_connection_snum(1194)
> direccion_medic (::ffff:192.168.1.70) connect to service medicos
> initially as user medicos (uid=1004, gid=100) (pid 11698)
> [2009/02/14 07:53:58, 1] smbd/service.c:make_connection_snum(1194)
> direccion_medic (::ffff:192.168.1.70) connect to service compartido
> initially as user medicos (uid=1004, gid=100) (pid 11698)
> [2009/02/14 07:54:06, 1] smbd/service.c:make_connection_snum(1194)
> server_medlab (::ffff:192.168.1.71) connect to service clientes
> initially as user clientes (uid=1006, gid=100) (pid 11700)
> [2009/02/14 07:54:06, 1] smbd/service.c:make_connection_snum(1194)
> server_medlab (::ffff:192.168.1.71) connect to service users initially
> as user clientes (uid=1006, gid=100) (pid 11700)
> [2009/02/14 08:05:12, 1] smbd/service.c:make_connection_snum(1194)
> emma (::ffff:192.168.1.162) connect to service medicos initially as
> user medicos (uid=1004, gid=100) (pid 11739)
> [2009/02/14 08:05:12, 1] smbd/service.c:make_connection_snum(1194)
> emma (::ffff:192.168.1.162) connect to service compartido initially as
> user medicos (uid=1004, gid=100) (pid 11739)
> [2009/02/14 08:05:15, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> [2009/02/14 08:05:15, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:05:15, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:09:42, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:09:42, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:09:42, 1] smbd/service.c:close_cnum(1405)
> contabilidad (::ffff:192.168.1.68) closed connection to service
> compartido
> [2009/02/14 08:09:42, 1] smbd/service.c:close_cnum(1405)
> contabilidad (::ffff:192.168.1.68) closed connection to service admon
> [2009/02/14 08:15:07, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service admon initially
> as user admon (uid=1002, gid=100) (pid 11772)
> [2009/02/14 08:20:06, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:20:06, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:20:06, 1] smbd/service.c:close_cnum(1405)
> contabilidad (::ffff:192.168.1.68) closed connection to service admon
> [2009/02/14 08:22:57, 1] smbd/service.c:make_connection_snum(1194)
> medlab (::ffff:192.168.1.67) connect to service admon initially as
> user admon (uid=1002, gid=100) (pid 11787)
> [2009/02/14 08:23:04, 1] smbd/service.c:make_connection_snum(1194)
> medlab (::ffff:192.168.1.67) connect to service compartido initially
> as user admon (uid=1002, gid=100) (pid 11787)
> [2009/02/14 08:24:08, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service admon initially
> as user admon (uid=1002, gid=100) (pid 11791)
> [2009/02/14 08:24:11, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:24:11, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:30:35, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:30:35, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:30:35, 1] smbd/service.c:close_cnum(1405)
> contabilidad (::ffff:192.168.1.68) closed connection to service admon
> [2009/02/14 08:32:15, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service admon initially
> as user admon (uid=1002, gid=100) (pid 11819)
> [2009/02/14 08:38:48, 1] smbd/service.c:close_cnum(1405)
> medlab (::ffff:192.168.1.67) closed connection to service admon
> [2009/02/14 08:38:48, 1] smbd/service.c:close_cnum(1405)
> medlab (::ffff:192.168.1.67) closed connection to service compartido
> [2009/02/14 08:44:47, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> [2009/02/14 08:44:47, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:44:47, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:46:28, 1] smbd/service.c:make_connection_snum(1194)
> medlab (::ffff:192.168.1.67) connect to service admon initially as
> user admon (uid=1002, gid=100) (pid 11868)
> [2009/02/14 08:46:28, 1] smbd/service.c:make_connection_snum(1194)
> medlab (::ffff:192.168.1.67) connect to service compartido initially
> as user admon (uid=1002, gid=100) (pid 11868)
> [2009/02/14 08:47:33, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:47:33, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:47:33, 1] smbd/service.c:close_cnum(1405)
> contabilidad (::ffff:192.168.1.68) closed connection to service admon
> [2009/02/14 08:47:46, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 08:47:46, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 08:47:46, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service admon initially
> as user admon (uid=1002, gid=100) (pid 11873)
> [2009/02/14 08:50:18, 1] smbd/service.c:close_cnum(1405)
> medlab (::ffff:192.168.1.67) closed connection to service admon
> [2009/02/14 08:50:18, 1] smbd/service.c:close_cnum(1405)
> medlab (::ffff:192.168.1.67) closed connection to service compartido
> [2009/02/14 08:51:20, 1] smbd/service.c:make_connection_snum(1194)
> medlab (::ffff:192.168.1.67) connect to service admon initially as
> user admon (uid=1002, gid=100) (pid 11878)
> [2009/02/14 08:51:20, 1] smbd/service.c:make_connection_snum(1194)
> medlab (::ffff:192.168.1.67) connect to service compartido initially
> as user admon (uid=1002, gid=100) (pid 11878)
> [2009/02/14 09:37:46, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 09:37:46, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 09:37:46, 1] smbd/service.c:close_cnum(1405)
> contabilidad (::ffff:192.168.1.68) closed connection to service admon
> [2009/02/14 10:06:23, 1] smbd/service.c:make_connection_snum(1194)
> contabilidad (::ffff:192.168.1.68) connect to service admon initially
> as user admon (uid=1002, gid=100) (pid 14293)
> [2009/02/14 10:06:26, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> [2009/02/14 10:06:26, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 10:06:26, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 11:47:29, 1] smbd/service.c:close_cnum(1405)
> nadia (::ffff:192.168.1.104) closed connection to service admon
> [2009/02/14 11:53:07, 1] smbd/notify_inotify.c:watch_destructor(351)
> inotify_rm_watch returned Argumento inválido
> [2009/02/14 11:53:11, 1] smbd/notify_inotify.c:watch_destructor(351)
> inotify_rm_watch returned Argumento inválido
> [2009/02/14 11:53:24, 1] smbd/notify_inotify.c:watch_destructor(351)
> inotify_rm_watch returned Argumento inválido
> [2009/02/14 11:53:27, 1] smbd/notify_inotify.c:watch_destructor(351)
> inotify_rm_watch returned Argumento inválido
> [2009/02/14 11:55:58, 1] smbd/service.c:close_cnum(1405)
> recepcion (::ffff:192.168.1.112) closed connection to service
> compartido
> [2009/02/14 11:57:54, 1] smbd/notify_inotify.c:watch_destructor(351)
> inotify_rm_watch returned Argumento inválido
> [2009/02/14 11:57:58, 1] smbd/service.c:close_cnum(1405)
> medlab (::ffff:192.168.1.67) closed connection to service compartido
> [2009/02/14 11:57:58, 1] smbd/service.c:close_cnum(1405)
> medlab (::ffff:192.168.1.67) closed connection to service admon
> [2009/02/14 12:01:14, 1] smbd/service.c:close_cnum(1405)
> lenovo_medicos (::ffff:192.168.1.80) closed connection to service
> medicos
> [2009/02/14 12:01:14, 1] smbd/service.c:close_cnum(1405)
> lenovo_medicos (::ffff:192.168.1.80) closed connection to service
> compartido
> [2009/02/14 12:03:56, 1] smbd/service.c:close_cnum(1405)
> emma (::ffff:192.168.1.162) closed connection to service medicos
> [2009/02/14 12:03:56, 1] smbd/service.c:close_cnum(1405)
> emma (::ffff:192.168.1.162) closed connection to service compartido
> [2009/02/14 12:05:15, 1] smbd/service.c:make_connection_snum(1194)
> ar (::ffff:192.168.1.54) connect to service medicos initially as user
> gerencia (uid=1003, gid=100) (pid 14639)
> [2009/02/14 12:05:15, 1] smbd/service.c:make_connection_snum(1194)
> ar (::ffff:192.168.1.54) connect to service admon initially as user
> gerencia (uid=1003, gid=100) (pid 14639)
> [2009/02/14 12:05:15, 1] smbd/service.c:make_connection_snum(1194)
> ar (::ffff:192.168.1.54) connect to service gerencia initially as user
> gerencia (uid=1003, gid=100) (pid 14639)
> [2009/02/14 12:05:15, 1] smbd/service.c:make_connection_snum(1194)
> ar (::ffff:192.168.1.54) connect to service clientes initially as user
> gerencia (uid=1003, gid=100) (pid 14639)
> [2009/02/14 12:05:15, 1] smbd/service.c:make_connection_snum(1194)
> ar (::ffff:192.168.1.54) connect to service compartido initially as
> user gerencia (uid=1003, gid=100) (pid 14639)
> [2009/02/14 12:05:18, 0] lib/util_sock.c:read_socket_with_timeout(939)
> [2009/02/14 12:05:18, 0] lib/util_sock.c:get_peer_addr_internal(1607)
> getpeername failed. Error was El otro extremo de la conexión no está
> conectado
> read_socket_with_timeout: client 0.0.0.0 read error = Conexión
> reinicializada por la máquina remota.
> [2009/02/14 12:08:26, 1] smbd/service.c:close_cnum(1405)
> server_medlab (::ffff:192.168.1.71) closed connection to service
> clientes
> [2009/02/14 12:08:26, 1] smbd/service.c:close_cnum(1405)
> server_medlab (::ffff:192.168.1.71) closed connection to service users
>
>
>
> relih:~ # rcsmb stop
> Shutting down Samba SMB daemon Warning: daemon not running.
> done
> relih:~ # rcsmb start
> Starting Samba SMB daemon
> done
> relih:~ # smbstatus
>
> Samba version 3.2.6-0.3.1-2042-SUSE-CODE11
> PID Username Group Machine
> -------------------------------------------------------------------
>
> Service pid machine Connected at
> -------------------------------------------------------
>
> No locked files
>
> relih:~ # kill -9 11878
> relih:~ # top
> top - 17:31:07 up 1 day, 36 min, 1 user, load average: 10.00, 10.00, 9.94
> Tasks: 129 total, 4 running, 124 sleeping, 0 stopped, 1 zombie
> Cpu(s): 0.0%us, 25.0%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 2048884k total, 2035512k used, 13372k free, 84460k buffers
> Swap: 2104504k total, 28k used, 2104476k free, 1553572k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 11878 root 20 0 16984 5188 3932 R 101 0.3 333:32.09 smbd
> 1 root 20 0 1008 380 332 S 0 0.0 0:02.04 init
> 2 root 15 -5 0 0 0 S 0 0.0 0:00.00 kthreadd
>
>
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb at spamcop.net | -- Mark Twain
(416) 223-8968