Process smbd using 100% CPU and impossible to kill

Cedric Simon cedric at solucionjava.com
Mon Feb 16 11:11:58 MST 2009


Hello,

We recently installed a Samba server on openSUSE 11.1 and have the
following problem: after some time, an smbd process starts using 100%
of the CPU, and it is impossible to kill it, even with kill -9 <pid>.

The Samba service can be stopped and started, but the smbd process keeps
using 100% CPU. Shutdown does not work either. Only powering off the
server 'solves' the problem.

Please find my findings and info below.

The process runs as root instead of as admon, and its running time
(TIME+ in top) equals the time since the user (IP 192.168.1.67) actually
disconnected from Samba. I assume something goes wrong while the process
is closing, and it enters an unstable/phantom state, using 100% of the CPU.
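One way to collect more detail on the stuck process is to read /proc/<pid>/status, which shows its scheduler state and all four UIDs (real, effective, saved, filesystem), whereas top's USER column shows only the effective one. A minimal sketch, assuming a Linux /proc filesystem; the script and function names are hypothetical, not part of Samba:

```python
import sys
from pathlib import Path


def parse_proc_status(text: str) -> dict:
    """Turn the 'Key:<TAB>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def describe(fields: dict) -> str:
    """Summarise the scheduler state and the real/effective/saved/filesystem UIDs."""
    real, effective, saved, fs = fields["Uid"].split()
    return (f"state={fields['State']} real_uid={real} "
            f"effective_uid={effective} saved_uid={saved} fs_uid={fs}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 proc_status.py 11878
    status = Path(f"/proc/{sys.argv[1]}/status").read_text()
    print(describe(parse_proc_status(status)))
```

Running it against pid 11878 would show whether "root" in top is only the effective UID or the identity the process holds throughout.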

Because the process uses 100% CPU, it affects the whole server :-(((

If you have any idea what could be wrong or how to solve this problem,
please tell me. If you need more information, I will send whatever I
can get.

As the server is in production at a client's site, I cannot do 'what I
want' with it. We are investigating moving the Windows clients to NFS.

Please note that most users, e.g. the one at IP 192.168.1.67, use
wireless connections and in some cases may lose the network. This might
be part of the problem.

But my major concern is how Linux can have a process (smbd) that is
impossible to kill and that prevents both a server shutdown and 'normal'
operation, since it uses 100% CPU.
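Whether the process really ignores SIGKILL can be checked directly: send the signal and poll the target's state. A minimal sketch assuming Linux /proc; proc_state and kill_and_check are hypothetical helpers, not part of Samba or of any standard tool:

```python
import os
import signal
import time
from pathlib import Path
from typing import Optional


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter State from /proc/<pid>/status, or None if the pid is gone."""
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None
    for line in text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]
    return None


def kill_and_check(pid: int, timeout: float = 5.0) -> Optional[str]:
    """Send SIGKILL, then poll until the pid disappears or becomes a zombie.

    Returns None (gone), "Z" (dead, waiting for its parent to reap it),
    or whatever state it was still in when the timeout expired.
    """
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return None
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state not in (None, "Z") and time.monotonic() < deadline:
        time.sleep(0.1)
        state = proc_state(pid)
    return state
```

If it still returns R or D after the timeout, the kernel has not acted on the signal. A fatal signal is normally handled when the task returns from the kernel to user space, so that usually points at a task stuck in kernel code, which would also fit the 25.0%sy / 0.0%us split in the top output below.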

Many thanks in advance for your help.

Cedric Simon.


smb.conf

[global]
workgroup = MEDLAB
server string = Servidor de archivos de Medlab
map to guest = Bad User
null passwords = Yes
guest account = samba
printcap name = cups
ldap ssl = no
create mask = 0777
force create mode = 0777
force security mode = 0777
directory mask = 0777
force directory mode = 0777
force directory security mode = 0777
cups options = raw

[users]
comment = All users
path = /shared
read only = No
inherit acls = Yes
veto files = /aquota.user/groups/shares/

[admon]
comment = Administracion
path = /shared/admon
read only = No
inherit acls = Yes
veto files = /aquota.user/groups/shares/

[clientes]
comment = Clientes
path = /shared/clientes
read only = No
inherit acls = Yes
veto files = /aquota.user/groups/shares/

[gerencia]
comment = Gerencia
path = /shared/gerencia
read only = No
inherit acls = Yes
veto files = /aquota.user/groups/shares/

[medicos]
comment = Medicos
path = /shared/medicos/
inherit acls = yes
veto files = /aquota.user/groups/shares/
guest ok = yes
read only = no


[compartido]
comment = All groups
path = /shared/compartido/
username = samba
read only = No
acl check permissions = No
force unknown acl user = Yes
guest ok = Yes
hosts allow = 192.168.1.
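Every share above uses the same veto files value. That parameter is a '/'-separated list of names that smbd makes neither visible nor accessible to clients, with * and ? allowed as wildcards. A minimal sketch of how the value splits, using Python's fnmatchcase as a stand-in for Samba's matcher (an assumption: Samba's real matching also depends on the share's case-sensitivity settings, which this ignores; the function names are hypothetical):

```python
from fnmatch import fnmatchcase


def parse_veto_files(value: str) -> list:
    """Split a 'veto files' value such as '/aquota.user/groups/shares/' into patterns."""
    return [entry for entry in value.split("/") if entry]


def is_vetoed(name: str, patterns: list) -> bool:
    """Rough model: a name is hidden if it matches any pattern (* and ? wildcards)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


patterns = parse_veto_files("/aquota.user/groups/shares/")
print(patterns)                       # ['aquota.user', 'groups', 'shares']
print(is_vetoed("groups", patterns))  # True
print(is_vetoed("admon", patterns))   # False
```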


relih:~ # top
top - 12:35:00 up 19:40,  1 user,  load average: 3.01, 2.92, 2.33
Tasks: 132 total,   4 running, 128 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 25.0%sy,  0.0%ni, 74.5%id,  0.3%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   2048884k total,  1996896k used,    51988k free,    98664k buffers
Swap:  2104504k total,       28k used,  2104476k free,  1506752k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11878 root      20   0 16984 5188 3932 R  100  0.3  37:24.13 smbd
14763 root      20   0  2432 1132  848 R    1  0.1   0:00.04 top
    1 root      20   0  1008  380  332 S    0  0.0   0:02.00 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.84 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1

log.smbd:

[2009/02/14 03:45:15,  0] smbd/server.c:main(1208)
  smbd version 3.2.6-0.3.1-2042-SUSE-CODE11 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2008
[2009/02/14 07:25:36,  1] smbd/service.c:make_connection_snum(1194)
  nadia (::ffff:192.168.1.104) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11622)
[2009/02/14 07:34:17,  1] smbd/service.c:make_connection_snum(1194)
  lenovo_medicos (::ffff:192.168.1.80) connect to service medicos initially as user medicos (uid=1004, gid=100) (pid 11651)
[2009/02/14 07:34:17,  1] smbd/service.c:make_connection_snum(1194)
  lenovo_medicos (::ffff:192.168.1.80) connect to service compartido initially as user medicos (uid=1004, gid=100) (pid 11651)
[2009/02/14 07:42:03,  1] smbd/service.c:make_connection_snum(1194)
  recepcion (::ffff:192.168.1.112) connect to service compartido initially as user admon (uid=1002, gid=100) (pid 11661)
[2009/02/14 07:49:43,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11694)
[2009/02/14 07:49:43,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service compartido initially as user admon (uid=1002, gid=100) (pid 11694)
[2009/02/14 07:49:44,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
[2009/02/14 07:49:44,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 07:49:44,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 07:53:58,  1] smbd/service.c:make_connection_snum(1194)
  direccion_medic (::ffff:192.168.1.70) connect to service medicos initially as user medicos (uid=1004, gid=100) (pid 11698)
[2009/02/14 07:53:58,  1] smbd/service.c:make_connection_snum(1194)
  direccion_medic (::ffff:192.168.1.70) connect to service compartido initially as user medicos (uid=1004, gid=100) (pid 11698)
[2009/02/14 07:54:06,  1] smbd/service.c:make_connection_snum(1194)
  server_medlab (::ffff:192.168.1.71) connect to service clientes initially as user clientes (uid=1006, gid=100) (pid 11700)
[2009/02/14 07:54:06,  1] smbd/service.c:make_connection_snum(1194)
  server_medlab (::ffff:192.168.1.71) connect to service users initially as user clientes (uid=1006, gid=100) (pid 11700)
[2009/02/14 08:05:12,  1] smbd/service.c:make_connection_snum(1194)
  emma (::ffff:192.168.1.162) connect to service medicos initially as user medicos (uid=1004, gid=100) (pid 11739)
[2009/02/14 08:05:12,  1] smbd/service.c:make_connection_snum(1194)
  emma (::ffff:192.168.1.162) connect to service compartido initially as user medicos (uid=1004, gid=100) (pid 11739)
[2009/02/14 08:05:15,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
[2009/02/14 08:05:15,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:05:15,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:09:42,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:09:42,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:09:42,  1] smbd/service.c:close_cnum(1405)
  contabilidad (::ffff:192.168.1.68) closed connection to service compartido
[2009/02/14 08:09:42,  1] smbd/service.c:close_cnum(1405)
  contabilidad (::ffff:192.168.1.68) closed connection to service admon
[2009/02/14 08:15:07,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11772)
[2009/02/14 08:20:06,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:20:06,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:20:06,  1] smbd/service.c:close_cnum(1405)
  contabilidad (::ffff:192.168.1.68) closed connection to service admon
[2009/02/14 08:22:57,  1] smbd/service.c:make_connection_snum(1194)
  medlab (::ffff:192.168.1.67) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11787)
[2009/02/14 08:23:04,  1] smbd/service.c:make_connection_snum(1194)
  medlab (::ffff:192.168.1.67) connect to service compartido initially as user admon (uid=1002, gid=100) (pid 11787)
[2009/02/14 08:24:08,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11791)
[2009/02/14 08:24:11,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:24:11,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:30:35,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:30:35,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:30:35,  1] smbd/service.c:close_cnum(1405)
  contabilidad (::ffff:192.168.1.68) closed connection to service admon
[2009/02/14 08:32:15,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11819)
[2009/02/14 08:38:48,  1] smbd/service.c:close_cnum(1405)
  medlab (::ffff:192.168.1.67) closed connection to service admon
[2009/02/14 08:38:48,  1] smbd/service.c:close_cnum(1405)
  medlab (::ffff:192.168.1.67) closed connection to service compartido
[2009/02/14 08:44:47,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
[2009/02/14 08:44:47,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:44:47,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:46:28,  1] smbd/service.c:make_connection_snum(1194)
  medlab (::ffff:192.168.1.67) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11868)
[2009/02/14 08:46:28,  1] smbd/service.c:make_connection_snum(1194)
  medlab (::ffff:192.168.1.67) connect to service compartido initially as user admon (uid=1002, gid=100) (pid 11868)
[2009/02/14 08:47:33,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:47:33,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:47:33,  1] smbd/service.c:close_cnum(1405)
  contabilidad (::ffff:192.168.1.68) closed connection to service admon
[2009/02/14 08:47:46,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 08:47:46,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 08:47:46,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11873)
[2009/02/14 08:50:18,  1] smbd/service.c:close_cnum(1405)
  medlab (::ffff:192.168.1.67) closed connection to service admon
[2009/02/14 08:50:18,  1] smbd/service.c:close_cnum(1405)
  medlab (::ffff:192.168.1.67) closed connection to service compartido
[2009/02/14 08:51:20,  1] smbd/service.c:make_connection_snum(1194)
  medlab (::ffff:192.168.1.67) connect to service admon initially as user admon (uid=1002, gid=100) (pid 11878)
[2009/02/14 08:51:20,  1] smbd/service.c:make_connection_snum(1194)
  medlab (::ffff:192.168.1.67) connect to service compartido initially as user admon (uid=1002, gid=100) (pid 11878)
[2009/02/14 09:37:46,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 09:37:46,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 09:37:46,  1] smbd/service.c:close_cnum(1405)
  contabilidad (::ffff:192.168.1.68) closed connection to service admon
[2009/02/14 10:06:23,  1] smbd/service.c:make_connection_snum(1194)
  contabilidad (::ffff:192.168.1.68) connect to service admon initially as user admon (uid=1002, gid=100) (pid 14293)
[2009/02/14 10:06:26,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
[2009/02/14 10:06:26,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 10:06:26,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 11:47:29,  1] smbd/service.c:close_cnum(1405)
  nadia (::ffff:192.168.1.104) closed connection to service admon
[2009/02/14 11:53:07,  1] smbd/notify_inotify.c:watch_destructor(351)
  inotify_rm_watch returned Invalid argument
[2009/02/14 11:53:11,  1] smbd/notify_inotify.c:watch_destructor(351)
  inotify_rm_watch returned Invalid argument
[2009/02/14 11:53:24,  1] smbd/notify_inotify.c:watch_destructor(351)
  inotify_rm_watch returned Invalid argument
[2009/02/14 11:53:27,  1] smbd/notify_inotify.c:watch_destructor(351)
  inotify_rm_watch returned Invalid argument
[2009/02/14 11:55:58,  1] smbd/service.c:close_cnum(1405)
  recepcion (::ffff:192.168.1.112) closed connection to service compartido
[2009/02/14 11:57:54,  1] smbd/notify_inotify.c:watch_destructor(351)
  inotify_rm_watch returned Invalid argument
[2009/02/14 11:57:58,  1] smbd/service.c:close_cnum(1405)
  medlab (::ffff:192.168.1.67) closed connection to service compartido
[2009/02/14 11:57:58,  1] smbd/service.c:close_cnum(1405)
  medlab (::ffff:192.168.1.67) closed connection to service admon
[2009/02/14 12:01:14,  1] smbd/service.c:close_cnum(1405)
  lenovo_medicos (::ffff:192.168.1.80) closed connection to service medicos
[2009/02/14 12:01:14,  1] smbd/service.c:close_cnum(1405)
  lenovo_medicos (::ffff:192.168.1.80) closed connection to service compartido
[2009/02/14 12:03:56,  1] smbd/service.c:close_cnum(1405)
  emma (::ffff:192.168.1.162) closed connection to service medicos
[2009/02/14 12:03:56,  1] smbd/service.c:close_cnum(1405)
  emma (::ffff:192.168.1.162) closed connection to service compartido
[2009/02/14 12:05:15,  1] smbd/service.c:make_connection_snum(1194)
  ar (::ffff:192.168.1.54) connect to service medicos initially as user gerencia (uid=1003, gid=100) (pid 14639)
[2009/02/14 12:05:15,  1] smbd/service.c:make_connection_snum(1194)
  ar (::ffff:192.168.1.54) connect to service admon initially as user gerencia (uid=1003, gid=100) (pid 14639)
[2009/02/14 12:05:15,  1] smbd/service.c:make_connection_snum(1194)
  ar (::ffff:192.168.1.54) connect to service gerencia initially as user gerencia (uid=1003, gid=100) (pid 14639)
[2009/02/14 12:05:15,  1] smbd/service.c:make_connection_snum(1194)
  ar (::ffff:192.168.1.54) connect to service clientes initially as user gerencia (uid=1003, gid=100) (pid 14639)
[2009/02/14 12:05:15,  1] smbd/service.c:make_connection_snum(1194)
  ar (::ffff:192.168.1.54) connect to service compartido initially as user gerencia (uid=1003, gid=100) (pid 14639)
[2009/02/14 12:05:18,  0] lib/util_sock.c:read_socket_with_timeout(939)
[2009/02/14 12:05:18,  0] lib/util_sock.c:get_peer_addr_internal(1607)
  getpeername failed. Error was Transport endpoint is not connected
  read_socket_with_timeout: client 0.0.0.0 read error = Connection reset by peer.
[2009/02/14 12:08:26,  1] smbd/service.c:close_cnum(1405)
  server_medlab (::ffff:192.168.1.71) closed connection to service clientes
[2009/02/14 12:08:26,  1] smbd/service.c:close_cnum(1405)
  server_medlab (::ffff:192.168.1.71) closed connection to service users



relih:~ # rcsmb stop
Shutting down Samba SMB daemon  Warning: daemon not running.
done
relih:~ # rcsmb start
Starting Samba SMB daemon
done
relih:~ # smbstatus

Samba version 3.2.6-0.3.1-2042-SUSE-CODE11
PID     Username      Group         Machine
-------------------------------------------------------------------

Service      pid     machine       Connected at
-------------------------------------------------------

No locked files

relih:~ # kill -9 11878
relih:~ # top
top - 17:31:07 up 1 day, 36 min,  1 user,  load average: 10.00, 10.00, 9.94
Tasks: 129 total,   4 running, 124 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us, 25.0%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2048884k total,  2035512k used,    13372k free,    84460k buffers
Swap:  2104504k total,       28k used,  2104476k free,  1553572k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11878 root      20   0 16984 5188 3932 R  101  0.3 333:32.09 smbd
    1 root      20   0  1008  380  332 S    0  0.0   0:02.04 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
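The "running time = time since disconnect" observation can be checked with a little arithmetic. top's TIME+ column is cumulative CPU time in minutes:seconds.hundredths, so a process pinned at 100% of one CPU should gain TIME+ at the same rate as the wall clock. A worked sketch, assuming both top snapshots were taken on 2009/02/14, the same day as the 11:57:58 close_cnum entries for 192.168.1.67 (the email itself is dated Feb 16); the function names are hypothetical:

```python
from datetime import datetime


def top_time_seconds(value: str) -> float:
    """Convert top's TIME+ column (minutes:seconds.hundredths) to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


def wall_seconds(since: str, until: str) -> float:
    """Seconds between two HH:MM:SS clock readings on the same day."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(until, fmt) - datetime.strptime(since, fmt)).total_seconds()


DISCONNECT = "11:57:58"  # last close_cnum for 192.168.1.67 in log.smbd
for clock, cpu in [("12:35:00", "37:24.13"), ("17:31:07", "333:32.09")]:
    wall = wall_seconds(DISCONNECT, clock)
    busy = top_time_seconds(cpu)
    print(f"{clock}: wall {wall:.0f}s since disconnect, CPU {busy:.2f}s, ratio {busy / wall:.3f}")
```

It prints ratios of 1.010 and 1.001. Both are close to 1, which is consistent with the process spinning continuously since that client went away. The roughly 22 s surplus stays almost constant between the two snapshots, so it is plausibly CPU time used during the normal session before the disconnect.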


More information about the samba-technical mailing list