CentOS 6 kernel soft lockup CPU#20 stuck for 67s! [smbd-notifyd] after upgrade from 4.2 to 4.4

Dr. Hansjoerg Maurer hansjoerg.maurer at itsd.de
Mon May 2 13:08:43 UTC 2016


Hi,

We updated a Samba server from 4.2.9 to 4.4.2 last week, and today the whole system hung with the following message:

kernel: BUG: soft lockup - CPU#20 stuck for 67s! [smbd-notifyd:26402]

The system is running CentOS 6 x86_64 with the latest updates installed.
The ext4 filesystem is exported via both NFSv3 and Samba.

I checked the release notes for 4.3.0 and found that there have been changes in the FileChangeNotify subsystem.

Has anybody else had problems in this area?

We will try again with the following setting:

 kernel change notify = no
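
As a minimal sketch of where this setting could go, assuming a standard smb.conf with a [global] section (the rest of the configuration is omitted, and the placement is an assumption rather than something taken from the server above):

```ini
[global]
    # Do not ask the kernel for directory change notifications
    kernel change notify = no
```

Running `testparm -s` afterwards should show whether the parameter was parsed as intended.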

Regards

Hansjörg

May  2 14:10:05 rmc-cs31 kernel: BUG: soft lockup - CPU#20 stuck for 67s! [smbd-notifyd:26402]
May  2 14:10:05 rmc-cs31 kernel: Modules linked in: dell_rbu ipmi_devintf vfat fat usb_storage mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ipv6 vhost_net macvtap macvlan tun kvm_intel kvm uinput power_meter acpi_ipmi ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support joydev sg ixgbe dca ptp pps_core mdio bnx2 dcdbas serio_raw lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf]
May  2 14:10:05 rmc-cs31 kernel: CPU 20 
May  2 14:10:05 rmc-cs31 kernel: Modules linked in: dell_rbu ipmi_devintf vfat fat usb_storage mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs autofs4 ipv6 vhost_net macvtap macvlan tun kvm_intel kvm uinput power_meter acpi_ipmi ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support joydev sg ixgbe dca ptp pps_core mdio bnx2 dcdbas serio_raw lpc_ich mfd_core i7core_edac edac_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif wmi pata_acpi ata_generic ata_piix megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ipmi_devintf]
May  2 14:10:05 rmc-cs31 kernel: 
May  2 14:10:05 rmc-cs31 kernel: Pid: 26402, comm: smbd-notifyd Tainted: G           --L------------    2.6.32-573.18.1.el6.x86_64 #1 Dell Inc. PowerEdge R710/0MD99X
May  2 14:10:05 rmc-cs31 kernel: RIP: 0010:[<ffffffff8153c331>]  [<ffffffff8153c331>] _spin_lock+0x21/0x30
May  2 14:10:05 rmc-cs31 kernel: RSP: 0018:ffff880118d3fe18  EFLAGS: 00000206
May  2 14:10:05 rmc-cs31 kernel: RAX: 0000000000000605 RBX: ffff880118d3fe18 RCX: 0000000000000854
May  2 14:10:05 rmc-cs31 kernel: RDX: 000000000000060a RSI: ffff880c14960100 RDI: ffff880c149603b0
May  2 14:10:05 rmc-cs31 kernel: RBP: ffffffff8100bc0e R08: 0000000000000000 R09: ffff880118d3fe70
May  2 14:10:05 rmc-cs31 kernel: R10: 0000000000000042 R11: 0000000000000293 R12: ffff880621f8df50
May  2 14:10:05 rmc-cs31 kernel: R13: ffff8807b3f998d0 R14: 0000000000000002 R15: ffff880118d3fdd8
May  2 14:10:05 rmc-cs31 kernel: FS:  00007f51897677c0(0000) GS:ffff880645540000(0000) knlGS:0000000000000000
May  2 14:10:05 rmc-cs31 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May  2 14:10:05 rmc-cs31 kernel: CR2: 00007f516f12c894 CR3: 0000000623764000 CR4: 00000000000007e0
May  2 14:10:05 rmc-cs31 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May  2 14:10:05 rmc-cs31 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May  2 14:10:05 rmc-cs31 kernel: Process smbd-notifyd (pid: 26402, threadinfo ffff880118d3c000, task ffff8806237c8ab0)
May  2 14:10:05 rmc-cs31 kernel: Stack:
May  2 14:10:05 rmc-cs31 kernel: ffff880118d3fe48 ffffffff815047a0 ffff880a60e4c480 ffff880c14960100
May  2 14:10:05 rmc-cs31 kernel: <d> ffff880c0b9a5f30 ffff880c0b9a5c80 ffff880118d3fea8 ffffffff815073cd
May  2 14:10:05 rmc-cs31 kernel: <d> ffff880c14960100 ffffffff81b280e0 ffffffff81672c80 000000000000001c
May  2 14:10:05 rmc-cs31 kernel: Call Trace:
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff815047a0>] ? unix_state_double_lock+0x60/0x70
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff815073cd>] ? unix_dgram_connect+0x9d/0x2d0
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff814572a7>] ? sys_connect+0xd7/0xf0
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff810e884e>] ? __audit_syscall_exit+0x25e/0x290
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
May  2 14:10:05 rmc-cs31 kernel: Code: 01 74 05 e8 12 19 d6 ff c9 c3 55 48 89 e5 0f 1f 44 00 00 b8 00 00 01 00 f0 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 <eb> f5 83 3f 00 75 f4 eb df c9 c3 0f 1f 40 00 55 48 89 e5 0f 1f
May  2 14:10:05 rmc-cs31 kernel: Call Trace:
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff815047a0>] ? unix_state_double_lock+0x60/0x70
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff815073cd>] ? unix_dgram_connect+0x9d/0x2d0
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff814572a7>] ? sys_connect+0xd7/0xf0
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff810e884e>] ? __audit_syscall_exit+0x25e/0x290
May  2 14:10:05 rmc-cs31 kernel: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
May  2 14:10:06 rmc-cs31 abrt-dump-oops: Reported 1 kernel oopses to Abrt
May  2 14:11:06 rmc-cs31 kernel: INFO: task jbd2/dm-1-8:2383 blocked for more than 120 seconds.
May  2 14:11:06 rmc-cs31 kernel:      Tainted: G           --L------------    2.6.32-573.18.1.el6.x86_64 #1
May  2 14:11:06 rmc-cs31 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  2 14:11:06 rmc-cs31 kernel: jbd2/dm-1-8   D 0000000000000008     0  2383      2 0x00000000
May  2 14:11:06 rmc-cs31 kernel: ffff880c22637a80 0000000000000046 ffff880c22e7c2c0 ffff880c220dff00
May  2 14:11:06 rmc-cs31 kernel: ffff880c22637a40 ffffffffa0004d9f ffff880c22637a30 ffffffff810ad40f
May  2 14:11:06 rmc-cs31 kernel: ffff880c2122aea8 0000000000000000 ffff880c22343ad8 ffff880c22637fd8
May  2 14:11:06 rmc-cs31 kernel: Call Trace:
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa0004d9f>] ? dm_table_unplug_all+0x5f/0x100 [dm_mod]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810ad40f>] ? ktime_get_ts+0xbf/0x100
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81127540>] ? sync_page+0x0/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81539673>] io_schedule+0x73/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8112757d>] sync_page+0x3d/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81539f0a>] __wait_on_bit_lock+0x5a/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81127517>] __lock_page+0x67/0x70
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a14e0>] ? wake_bit_function+0x0/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8113d8d5>] ? pagevec_lookup_tag+0x25/0x40
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8113c81d>] write_cache_pages+0x3cd/0x4c0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81014a19>] ? read_tsc+0x9/0x10
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8113b370>] ? __writepage+0x0/0x40
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81290809>] ? cpumask_next_and+0x29/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8105ee94>] ? find_busiest_group+0x244/0x9f0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8113c934>] generic_writepages+0x24/0x30
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa009d4d7>] journal_submit_inode_data_buffers+0x47/0x50 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa009d9ed>] jbd2_journal_commit_transaction+0x37d/0x14f0 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81089acc>] ? lock_timer_base+0x3c/0x70
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8108a6cb>] ? try_to_del_timer_sync+0x7b/0xe0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00a3a38>] kjournald2+0xb8/0x220 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00a3980>] ? kjournald2+0x0/0x220 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
May  2 14:11:06 rmc-cs31 kernel: INFO: task nfsd:21653 blocked for more than 120 seconds.
May  2 14:11:06 rmc-cs31 kernel:      Tainted: G           --L------------    2.6.32-573.18.1.el6.x86_64 #1
May  2 14:11:06 rmc-cs31 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May  2 14:11:06 rmc-cs31 kernel: nfsd          D 0000000000000001     0 21653      2 0x00000080
May  2 14:11:06 rmc-cs31 kernel: ffff880c18997500 0000000000000046 ffff880c189974c8 ffff880c189974c4
May  2 14:11:06 rmc-cs31 kernel: ffff880c22e7c2c0 ffff880c2fc28600 0016cf5bc1c58cd6 ffff8806455559c0
May  2 14:11:06 rmc-cs31 kernel: 00000000000005fc 000000027e679031 ffff880c22f0dad8 ffff880c18997fd8
May  2 14:11:06 rmc-cs31 kernel: Call Trace:
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811c91e0>] ? sync_buffer+0x0/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81539673>] io_schedule+0x73/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811c9220>] sync_buffer+0x40/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81539f0a>] __wait_on_bit_lock+0x5a/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811c91e0>] ? sync_buffer+0x0/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81539fe8>] out_of_line_wait_on_bit_lock+0x78/0x90
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a14e0>] ? wake_bit_function+0x0/0x50
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811c8779>] ? __find_get_block+0xa9/0x200
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811c93c6>] __lock_buffer+0x36/0x40
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa009d28b>] do_get_write_access+0x48b/0x520 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa009d471>] jbd2_journal_get_write_access+0x31/0x50 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00eae58>] __ext4_journal_get_write_access+0x38/0x80 [ext4]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00c4bb3>] ext4_reserve_inode_write+0x73/0xa0 [ext4]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00c4c2c>] ext4_mark_inode_dirty+0x4c/0x1d0 [ext4]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa009c3d5>] ? jbd2_journal_start+0xb5/0x100 [jbd2]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00c4f20>] ext4_dirty_inode+0x40/0x60 [ext4]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811be91b>] __mark_inode_dirty+0x3b/0x160
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811af002>] file_update_time+0xf2/0x170
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff814c2f49>] ? tcp_send_ack+0xd9/0x120
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81129910>] __generic_file_aio_write+0x230/0x490
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff814ac384>] ? ip_finish_output+0x184/0x360
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81129bf8>] generic_file_aio_write+0x88/0x100
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00bee08>] ext4_file_write+0x58/0x190 [ext4]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa00bedb0>] ? ext4_file_write+0x0/0x190 [ext4]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff811919cb>] do_sync_readv_writev+0xfb/0x140
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa02585e8>] ? find_acceptable_alias+0x28/0x100 [exportfs]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81231c26>] ? security_file_permission+0x16/0x20
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81192a76>] do_readv_writev+0xd6/0x1f0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa03698e2>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff81192bd6>] vfs_writev+0x46/0x60
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa036b235>] nfsd_vfs_write+0x105/0x430 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8118f0a2>] ? dentry_open+0x52/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa036cdcb>] ? nfsd_open+0x1db/0x240 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa036d1f7>] nfsd_write+0xe7/0x100 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa0375c0f>] nfsd3_proc_write+0xaf/0x140 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa0366405>] nfsd_dispatch+0xe5/0x230 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa02f3c84>] svc_process_common+0x344/0x640 [sunrpc]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa02f42c0>] svc_process+0x110/0x160 [sunrpc]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa0366b32>] nfsd+0xc2/0x160 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffffa0366a70>] ? nfsd+0x0/0x160 [nfsd]
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a0fce>] kthread+0x9e/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8100c28a>] child_rip+0xa/0x20
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
May  2 14:11:06 rmc-cs31 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20