fcntl spinlock in Linux?

Wed Aug 14 08:52:43 MDT 2013

Hi.

2013/8/14 Jeff Layton <jlayton at redhat.com>

> On Tue, 13 Aug 2013 17:15:43 -0400
> Alex Korobkin <korobkin+smb at gmail.com> wrote:
>
> > Hi all,
> >
> > I found this discussion here, while troubleshooting an issue with kernel
> > getting stuck in spin_lock() when a Samba 3.6-based printserver serves
> > multiple Windows clients.
> > https://lists.samba.org/archive/samba-technical/2013-January/090122.html
> >
> > The issue is hard to reproduce. All I can see is random printservers
> > crashing once in several days, with kernel (v3.2.5) being stuck in the
> > same spin_lock function.
> >
>
> Hmmm...getting stuck on a spinlock is not generally something that
> causes a crash. Are they actually crashing or just getting hung on that
> lock? Do you know what spinlock it is? Have a stack trace maybe?
>
>
Yes, sorry for the bad wording. The machine hung; it did not crash.

Here is a trace from SysRq L:

[363278.604569] Call Trace:
[363278.604578]  [<ffffffff8117e570>] lock_flocks+0x10/0x20
[363278.604584]  [<ffffffff8117fbc1>] __posix_lock_file+0x41/0x5c0
[363278.604590]  [<ffffffff8118033b>] vfs_lock_file+0x3b/0x40
[363278.604596]  [<ffffffff8118064f>] fcntl_setlk+0x16f/0x320
[363278.604603]  [<ffffffff811493b7>] sys_fcntl+0x167/0x5c0
[363278.604609]  [<ffffffff8169e112>] system_call_fastpath+0x16/0x1b
[363278.604613] Code: c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 11 0f 1f 84 00 00 00 00 00 f3 90 <0f> b6 07 38 c2 75 f7 c9 c3 0f 1f 44 00 00 55 48 89 e5 ff 14 25

The machine code quoted in the trace above seems to translate back into

void lock_flocks(void)
{
       spin_lock(&file_lock_lock);
}

in the kernel.
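
For reference, the instruction sequence in that Code: line (a lock xadd,
then a pause / byte-compare loop) looks like a ticket spinlock. Below is a
minimal user-space sketch of that idea in C11, not the kernel's code: the
names ticket_lock, ticket_lock_acquire and ticket_lock_release are made up
for illustration, and it uses two separate counters where the kernel appears
to pack both into one 16-bit word.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space ticket lock, for illustration only. */
typedef struct {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
} ticket_lock;

static void ticket_lock_init(ticket_lock *l)
{
    atomic_init(&l->next, 0);
    atomic_init(&l->owner, 0);
}

static void ticket_lock_acquire(ticket_lock *l)
{
    /* Take a ticket atomically (the "lock xadd" step). */
    unsigned int me = atomic_fetch_add(&l->next, 1);

    /* Spin until our ticket is served (the "pause; compare; jne" loop).
     * If the holder never releases, this loop never exits. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_lock_release(ticket_lock *l)
{
    /* Releasing a lock nobody holds would corrupt the ticket order. */
    assert(atomic_load(&l->owner) != atomic_load(&l->next));
    atomic_fetch_add_explicit(&l->owner, 1, memory_order_release);
}
```

A CPU sitting in the compare loop forever is consistent with a hang like
the one reported here, where the lock is held (or waited on) and never
handed over.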

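For anyone following along, the top of that trace (sys_fcntl, fcntl_setlk)
is what a POSIX byte-range lock request from user space turns into. A
minimal sketch of such a request is below. The helper names whole_file_lock
and lock_demo and the scratch file path are hypothetical, and this is not
taken from Samba's source; it only shows the kind of fcntl() call involved.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Apply a POSIX byte-range lock operation to the whole file.
 * type is F_RDLCK, F_WRLCK or F_UNLCK.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
static int whole_file_lock(int fd, short type)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;   /* 0 means "through end of file" */

    /* F_SETLK fails immediately on conflict; F_SETLKW would block. */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical self-check: lock and unlock a scratch file. */
static int lock_demo(void)
{
    char path[] = "/tmp/fcntl_demo_XXXXXX";
    int fd = mkstemp(path);
    int rc;

    if (fd < 0)
        return -1;
    unlink(path);               /* file disappears once fd is closed */

    rc = whole_file_lock(fd, F_WRLCK);
    if (rc == 0)
        rc = whole_file_lock(fd, F_UNLCK);
    close(fd);
    return rc;
}
```
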
> > The discussion suggests this patch to try with the kernel:
> > https://lists.samba.org/archive/samba-technical/2013-January/090224.html
> >
> > I'm not very confident about patching the kernel, and curious if there is
> > anything I could try to mitigate it on Samba's side. What would you
> > recommend?
> >
>
> The 3.11 kernel will be getting a first round of patches that breaks up
> the global file_lock_lock spinlock into a per-inode lock for the most
> part, and makes some other scalability improvements. Without knowing
> what specific problem you're having I can't really say whether those
> changes will help you however.
>
> I'm also working on a set of patches to help address the thundering
> herd problem when a lock is released. That was the main problem that
> Volker saw. I have a scheme to address that too and a set of patches,
> but it's 3.12 material at best (and probably more like 3.13).
>
> --
> Jeff Layton <jlayton at redhat.com>
>

I'm attaching a per-process stack trace as well for you to have a look at.
Both CPUs seem to be stalled by smbd processes; please note these lines in
the logs:
[363328.100002] INFO: rcu_sched detected stall on CPU 1 (t=1232040 jiffies)
[363328.100002] Pid: 13879, comm: smbd Not tainted 3.2.5-xen #1

I noticed that Samba 3.6.18 was released today with
https://bugzilla.samba.org/show_bug.cgi?id=10064 fixed. I'm going to try it
out and see whether it is related to this issue at all.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stack_trace_smbd.txt.gz
Type: application/x-gzip
Size: 39061 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130814/b68f7eca/attachment.bin>