vfs_acl_xattr and Linux memory fragmentation

Uri Simchoni uri at samba.org
Fri Apr 7 08:18:20 UTC 2017


I thought I'd share this, although I'm not sure we need to do anything
about it in upstream Samba.

We've come across incidents where messages like this one show up on the
console of our NAS devices:

smbd: page allocation failure: order:4, mode:0xc0d0
Pid: 6443, comm: smbd Tainted: P         C O 3.2.54 #1
Call Trace:
 [<ffffffff810b6345>] warn_alloc_failed+0xf5/0x140
 [<ffffffff810b7822>] ? drain_pages+0x32/0x90
 [<ffffffff810b78d0>] ? page_alloc_cpu_notify+0x50/0x50
 [<ffffffff810b78e1>] ? drain_local_pages+0x11/0x20
 [<ffffffff810b70c1>] __alloc_pages_nodemask+0x481/0x6d0
 [<ffffffff810e707f>] alloc_pages_current+0x7f/0xe0
 [<ffffffff810b5a99>] __get_free_pages+0x9/0x40
 [<ffffffff810ee3a8>] __kmalloc+0xe8/0xf0
 [<ffffffff811154e8>] getxattr+0x98/0x130
 [<ffffffff8110371d>] ? do_path_lookup+0x2d/0xc0
 [<ffffffff810eeb15>] ? kmem_cache_free+0x15/0x90
 [<ffffffff810ff92e>] ? putname+0x2e/0x40
 [<ffffffff8110453f>] ? user_path_at_empty+0x5f/0xa0
 [<ffffffff810a0ec0>] ? call_rcu_sched+0x10/0x20
 [<ffffffff8106d25a>] ? __put_cred+0x3a/0x50
 [<ffffffff81115654>] sys_getxattr+0x54/0x80
 [<ffffffff8105db54>] ? sys_setresgid+0x84/0x120
 [<ffffffff8169ced2>] system_call_fastpath+0x16/0x1b

...and a bunch of memory stats.

Our analysis is that:

1. Samba's vfs_acl_xattr algorithm for reading an extended attribute is
"try with a 1K buffer, and if the value doesn't fit, retry with 64K".
This is presumably an optimistic strategy to save system calls.

2. In older Linux kernels (prior to 3.4), getxattr() required a
physically contiguous (kmalloc'd) buffer whose size equals the maximum
size requested by the user. So even though no in-tree Linux file system
supports EAs larger than 4K, the getxattr() system call would still try
to allocate 64K whenever Samba passes that size.

3. On busy servers, memory gets fragmented over time, and that's the
failure we've been seeing (note the "order:4", which means it tried to
allocate 2^4 contiguous pages, i.e. 64K with 4K pages). That causes
getxattr() to fail with ENOMEM.

4. In newer kernels (since commit 44c82498), it still tries kmalloc
first, but falls back to vmalloc. I'm not sure why it tries kmalloc
first; maybe that's also some optimistic strategy.

5. So in newer kernels, the 64K request still incurs some extra
allocation overhead, but at least it doesn't fail.

6. Since the initial kmalloc is done with GFP_KERNEL, I think there's a
middle case where the kernel tries to evict pages to satisfy the
allocation, and that might cause a performance hit, all for memory that
no in-tree file system actually needs (AFAIK only ZFS supports 64K EAs
on Linux).
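As a rough illustration of points 2 and 3, here is a small user-space C model of how a pre-3.4 kernel sized its getxattr() buffer. It is not kernel code: the 4K page size is an assumption, clamping the request to XATTR_SIZE_MAX (65536 on Linux) is assumed to match the old kernel path, and alloc_order() and old_getxattr_order() are hypothetical names written in the spirit of the kernel's get_order().

```c
#include <stddef.h>

/* Assumption: 4K pages, as on typical x86-64 systems. */
#define MODEL_PAGE_SIZE ((size_t)4096)
/* Linux caps an xattr value at XATTR_SIZE_MAX, i.e. 65536 bytes. */
#define MODEL_XATTR_SIZE_MAX ((size_t)65536)

/* Hypothetical helper in the spirit of the kernel's get_order():
 * the smallest order such that 2^order pages cover `size` bytes. */
static unsigned int alloc_order(size_t size)
{
    /* Round up to whole pages without risking overflow. */
    size_t pages = size / MODEL_PAGE_SIZE + (size % MODEL_PAGE_SIZE != 0);
    unsigned int order = 0;

    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Hypothetical model of a pre-3.4 getxattr(): the kernel buffer is sized
 * from the caller's requested size (assumed clamped to XATTR_SIZE_MAX),
 * not from the length of the value actually stored. Returns the page
 * order of that buffer, or -1 for size 0, which only queries the value
 * length and allocates nothing. */
static int old_getxattr_order(size_t requested)
{
    if (requested == 0)
        return -1;
    if (requested > MODEL_XATTR_SIZE_MAX)
        requested = MODEL_XATTR_SIZE_MAX;
    return (int)alloc_order(requested);
}
```

In this model a 1K or 4K request fits in a single page (order 0), while a 64K request needs order 4, which is the "order:4" allocation that fails in the trace above once memory is fragmented.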

Bottom line: Samba works well with recent Linux kernels, but we might
be taking a performance hit by asking for memory we probably don't need.
We might want to do the initial getxattr with 4K, or add a 4K step
between the 1K and the 64K.
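A minimal sketch of the second option (a 1K, 4K, 64K ladder) in C, assuming Linux's getxattr(2) from <sys/xattr.h> and treating ERANGE as "buffer too small, try the next size". The names next_xattr_size() and read_xattr_stepped() are hypothetical; this is not Samba code and does not reflect how vfs_acl_xattr is actually structured.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: the next buffer size to try after `cur`, or 0
 * when there is no larger step. Sizes follow the proposal: 1K, then 4K,
 * then 64K (XATTR_SIZE_MAX on Linux). */
static size_t next_xattr_size(size_t cur)
{
    if (cur < 1024)
        return 1024;
    if (cur < 4096)
        return 4096;
    if (cur < 65536)
        return 65536;
    return 0;
}

/* Hypothetical wrapper: read attribute `name` of `path` into a malloc'd
 * buffer stored in *out (caller frees). Returns the value length, or -1
 * with errno set. Only ERANGE moves on to the next, larger size. */
static ssize_t read_xattr_stepped(const char *path, const char *name,
                                  char **out)
{
    for (size_t size = next_xattr_size(0); size != 0;
         size = next_xattr_size(size)) {
        char *buf = malloc(size);
        if (buf == NULL)
            return -1;          /* on POSIX systems malloc sets ENOMEM */

        ssize_t len = getxattr(path, name, buf, size);
        if (len >= 0) {
            *out = buf;
            return len;
        }

        int err = errno;        /* preserve errno across free() */
        free(buf);
        errno = err;
        if (err != ERANGE)
            return -1;          /* e.g. ENOENT or ENODATA: don't retry */
    }
    return -1;                  /* still ERANGE after the largest step */
}
```

With this shape, the first option (starting directly at 4K) amounts to dropping the 1K step from next_xattr_size(); either way, an EA of 4K or less never triggers the 64K request.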



