[linux-cifs-client] OOM kills when running fsstress on CIFS
Nick Piggin
npiggin at suse.de
Tue May 25 05:16:39 MDT 2010
On Tue, May 25, 2010 at 06:57:05AM -0400, Jeff Layton wrote:
> Since 2.6.34, I've been able to consistently reproduce OOM kills when running fsstress (from the LTP suite) on CIFS. I spent some time yesterday and bisected it down to this patch:
>
> ---------------------[snip]---------------------
> commit 315e995c63a15cb4d4efdbfd70fe2db191917f7a
> Author: Nick Piggin <npiggin at suse.de>
> Date: Wed Apr 21 03:18:28 2010 +0000
>
> [CIFS] use add_to_page_cache_lru
>
> add_to_page_cache_lru is exported, so it should be used. Benefits over
> using a private pagevec: neater code, 128 bytes fewer stack used, percpu
> lru ordering is preserved, and finally don't need to flush pagevec
> before returning so batching may be shared with other LRU insertions.
>
> Signed-off-by: Nick Piggin <npiggin at suse.de>
> Reviewed-by: Dave Kleikamp <shaggy at linux.vnet.ibm.com>
> Signed-off-by: Steve French <sfrench at us.ibm.com>
> ---------------------[snip]---------------------
>
> Here's how I've been reproducing it:
>
> Mount up a samba share with -o sec=krb5i,nounix,noserverino
>
> Run: fsstress -d /path/to/dir/on/cifs/ -n 1000 -l0 -p8
>
> ...within an hour or two, I start getting OOM kills. After backing out
> the patch above, I was able to run the test overnight. I'm not sure yet
> what the actual problem is, but there seems to be something wrong with
> that patch.
>
> Thoughts?
Yep, it's my fault. The problem is the refcounting. Previously the
code handed off its references to the LRU, whereas now the LRU takes
a new reference of its own. (The other filesystems converted to use
this function seemed to more conventionally open-code lru_cache_add.)
Can we get rid of a refcount increment anywhere? Otherwise we'll
need to just drop the references after adding the pages.
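
To make the refcounting difference concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: struct toy_page and every
toy_* and scenario_* name are hypothetical, and it tracks only the
caller's reference and the LRU's reference (the page cache's own
reference is left out). It only illustrates why a page stays pinned if
the caller keeps its reference after an insertion that takes a new one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page: only the reference count matters here. */
struct toy_page {
    int refcount;   /* the page can be freed only when this reaches 0 */
    bool on_lru;
};

/* Allocation returns the page with one reference owned by the caller. */
static void toy_alloc(struct toy_page *p) { p->refcount = 1; p->on_lru = false; }
static void toy_get(struct toy_page *p)   { p->refcount++; }
static void toy_put(struct toy_page *p)   { assert(p->refcount > 0); p->refcount--; }

/* Hand-off style: the LRU consumes the caller's existing reference. */
static void toy_lru_add_handoff(struct toy_page *p) { p->on_lru = true; }

/* New-reference style: the LRU takes its own reference on insertion. */
static void toy_lru_add_takes_ref(struct toy_page *p) { toy_get(p); p->on_lru = true; }

/* Reclaim removes the page from the LRU and drops the LRU's reference. */
static void toy_lru_evict(struct toy_page *p) { p->on_lru = false; toy_put(p); }

/* Old behaviour: reference handed off, so reclaim can free the page. */
int scenario_handoff(void)
{
    struct toy_page p;
    toy_alloc(&p);
    toy_lru_add_handoff(&p);
    toy_lru_evict(&p);
    return p.refcount;
}

/* Converted code without a matching put: one reference is never dropped. */
int scenario_new_ref_leaky(void)
{
    struct toy_page p;
    toy_alloc(&p);
    toy_lru_add_takes_ref(&p);
    toy_lru_evict(&p);
    return p.refcount;
}

/* Converted code that drops the caller's reference after adding the page. */
int scenario_new_ref_fixed(void)
{
    struct toy_page p;
    toy_alloc(&p);
    toy_lru_add_takes_ref(&p);
    toy_put(&p);            /* caller drops its own reference */
    toy_lru_evict(&p);
    return p.refcount;
}
```

In this model the leaky scenario ends with a nonzero count, which is the
kind of unreclaimable page that would accumulate under fsstress. Either
removing an increment or adding the put brings the count back to zero.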