[linux-cifs-client] OOM kills when running fsstress on CIFS

Nick Piggin npiggin at suse.de
Tue May 25 05:16:39 MDT 2010


On Tue, May 25, 2010 at 06:57:05AM -0400, Jeff Layton wrote:
> Since 2.6.34, I've been able to consistently reproduce OOM kills when running fsstress (from the LTP suite) on CIFS. I spent some time yesterday and bisected it down to this patch:
> 
> ---------------------[snip]---------------------
> commit 315e995c63a15cb4d4efdbfd70fe2db191917f7a
> Author: Nick Piggin <npiggin at suse.de>
> Date:   Wed Apr 21 03:18:28 2010 +0000
> 
>     [CIFS] use add_to_page_cache_lru
>     
>     add_to_page_cache_lru is exported, so it should be used. Benefits over
>     using a private pagevec: neater code, 128 bytes fewer stack used, percpu
>     lru ordering is preserved, and finally don't need to flush pagevec
>     before returning so batching may be shared with other LRU insertions.
>     
>     Signed-off-by: Nick Piggin <npiggin at suse.de>
>     Reviewed-by: Dave Kleikamp <shaggy at linux.vnet.ibm.com>
>     Signed-off-by: Steve French <sfrench at us.ibm.com>
> ---------------------[snip]---------------------
> 
> Here's how I've been reproducing it:
> 
> Mount up a Samba share with -o sec=krb5i,nounix,noserverino
> 
> Run: fsstress -d /path/to/dir/on/cifs/ -n 1000 -l0 -p8
> 
> ...within an hour or two, I start getting OOM kills. After backing out
> the patch above, I was able to run the test overnight. I'm not sure yet
> what the actual problem is, but there seems to be something wrong with
> that patch.
> 
> Thoughts?

Yep, it's my fault. The problem is the refcounting. Previously the
code handed its references off to the LRU, whereas now the LRU takes
a new reference of its own. (The other filesystems converted to use this
function were already open-coding lru_cache_add in the conventional way,
so their reference handling did not change.)

Can we get rid of a refcount increment anywhere? Otherwise, we'll just
need to drop the caller's references after adding the pages.


