[linux-cifs-client] high system load and processes stuck in iowait with CIFS mount

Jeff Layton jlayton at redhat.com
Tue Apr 22 22:25:42 GMT 2008


On Wed, 16 Apr 2008 10:51:30 -0500
Timothee Besset <ttimo at idsoftware.com> wrote:

> Hello,
> 
> Debian Etch x86, with stock 2.6.18 kernel, also happening with a custom
> 2.6.24.3 kernel:
> 
> I have a nice'd backup process that writes out data using a CIFS mount
> to a remote/offsite host. The connection to the remote host isn't very
> fast, and I don't mind that it takes a long time to run.
> 
> The problem is that it kills performance on everything else on the
> system when running. I see the load average going very high (~10) but
> very little actual CPU usage. Processes that are unrelated to the backup
> get stuck on iowait / disk sleep for very long periods at a time, making
> the entire system unresponsive. Even logging into the system takes up to
> a minute because processes that need to do any filesystem access will
> get stuck for a while.
> 
> I understand that an offsite CIFS mount is far from ideal for backup
> connectivity. The backup utility I'm using (duplicity) doesn't have a
> "direct to CIFS" backend and I can't get something else for offsite
> storage from my company, so I'm stuck with a CIFS filesystem mount.
> 
> Is there anything I can do to avoid the CIFS access hurting everything else?
> 

These types of problems can often be very hard to troubleshoot...

In principle, the slow CIFS mount should only make duplicity itself
block. Everything else should keep moving along.

If that's not happening then there are a few possibilities:

1) the CIFS code is holding a lock or something during these long
sleeps and that's causing everything else to serialize.

2) the box is under memory pressure and it's trying to flush CIFS pages
to free up memory. This is taking a long time because of the slow link
and things are stacking up.

3) something I haven't thought of right offhand...
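
A rough way to check possibility 2 while the box is in the bad state is
to watch the Dirty and Writeback counters in /proc/meminfo. What follows
is only a minimal sketch in Python, assuming a Linux /proc filesystem.
The function names parse_meminfo and writeback_summary are made up for
illustration and are not part of any existing tool:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Values are the numbers as reported (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def writeback_summary(info):
    """Return (dirty, writeback, percent_of_MemTotal).

    The percentage is None when MemTotal is missing or zero.
    """
    dirty = info.get("Dirty", 0)
    writeback = info.get("Writeback", 0)
    total = info.get("MemTotal", 0)
    pct = 100.0 * (dirty + writeback) / total if total else None
    return dirty, writeback, pct
```

Feeding the contents of /proc/meminfo through these two functions every
few seconds during a backup run would show whether dirty and writeback
pages pile up while everything else stalls. If they stay small, memory
pressure from CIFS writeback is probably not the main cause.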

Usually in this situation, I end up telling people to get the box into
this state and then fire off a sysrq-m to get info on kernel memory
and a few sysrq-t's to make all of the tasks dump their stacks to the
ring buffer. From there you can pick out something that you know is
blocked but shouldn't be and see if you can figure out what it's
waiting on.
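
If there is no console keyboard handy, the same sysrq commands can
usually be fired by writing the command character to /proc/sysrq-trigger
as root. The sketch below is one way to script that in Python. The
function names, the number of task dumps and the delay between them are
assumptions chosen for illustration, not a recommended procedure:

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def fire_sysrq(key, trigger_path=SYSRQ_TRIGGER):
    """Write one sysrq command character to the trigger file (needs root)."""
    if len(key) != 1:
        raise ValueError("sysrq command must be a single character")
    with open(trigger_path, "w") as f:
        f.write(key)


def capture_blocked_state(trigger_path=SYSRQ_TRIGGER, task_dumps=3, delay=5.0):
    """Fire one sysrq-m, then several sysrq-t's spaced `delay` seconds apart."""
    fire_sysrq("m", trigger_path)  # memory info
    for i in range(task_dumps):
        fire_sysrq("t", trigger_path)  # task stack dump
        if i < task_dumps - 1:
            time.sleep(delay)
```

The output goes to the kernel ring buffer, so it can be read back with
dmesg or from the kernel log files. Spacing the task dumps out makes it
easier to tell a task that is stuck in one place from one that is merely
slow.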

That's not trivial, so you may want to first try updating the kernel to
something more recent and seeing if the problem just goes away. 2.6.18
is pretty old by now, and the CIFS code has seen some substantial work
in the last two years or so.

Cheers,
-- 
Jeff Layton <jlayton at redhat.com>

