[linux-cifs-client] when a mounted remote smb server goes down - cifs vfs tries to re-connect - with strange side-effects

Jeff Layton jlayton at samba.org
Fri Jan 8 05:16:16 MST 2010


On Fri, 8 Jan 2010 05:01:49 +0100
Günter Kukkukk <linux at kukkukk.com> wrote:

> Am Mittwoch 06 Januar 2010 13:12:23 schrieb Jeff Layton:
> > On Wed, 6 Jan 2010 03:16:25 +0100
> > 
> > Günter Kukkukk <linux at kukkukk.com> wrote:
> > > Am Dienstag 05 Januar 2010 13:18:04 schrieb Jeff Layton:
> > > > On Tue, 5 Jan 2010 05:24:43 +0100
> > > >
> > > > Günter Kukkukk <linux at kukkukk.com> wrote:
> > > > > Hi Jeff, Steve, ...
> > > > >
> > > > > I'm running here on opensuse 11.2 - tried default kernel - and now
> > > > > 2.6.33-rc2-0.1-default from latest cifs git.
> > > > >
> > > > > When mounting a remote smb server - and then that server goes down
> > > > > (just shutdown, no network cable unplugged), my client KDE4.x desktop
> > > > > becomes _very_ unresponsive - the (kicker) taskbar is now _unusable_
> > > > > - no response at all - which also means that a user can't even use
> > > > > the GUI to shutdown.
> > > > >
> > > > > As soon i restart the remote smb server, all is fine again.
> > > > >
> > > > > Used many tools (like those from sysstat) - but at least CPU usage
> > > > > seem to be  _not_ the problem.
> > > > >
> > > > > Could it be some kernel semaphores....?
> > > > >
> > > > > When the remote server is down, wireshark only shows some 3 seconds
> > > > > tcp/ip traffic on ports 139 and 445.
> > > > >
> > > > > From the cifs (debug 7) kernel log:
> > > > >
> > > > > Jan  5 04:38:01 linux300 kernel: [46253.309431]  fs/cifs/inode.c:
> > > > > Getting info on //linux700/homegk Jan  5 04:38:01 linux300 kernel:
> > > > > [46253.309438] fs/cifs/cifssmb.c: In QPathInfo (Unix) the path
> > > > > //linux700/homegk Jan  5 04:38:04 linux300 kernel: [46256.080085] 
> > > > > fs/cifs/connect.c: Socket created Jan  5 04:38:04 linux300 kernel:
> > > > > [46256.080740]
> > > > > fs/cifs/connect.c: Error -111 connecting to server via ipv4 Jan  5
> > > > > 04:38:04 linux300 kernel: [46256.080770]  fs/cifs/connect.c:
> > > > > reconnect error -111 Jan  5 04:38:07 linux300 kernel: [46259.084106]
> > > > > fs/cifs/connect.c: Socket created Jan  5 04:38:07 linux300 kernel:
> > > > > [46259.084773]  fs/cifs/connect.c: Error -111 connecting to server
> > > > > via ipv4 Jan  5 04:38:07 linux300 kernel: [46259.084804] 
> > > > > fs/cifs/connect.c: reconnect error -111 Jan  5 04:38:10 linux300
> > > > > kernel: [46262.088085] fs/cifs/connect.c: Socket created Jan  5
> > > > > 04:38:10 linux300 kernel: [46262.088728]  fs/cifs/connect.c: Error
> > > > > -111 connecting to server via ipv4 Jan  5 04:38:10 linux300 kernel:
> > > > > [46262.088758]  fs/cifs/connect.c: reconnect error -111 Jan  5
> > > > > 04:38:11 linux300 kernel: [46263.308062] fs/cifs/cifssmb.c: gave up
> > > > > waiting on reconnect in smb_init Jan  5 04:38:11 linux300 kernel:
> > > > > [46263.308082]  fs/cifs/inode.c: error on getting revalidate info
> > > > > -112 Jan  5 04:38:11 linux300 kernel: [46263.308090] 
> > > > > fs/cifs/inode.c: cifs_revalidate - inode unchanged Jan  5 04:38:11
> > > > > linux300 kernel: [46263.308102]  fs/cifs/inode.c: CIFS VFS: leaving
> > > > > cifs_revalidate (xid = 224) rc = -112
> > > > > Jan  5 04:38:11 linux300 kernel: [46263.310204]  fs/cifs/inode.c:
> > > > > CIFS VFS: in cifs_revalidate as Xid: 225 with uid: 1000
> > > > > Jan  5 04:38:11 linux300 kernel: [46263.310226]  fs/cifs/inode.c:
> > > > > Revalidate: //linux700/homegk inode 0xf41c3e84 count 1 dentry:
> > > > > 0xdaa7325c d_time 1951477551 jiffies 11490827
> > > > > Jan  5 04:38:11 linux300 kernel: [46263.310236]  fs/cifs/inode.c:
> > > > > Getting info on //linux700/homegk Jan  5 04:38:11 linux300 kernel:
> > > > > [46263.310243] fs/cifs/cifssmb.c: In QPathInfo (Unix) the path
> > > > > //linux700/homegk Jan  5 04:38:13 linux300 kernel: [46265.092103] 
> > > > > fs/cifs/connect.c: Socket created Jan  5 04:38:13 linux300 kernel:
> > > > > [46265.092744]
> > > > > fs/cifs/connect.c: Error -111 connecting to server via ipv4 Jan  5
> > > > > 04:38:13 linux300 kernel: [46265.092774]  fs/cifs/connect.c:
> > > > > reconnect error -111 Jan  5 04:38:16 linux300 kernel: [46268.096101]
> > > > > fs/cifs/connect.c: Socket created Jan  5 04:38:16 linux300 kernel:
> > > > > [46268.096751]  fs/cifs/connect.c: Error -111 connecting to server
> > > > > via ipv4 Jan  5 04:38:16 linux300 kernel: [46268.096781] 
> > > > > fs/cifs/connect.c: reconnect error -111 Jan  5 04:38:19 linux300
> > > > > kernel: [46271.100088] fs/cifs/connect.c: Socket created Jan  5
> > > > > 04:38:19 linux300 kernel: [46271.100729]  fs/cifs/connect.c: Error
> > > > > -111 connecting to server via ipv4 Jan  5 04:38:19 linux300 kernel:
> > > > > [46271.100758]  fs/cifs/connect.c: reconnect error -111 Jan  5
> > > > > 04:38:21 linux300 kernel: [46273.308054] fs/cifs/cifssmb.c: gave up
> > > > > waiting on reconnect in smb_init Jan  5 04:38:21 linux300 kernel:
> > > > > [46273.308075]  fs/cifs/inode.c: error on getting revalidate info
> > > > > -112 Jan  5 04:38:21 linux300 kernel: [46273.308083] 
> > > > > fs/cifs/inode.c: cifs_revalidate - inode unchanged Jan  5 04:38:21
> > > > > linux300 kernel: [46273.308097]  fs/cifs/inode.c: CIFS VFS: leaving
> > > > > cifs_revalidate (xid = 225) rc = -112
> > > > > Jan  5 04:38:21 linux300 kernel: [46273.311877]  fs/cifs/inode.c:
> > > > > CIFS VFS: in cifs_revalidate as Xid: 226 with uid: 1000
> > > > > Jan  5 04:38:21 linux300 kernel: [46273.311901]  fs/cifs/inode.c:
> > > > > Revalidate: //linux700/homegk inode 0xf41c3e84 count 1 dentry:
> > > > > 0xdaa7325c d_time 1951477551 jiffies 11493327
> > > > > Jan  5 04:38:21 linux300 kernel: [46273.311912]  fs/cifs/inode.c:
> > > > > Getting info on //linux700/homegk
> > > >
> > > > Most likely the problem is that you have programs that are trying to
> > > > repeatedly access the CIFS share while the server is down. The kernel
> > > > looks like it's doing what it's supposed to do, which is to attempt to
> > > > reconnect to the server so it can satisfy the syscall.
> > > >
> > > > It looks like the program is just retrying the syscall over and over
> > > > here, but you might want to try and verify that by stracing the program
> > > > (if you can track it down). If so, then there's not much we can do at
> > > > the kernel level. You might be able to get away with doing a lazy
> > > > umount (umount -l) to work around it if the program is retrying the
> > > > syscalls.
> > >
> > > Hi Jeff,
> > > i was absolutely sure that no other program was accessing that mount.
> > > Opened a root console and did that mount, then stopped smbd on
> > > the remote server. The kde4 taskbar problem was immediately seen.
> > >
> > > So i was hunting for a while - and you won't believe.
> > > For some testing purpose i had added the following line to /etc/fstab
> > > some weeks ago (and forgot about it):
> > >
> > > //linux700/homegk    /mnt/linux           cifs      
> > > cred=/root/creds/gk.creds,rw,noauto 0 0
> > >
> > > As soon as i remove that line and save fstab, all behaves correctly
> > > again. I can do so back and forth interactively with emacs - and the
> > > change is immediately (!) caught by some "watcher" .... atm no idea who
> > > it is.  :-)
> > 
> > Interesting. One thing that might be good to do is to change the
> > cFYI/cERROR macros to printk the PID of the process as well
> > (current->tgid). That might help to track down what's doing this.
> > 
> 
> Hi Jeff,
> 
> printing  (current->tgid) with 
>    cFYI(1, ("Getting info on %s <%d>", full_path, current->tgid));
> was a really (!) interesting idea!  :-)
> 
>         Getting info on \\linux700\homegk <2672>
> 
> ps aux | grep 2672
> gk        2672  1.0  4.4 277036 45424 ?        Sl   04:03   0:17 kdeinit4: plasma-desktop [kdeinit]
> 
> The strange thing is, that this kde4 thingy is requesting the cifs availability in 10 seconds 
> (sometimes 20 seconds) intervals - but at least the kde4 taskbar is _always_ frozen.
> 

Glad that helped. We should probably consider making the cFYI macro
print current->tgid on every message. That might be useful info...

I'm afraid I don't know much about KDE. It's possible that the
taskbar is blocked and waiting for the plasma-desktop thing to do
something, and it can't because it can't stat the mountpoint.

In any case, this really sounds like a KDE problem. I wouldn't expect an
unresponsive mountpoint to cause it to hang this way. You'd have a
similar issue I think if you had a NFS mountpoint and the server went
away.

-- 
Jeff Layton <jlayton at samba.org>


More information about the linux-cifs-client mailing list