[linux-cifs-client] Problems with RedHat AS 4, Netware, and CIFS
Sandi Piazza
helpme at CLEMSON.EDU
Wed Nov 30 21:34:45 GMT 2005
Hi all ... I'm at my wit's end over this, so I'm hoping that one or more of
you may be able to put me out of my misery. Sorry this is so long, but I
wanted to make sure there was enough detail to avoid a lot of back and forth!
Background:
I have a Dell PowerEdge 2650 running RedHat AS 4 (kernel 2.6.9-22.EL).
The version of Samba installed is 3.0.10-1.4E.2.
Its job is to be a web server (Apache 2.0.52-19.ent).
On it I need to mount 19 Novell Netware volumes which are attached to 11
servers. The actual web pages are stored on and served from these Netware
volumes. The mounts are done from a script executed in rc.local. We can't
put them in /etc/fstab because the Legato backup software we use walks
through /etc/fstab when backing up the linux servers and having the Netware
servers in there causes problems. The Netware servers are backed up separately.
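For reference, the rc.local script boils down to something like the sketch below (only the one share mentioned later is shown; the real script lists all 19 volumes, and the helper function name is my own):

```shell
#!/bin/sh
# Sketch of the mount script run from rc.local. The server:volume list
# here is abbreviated; build_mount_cmd and the list format are just
# illustrative, not the exact script.
CREDS=/etc/smbcreds

build_mount_cmd() {
    # $1 = server FQDN, $2 = volume name
    # Mount point follows the /cifsmounts/SERVER/VOLUME convention.
    host=$(echo "$1" | cut -d. -f1 | tr 'a-z' 'A-Z')
    echo "mount -t cifs -o credentials=$CREDS,uid=nobody,file_mode=0755,dir_mode=0755 //$1/$2 /cifsmounts/$host/$2"
}

for pair in student-a.clemson.edu:USR01; do
    server=${pair%%:*}
    volume=${pair##*:}
    build_mount_cmd "$server" "$volume"   # echoed here; the real script executes it
done
```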
The Novell servers are all running Netware 6.5sp4.
A mount.cifs -V gives me "mount.cifs version: 1.5"
According to the folks who maintain the Novell servers the CIFS NLM version
is either 3.22 (16 volumes on 8 servers which are not clustered) or 3.23.04
(3 volumes on 3 servers which are clustered).
Up front I'll tell you that we do NOT want to manually install stuff unless
there is absolutely no other way. We prefer to stick with the
RedHat-provided rpms to facilitate updates and consistency across systems
(I have about 40 servers at the moment with the numbers growing and this is
the first one which mounts Netware volumes that we're trying to upgrade to
RHEL 4). It is also the case that I have another server running RedHat ES
2.1 (kernel 2.4.9-e.59, Samba 2.2.12-1.21as.4, Apache 1.3.27) with these
same servers mounted with smbfs which runs with no problems unless there
are major problems with the Netware servers.
Problem #1: When I use the following command to mount the volumes using cifs:
mount -t cifs -o credentials=/etc/smbcreds,uid=nobody,file_mode=0755,dir_mode=0755 \
    //student-a.clemson.edu/USR01 /cifsmounts/STUDENT-A/USR01
the server will run for a short period of time (anywhere from 15 minutes up
to about 2 hours) and /var/log/messages will start filling up with LOTS of:
Nov 28 18:28:21 rh7 kernel: CIFS VFS: Send error in read = -13
errors along with some:
Nov 28 23:12:37 rh7 kernel: CIFS VFS: Error 0xfffffff3 or on cifs_get_inode_info in lookup
messages. The load average on the server starts spinning up, the mounted
volumes stop responding, and eventually the whole server stops responding.
Then almost every time I get a kernel panic. With or without the kernel
panic, it's a "hard freeze" in the sense that I have to push the power
button to unlock it and reboot. The last kernel panic had the following in
/var/log/messages:
kernel panic:
kernel: proc_file_read: Apparent buffer overflow!
kernel: CIFS VFS: Send error in read = -13
kernel: security_compute_av: unrecognized class 10550
kernel: Unable to handle kernel paging request at virtual address 6f436874
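(For what it's worth, the -13 in "Send error in read = -13" is -EACCES, i.e. permission denied.) Since the box eventually hard-freezes, I've been thinking about catching it early with something like the sketch below; the function name and the threshold of 100 errors are just guesses on my part:

```shell
#!/bin/sh
# Hypothetical watchdog sketch: count the CIFS read errors in the log
# and print a warning before the load average spins out of control.
# The threshold (100) and default log path are assumptions.
count_cifs_errors() {
    # $1 = path to a syslog-style log file
    grep -c 'CIFS VFS: Send error in read' "$1"
}

LOG=${1:-/var/log/messages}
if [ -r "$LOG" ]; then
    errors=$(count_cifs_errors "$LOG")
    if [ "${errors:-0}" -gt 100 ]; then
        echo "WARNING: $errors CIFS read errors in $LOG"
    fi
fi
```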
Problem #2: When I use the following command to mount the volumes using
smbfs (to the same Netware volumes):
mount -t smbfs -o credentials=/etc/smbcreds,uid=nobody \
    //student-a.clemson.edu/USR01 /cifsmounts/STUDENT-A/USR01
It seems to run OK until one or more of the Netware volumes decides not to
talk. This could be because someone took it off-line for an upgrade or
because it is having some sort of problem, or more often for no
particularly apparent reason (in these cases when I contact the Netware
support folks and they look at their servers they see no errors and no
abnormal activity.) In the first two cases, neither of the volumes will
respond (a cd into one followed by an ls results in an input/output error). In
the third case, the other volume on that server is still actively responding
to web requests. Having the Netware guys reload the NLM on the affected
server does not fix the problem. The following are the sort of errors that
appear in /var/log/messages:
Nov 29 21:55:46 rh7
mount.smbfs[2902]: tdb(/var/cache/samba/gencache.tdb): tdb_lock failed on
list 63 ltype=1 (Bad file descriptor)
Nov 29 21:55:46 rh7 mount.smbfs[2902]: [2005/11/29 21:55:46, 0]
tdb/tdbutil.c:tdb_log(725)
The load average spins up, and when it reaches about 225 the server stops
responding. When the Netware server is not actually down but I can't get to a
particular volume on the RHEL 4 box, I CAN get to it on the RHEL 2.1 box! On
the RHEL 4 box I can stop the web server, umount the volume, remount it,
restart the web server, and everything goes back to normal.
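That manual recovery sequence is simple enough to script; here's roughly what it amounts to, printed as a dry run (pipe the output through sh as root to actually execute it; the function name is my own):

```shell
#!/bin/sh
# Sketch of the manual recovery sequence described above (stop Apache,
# force-unmount the stuck volume, remount it, restart Apache), emitted
# as a dry run so it can be reviewed before being run as root.
recovery_cmds() {
    # $1 = //server/volume share, $2 = mount point
    echo "service httpd stop"
    echo "umount -f $2"
    echo "mount -t smbfs -o credentials=/etc/smbcreds,uid=nobody $1 $2"
    echo "service httpd start"
}

recovery_cmds //student-a.clemson.edu/USR01 /cifsmounts/STUDENT-A/USR01
```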
So ... can anyone help? We'd really prefer to use cifs but if there is a
problem with cifs and Netware that is not fixable, we're willing to use
smbfs ... but then I need to know how to fix the problem with it not
cleaning up after itself when there is a problem with one of the Netware
servers.
Any and ALL help/suggestions will be gratefully accepted.
Thanks.
Sandi Piazza
Clemson University
Division of Computing and Information Technology