[linux-cifs-client] Problems with RedHat AS 4, Netware, and CIFS

Sandi Piazza helpme at CLEMSON.EDU
Wed Nov 30 21:34:45 GMT 2005


Hi all ... I'm at my wit's end over this, so I'm hoping that one or more of 
you may be able to put me out of my misery. Sorry this is so long, but I 
wanted to make sure there was enough detail that we don't have to go back and forth!


Background:

I have a Dell PowerEdge 2650 running RedHat AS 4 (kernel 2.6.9-22.EL).

The version of Samba installed is 3.0.10-1.4E.2.

Its job is to be a web server (Apache 2.0.52-19.ent).

On it I need to mount 19 Novell Netware volumes which are attached to 11 
servers. The actual web pages are stored on and served from these Netware 
volumes. The mounts are done from a script executed in rc.local. We can't 
put them in /etc/fstab because the Legato backup software we use walks 
through /etc/fstab when backing up the linux servers and having the Netware 
servers in there causes problems. The Netware servers are backed up separately.
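The rc.local script itself is nothing fancy; something along these lines (a sketch, not our exact script — only the first server/share pair is real, and the function prints the command so it's easy to eyeball, where the real script would execute it):

```shell
#!/bin/sh
# Sketch of the rc.local mount helper (illustrative, not our exact script).
# Builds the cifs mount command for one server/share pair; the
# /cifsmounts/<SERVER>/<SHARE> layout matches the mounts shown below.
build_mount_cmd() {
    server=$1 share=$2
    # mountpoint directory is the upper-cased short hostname
    host=$(printf '%s' "$server" | cut -d. -f1 | tr '[:lower:]' '[:upper:]')
    printf 'mount -t cifs -o credentials=/etc/smbcreds,uid=nobody,file_mode=0755,dir_mode=0755 //%s/%s /cifsmounts/%s/%s\n' \
        "$server" "$share" "$host" "$share"
}

# one line per volume; the other 18 pairs would follow the same pattern
build_mount_cmd student-a.clemson.edu USR01
```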

The Novell servers are all running Netware 6.5sp4.

A mount.cifs -V gives me "mount.cifs version: 1.5"

According to the folks who maintain the Novell servers the CIFS NLM version 
is either 3.22 (16 volumes on 8 servers which are not clustered) or 3.23.04 
(3 volumes on 3 servers which are clustered).

Up front I'll tell you that we do NOT want to manually install stuff unless 
there is absolutely no other way. We prefer to stick with the 
RedHat-provided rpms to facilitate updates and consistency across systems 
(I have about 40 servers at the moment, with the number growing, and this is 
the first one that mounts Netware volumes that we're trying to upgrade to 
RHEL 4). Note also that I have another server running RedHat ES 2.1 
(kernel 2.4.9-e.59, samba 2.2.12-1.21as.4, Apache 1.3.27) with these same 
volumes mounted via smbfs, and it runs with no problems unless there 
are major problems with the Netware servers.

Problem #1: When I use the following command to mount the volumes using cifs:

mount -t cifs -o credentials=/etc/smbcreds,uid=nobody,file_mode=0755,dir_mode=0755 //student-a.clemson.edu/USR01 /cifsmounts/STUDENT-A/USR01

the server will run for a short period of time (anywhere from 15 minutes up 
to about 2 hours) and /var/log/messages will start filling up with LOTS of:

Nov 28 18:28:21 rh7 kernel: CIFS VFS: Send error in read = -13

errors along with some:

Nov 28 23:12:37 rh7 kernel: CIFS VFS: Error 0xfffffff3 or on cifs_get_inode_info in lookup

messages. The load average on the server starts spinning up, the mounted 
volumes stop responding, and eventually the whole server stops responding. 
Then almost every time I get a kernel panic.  With or without the kernel 
panic, it's a "hard freeze" in the sense that I have to push the power 
button to unlock it and reboot. The last kernel panic had the following in 
/var/log/messages:

kernel panic:
kernel: proc_file_read: Apparent buffer overflow!
kernel:  CIFS VFS: Send error in read = -13
kernel: security_compute_av:  unrecognized class 10550
kernel: Unable to handle kernel paging request at virtual address 6f436874

Problem #2: When I use the following command to mount the volumes using 
smbfs (to the same Netware volumes):

mount -t smbfs -o credentials=/etc/smbcreds,uid=nobody //student-a.clemson.edu/USR01 /cifsmounts/STUDENT-A/USR01

It seems to run OK until one or more of the Netware volumes decides not to 
talk. This could be because someone took it off-line for an upgrade or 
because it is having some sort of problem, or more often for no 
particularly apparent reason (in these cases when I contact the Netware 
support folks and they look at their servers they see no errors and no 
abnormal activity). In the first two cases, neither volume on the affected 
server will respond (a cd to either one followed by an ls results in an input/output error). In 
the third case the other volume on that server is still actively responding 
to web requests. Having the Netware guys reload the NLM on the affected 
server does not fix the problem. The following are the sort of errors that 
appear in /var/log/messages:

Nov 29 21:55:46 rh7 mount.smbfs[2902]: tdb(/var/cache/samba/gencache.tdb): tdb_lock failed on list 63 ltype=1 (Bad file descriptor)
Nov 29 21:55:46 rh7 mount.smbfs[2902]: [2005/11/29 21:55:46, 0] tdb/tdbutil.c:tdb_log(725)

The load average spins up and when it reaches about 225 the server stops 
responding. If the Netware server is not really down and I can't get to a 
particular volume on the RHEL 4 box, I CAN get to it on the RHEL 2.1 box! On 
the RHEL 4 box I can stop the web server, umount the volume, remount it, 
restart the web server, and everything goes back to normal.

So ... can anyone help? We'd really prefer to use cifs but if there is a 
problem with cifs and Netware that is not fixable, we're willing to use 
smbfs ... but then I need to know how to fix the problem with it not 
cleaning up after itself when there is a problem with one of the Netware 
servers.
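In the meantime I've been considering a cron-driven watchdog that does mechanically what I now do by hand: detect the dead mount, lazily unmount it, and remount. This is only an untested sketch — the share, mountpoint, and ~10-second threshold are placeholders, not our real config:

```shell
#!/bin/sh
# Untested watchdog sketch: remount an smbfs share whose server went away.
# SHARE/MNT and the timeout are placeholders, not our real config.
SHARE=//student-a.clemson.edu/USR01
MNT=/cifsmounts/STUDENT-A/USR01

# Returns 0 if "ls" on the directory completes successfully within ~10s,
# nonzero if it fails or hangs (a dead smbfs mount typically never returns).
check_mount() {
    ls "$1" >/dev/null 2>&1 &
    pid=$!
    i=0
    while kill -0 "$pid" 2>/dev/null; do
        i=$((i + 1))
        if [ "$i" -ge 100 ]; then
            kill "$pid" 2>/dev/null   # give up after ~10 seconds
            return 1
        fi
        sleep 0.1
    done
    wait "$pid"
}

# The real recovery (would be run from cron, wrapped with the web-server
# stop/start described above):
recover() {
    umount -l "$MNT" 2>/dev/null    # lazy unmount frees the stuck mountpoint
    mount -t smbfs -o credentials=/etc/smbcreds,uid=nobody "$SHARE" "$MNT"
}

check_mount "$MNT" || echo "would remount $MNT"
```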

Any and ALL help/suggestions will be gratefully accepted.

Thanks.

               Sandi Piazza
               Clemson University
               Division of Computing and Information Technology


