[Samba] Hung XFS filesystems on Samba server

Weber, Charles (NIH/NIA/IRP) [C] WeberC at grc.nia.nih.gov
Tue Sep 19 16:17:35 GMT 2006

This is probably a hardware problem, but I am posting here in case
anyone else has seen it or it turns out to be software.
If you have seen anything like it, please let me know.

For the last 1.5 years I have had occasional problems on a large (6.8
TB) Samba server. Two of the mounted filesystems will partially dismount
at intervals between 3 days and 3 months. Files will still be open but
any local access to the filesystem, such as "ls", will hang. The
particular share is no longer accessible through Samba. I end up having
to do a hard shutdown, as rebooting will also hang trying to close the
filesystem.

I have found no logged errors. I have 3 HP DL585 servers with multiple 6404
RAID controllers. Two run Samba and the other is NFS only. This only occurs
on one server but it is unfortunately the busiest one. I have replaced
cables and 6404 cards. The filesystems have been checked using
xfs_repair. HP diagnostics has been run for hours. One of our other
DL585 servers is physically very close to the problem server but runs
NFS instead of Samba on XFS filesystems. It has not had this problem.
The only significant hardware difference between the NFS server and
Samba server is that the NFS server has all U320 hard drives.

Physical config:
HP DL585 with dual processors and 3 6404 4-channel SCSI RAID
controllers. 6 U320-converted 4200 drive chassis with 72 GB U3/U320 and
146 GB U320 drives. 8 GB RAM. Firmware for all parts including disks has been
flashed repeatedly over the last two years to current levels. Firmware
changes have not made any noticeable difference in this problem. I do
wonder about the mix of U3 and U320 drives but each disk carrier is
either U3 or U320. Each disk carrier is set up as one ADG array and
logical drive. It is then partitioned (for example /dev/cciss/c2d0p1),
formatted with XFS and mounted.

I started with Fedora Core 2 x86_64 and have worked my way to Fedora Core
5 and samba 3.0.22-1.fc5, acl 2.2.34 and xfsprogs 2.7.3-1.2.1. No
software changes have made any difference that I can see in this
problem. Samba shares support ACLs.

Hardware possibilities:
This has occurred in the same 2 disk carriers. I could change the disk
carriers or U320 modules. I worry also about the mix of U320 and U3
disks. I set up a test server (a DL385) with a 6404 from the problem
server and a disk carrier with a mix of drives. I could not recreate the
problem.

Software possibilities:
Kernel, Samba, ACLs and XFS. But I have tried many versions and not seen
any logged errors or change in behavior.
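
Since nothing is logged when a filesystem hangs, one possible way to
narrow down when it starts is a small probe run from cron against each
mount point. The following is only a sketch under stated assumptions, not
something that has been run on the problem server: the function names are
made up for illustration, the mount points are whatever is passed on the
command line, and it assumes Python 3 and a Linux /proc. It runs "ls" in a
child process with a deadline and, if the deadline passes, lists processes
in uninterruptible sleep (state D), which is where tasks stuck on I/O
normally sit.

```python
import os
import subprocess
import sys
import time


def run_with_deadline(argv, timeout):
    """Return True if argv exits with status 0 within `timeout` seconds."""
    proc = subprocess.Popen(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait again: a child stuck in
        # uninterruptible sleep will not exit until its I/O completes.
        proc.kill()
        return False
    return proc.returncode == 0


def probe_mount(path, timeout=10.0):
    """Hypothetical helper: check that listing `path` completes in time."""
    return run_with_deadline(["ls", path], timeout)


def blocked_processes():
    """Return (pid, name) pairs for processes currently in state D."""
    blocked = []
    if not os.path.isdir("/proc"):
        return blocked
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/stat" % entry) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # Layout is "pid (comm) state ..."; comm may itself contain ")".
        head, _, tail = stat.rpartition(")")
        fields = tail.split()
        if fields and fields[0] == "D":
            blocked.append((int(entry), head.partition("(")[2]))
    return blocked


if __name__ == "__main__":
    # Mount points are given as arguments; none are assumed here.
    for mount in sys.argv[1:]:
        if not probe_mount(mount):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print("%s: %s did not respond" % (stamp, mount))
            for pid, name in blocked_processes():
                print("  blocked: pid=%d name=%s" % (pid, name))
```

The timestamp of the first failed probe, together with the names of the
blocked processes (smbd, a kernel thread, or something else), might help
separate a controller or disk stall from a Samba or XFS issue.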

