High-availability cluster and samba

Fri Feb 4 18:07:48 GMT 2000

Hi,

As Jean-François Micouleau asked me, here is a description of the
high-availability cluster we showed at the French Linux Expo over the last few days.
It may interest you, as the results with samba were great.

The cluster was a two-node cluster with shared storage, based on the Linux-HA solution
(http://www.henge.com/~alanr/ha/). As an HP partner, we used the HP hardware cluster solution
sold for Microsoft clusters, because it has already been certified to work well
in cluster conditions. The configuration is made of two Netserver LPR servers, each with a NetRaid SCSI controller attached to a shared RS/12 storage bay. We didn't have all the hardware, but in theory we should have added a NIC to each server to create a private network for the heartbeat, as well as a serial link between the two servers to get a redundant heartbeat.
All the magic behind the shared storage is the brand-new ext3 journaling filesystem, which allows very short recovery times. For example, our samba service shared a 22-gigabyte filesystem, which means about 15 minutes of fsck with ext2... and a few seconds with ext3!

The principle is one partition per service on the shared disks, mounted as a
Linux-HA resource. When the first node, which has the filesystem mounted, fails,
the second one takes over the IP address, the service... and the filesystem. The
takeover takes about 15 seconds (10 of them due to heartbeat) with a 22-gigabyte
filesystem. We also tried ReiserFS, but it didn't allow us to use the same
filesystem on two nodes.

With samba as an HA resource, we are able to power off the master node while
writing data from several Windows clients to the shared disks. The clients
time out and complain that they can no longer write their data to the
server. But 15 seconds later, you can see that the server is back and that
all but the last file you tried to copy are on the server, ready to be read
again!

The most interesting thing is that (nearly, see below) no special configuration is required
for any clustered service. On the demonstration cluster, one node ran
apache and ipop3d and the other one ran samba. The only changes made to
smb.conf were: WORKGROUP=Medasys and the creation of a publicly available
share on /samba!
In fact, Linux-HA 'just' starts and stops services as needed, using nearly
standard scripts. For example, the script used to start our clustered samba
was adapted from the Red Hat /etc/rc.d/init.d/smb script shipped with the samba package:

#!/bin/sh
#
# Quick copy & hack of the standard /etc/rc.d/init.d/smb in order to be used
# with Linux-HA
# Made by Frederic Dubuy <frederic.dubuy at medasys-digital-systems.fr> for French
# Linux Expo, February 1-3, 2000

# Source function library (provides daemon, killproc and status).
. /etc/rc.d/init.d/functions

# Source networking configuration.
. /etc/sysconfig/network

# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0

# Check that smb.conf exists.
[ -f /etc/smb.conf ] || exit 0

RETVAL=0

# Mount point of the shared partition (declared "noauto" in /etc/fstab).
SAMBA_ROOT=/cluster3

# See how we were called.
case "$1" in
  start)
	echo -n "Starting CLUSTERED SMB services: "
	# Take over the cluster's virtual IP address.
	/etc/ha.d/resource.d/IPaddr 192.168.2.23 start
	# Mount the shared filesystem: read-only first, then unmount it
	# and mount it again read-write.
	mount -o ro $SAMBA_ROOT
	umount $SAMBA_ROOT
	mount $SAMBA_ROOT
	daemon smbd -D
	RETVAL=$?
	echo
	echo -n "Starting CLUSTERED NMB services: "
	daemon nmbd -D
	RETVAL2=$?
	echo
	[ $RETVAL -eq 0 -a $RETVAL2 -eq 0 ] && touch /var/lock/subsys/smb || \
	   RETVAL=1
	;;
  stop)
	echo -n "Shutting down CLUSTERED SMB services: "
	killproc smbd
	RETVAL=$?
	echo
	echo -n "Shutting down CLUSTERED NMB services: "
	killproc nmbd
	RETVAL2=$?
	[ $RETVAL -eq 0 -a $RETVAL2 -eq 0 ] && rm -f /var/lock/subsys/smb
	echo
	# Release the shared filesystem, then the virtual IP address,
	# so that the other node can take both over.
	umount $SAMBA_ROOT
	/etc/ha.d/resource.d/IPaddr 192.168.2.23 stop
	;;
  restart)
	$0 stop
	$0 start
	;;
  status)
	# Report both daemons; fail if either one is not running.
	status smbd
	RETVAL=$?
	status nmbd
	RETVAL2=$?
	[ $RETVAL -eq 0 -a $RETVAL2 -eq 0 ] || RETVAL=1
	;;
  *)
	echo "Usage: $0 {start|stop|restart|status}"
	exit 1
esac

exit $RETVAL

and /etc/fstab had this entry:

/dev/sdc1               /cluster3               ext3    noauto          0 0 
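Because the entry is marked noauto, neither node mounts /dev/sdc1 at boot; only the start script does. Note that the start branch of the script above launches smbd even if that mount fails, which would export the empty local /cluster3 directory. A minimal guard could be called right after the mount. The sketch below assumes the mount table is readable at /proc/mounts; the helper name is_mounted is hypothetical and is not part of Linux-HA or of the Red Hat init scripts.

```shell
#!/bin/sh
# is_mounted MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeeds only if MOUNTPOINT is the mount point
# (second field) of some entry in MOUNTS_FILE, which defaults to
# /proc/mounts. Mount points containing spaces (escaped as \040 in
# /proc/mounts) are not handled.
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Possible use in the start branch, right after "mount $SAMBA_ROOT":
#   is_mounted "$SAMBA_ROOT" || { echo "$SAMBA_ROOT not mounted"; exit 1; }
```

The comparison is on the whole second field, so /cluster does not accidentally match /cluster3.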

For information, '/etc/ha.d/resource.d/IPaddr' is part of Linux-HA and properly takes over
an IP address, announcing it with ARP broadcasts and so on...
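IPaddr, like the smb script above, is driven by a start or stop argument, which is what lets Linux-HA treat init-style scripts as resources. The following is a simplified model of that idea, not Linux-HA's actual code: the functions takeover and release are hypothetical, each resource is assumed to be an executable taking only start or stop (IPaddr itself would need a small wrapper supplying the address), and resources are stopped in the reverse of their start order, as the smb script does by hand (IP first on start, last on stop).

```shell
#!/bin/sh
# Simplified model (not Linux-HA's real implementation) of how an HA
# manager drives init-style resource scripts. Each argument is assumed
# to be an executable taking a single "start" or "stop" argument.

# takeover SCRIPT...: start resources in the order given; give up at
# the first one that fails and report it.
takeover() {
    for res in "$@"; do
        "$res" start || { echo "takeover: $res failed" >&2; return 1; }
    done
}

# release SCRIPT...: stop resources in reverse order. The body runs in
# a subshell so each recursive call keeps its own copy of "first".
release() (
    [ $# -gt 0 ] || exit 0
    first=$1
    shift
    release "$@"
    "$first" stop
)
```

For instance, `takeover ./ip ./fs ./smb` followed by `release ./ip ./fs ./smb` runs ip, fs and smb with start, then smb, fs and ip with stop.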

On the network we also had another samba server, with its WINS server enabled. We
needed this because we use a 'virtual' IP address and a 'virtual' NetBIOS &
DNS name. By the way... the cluster smb.conf had an entry:
interfaces = 192.168.2.23/32
in order to force it to use only the clustered IP address and not the server's
own.
With all this (the WINS server, though I'm not sure it was really needed, and a
restricted interfaces setting), clients always see a server called cluster3, and
never cluster2 one time and cluster1 another, depending on which server owns the
service at the moment.
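Pulling together the smb.conf settings mentioned in this message, a cluster configuration might look roughly like the one written by the sketch below. Only the workgroup, the public share on /samba and the interfaces line come from this message; the share name [public], the netbios name = cluster3 line (one way to get the fixed 'virtual' NetBIOS name), the WINS address passed as an argument, and the helper name write_cluster_smbconf are assumptions for illustration. It also sets bind interfaces only = yes, a standard smb.conf parameter that makes smbd and nmbd listen only on the addresses listed in interfaces; the message does not say the demo used it.

```shell
#!/bin/sh
# write_cluster_smbconf FILE WINS_IP
# Hypothetical helper that writes a minimal smb.conf for the clustered
# samba service. 192.168.2.23 is the cluster's virtual address.
write_cluster_smbconf() {
    cat > "$1" <<EOF
[global]
   workgroup = Medasys
   # Assumed: a fixed NetBIOS name so clients always see "cluster3".
   netbios name = cluster3
   # Only answer on the virtual (clustered) address.
   interfaces = 192.168.2.23/32
   bind interfaces only = yes
   # Assumed: address of the separate WINS server on the network.
   wins server = $2

[public]
   path = /samba
   public = yes
   writable = yes
EOF
}
```

Running `testparm` on the generated file is a quick way to catch misspelled parameters. With bind interfaces only set, smb.conf(5) warns that smbpasswd may not work unless 127.0.0.1 is also listed in interfaces.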

So we had a highly available samba server that always kept the same name. From
the client side, a failure of the samba server is seen ONLY as a 15-second
interruption, nothing more!
Visitors were very interested and impressed, and Jean-François Micouleau was very
angry with the French truck drivers on strike, who prevented him from coming to
the Linux Expo ;)

Frederic

-- 
Frédéric Dubuy <frederic.dubuy at medasys-digital-systems.fr>
Centre de compétences Logiciels Libres - Medasys
Tel : (33) 1.69.33.73.69 Fax : (33) 1.69.33.73.01
http://www.medasys-digital-systems.fr/linux