[Samba] samba4 + drbd + ctdb + failover

steve steve at steve-ss.com
Sun Jul 6 01:49:33 MDT 2014


On Sun, 2014-07-06 at 10:28 +1100, me at electronico.nc wrote:
> On 06/07/2014 04:23, steve wrote:
> > Hi
> > We've got drbd going between 2 nodes:)
> >
> > ATM there is un-partitioned space on each node but (we think) they are
> > syncing OK. It looks as though it has synced the whole partition (2GB)
> > from the primary node 1 to the other node:
> >
> > node 1
> >   smb1:/home/steve # cat /proc/drbd
> > version: 8.4.4 (api:1/proto:86-101)
> > GIT-hash: 3c1f46cb19993f98b22fdf7e18958c21ad75176d build by SuSE Build
> > Service
> >
> >   1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
> >      ns:2096028 nr:0 dw:0 dr:2096028 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1
> > wo:f oos:0
> >
> > node 2
> > smb2:/home/steve # cat /proc/drbd
> > version: 8.4.4 (api:1/proto:86-101)
> > GIT-hash: 3c1f46cb19993f98b22fdf7e18958c21ad75176d build by SuSE Build
> > Service
> >
> >   1: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
> >      ns:0 nr:2096028 dw:2096028 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1
> > wo:f oos:0
> >
> > Question:
> > What next?
> > 0. Can any drbd guru give us a thumbs up so far?
> I'm not a guru.
> Here is what I've done with DRBD (Ubuntu):
> nano /etc/drbd.d/raid1.res
> The addresses are defined in /etc/network/interfaces and are assigned
> to the direct Gb NIC between smb1 and smb2 (for fast sync between the
> disks).
> disk: the free partition on each server
> ->
> resource raid1 {
>   protocol C;
>   startup { wfc-timeout 0; degr-wfc-timeout 120; }
>   disk { on-io-error detach; }
>   on smb1.domain.lan {
>    device /dev/drbd0;
>    disk /dev/sda5;
>    meta-disk internal;
>    address 10.10.200.1:7788;
>   }
>   on smb2.domain.lan {
>    device /dev/drbd0;
>    disk /dev/sda5;
>    meta-disk internal;
>    address 10.10.200.2:7788;
>   }
> }
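> The .res file must be identical on both nodes; a quick way to check
> that the config parses is:
> drbdadm dump raid1           # re-prints the parsed resource definition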
> 
> drbdadm create-md raid1
> drbdadm up raid1
> drbdadm attach raid1
> drbdadm connect raid1
> gives ->
> "'drbdsetup connect raid1 ipv4:10.10.200.1:7788 ipv4:10.10.200.2:7788
> --protocol=C' terminated with exit code 10"
> (drbdadm up already attaches and connects the resource, so the explicit
> attach/connect steps are redundant; the error most likely just means it
> was already connected, and can be ignored.)
> 
> only on smb1:
> drbdadm primary --force raid1
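> To watch the initial sync you can use something like (drbd-overview
> ships with the drbd utilities; wait until both nodes show
> ds:UpToDate/UpToDate):
> watch -n1 cat /proc/drbd     # live sync progress
> drbd-overview                # one-line status per resource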
> 
> > 1. Is it too late to format the partition(s)?
> Something like this (on one server only, where the resource is primary):
> mkfs.ext4 /dev/drbd0
> On both smb1 and smb2, create your mount point:
> mkdir /media/my_mount_point
> then chown and chmod it for your needs.
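> If you want to check the filesystem before Heartbeat takes it over,
> you can mount it by hand on the primary (and unmount it again, since
> Heartbeat will manage this mount later):
> mount /dev/drbd0 /media/my_mount_point    # only works on the primary node
> df -h /media/my_mount_point               # sanity check
> umount /media/my_mount_point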
> > 2. Recommendation for a file system to format the partition.
> > 3. Do we need Pacemaker (or something) for the failover or does...
> I tested HA by creating an NFS share:
> on smb1 and smb2:
> apt-get install nfs-kernel-server
> nano /etc/exports
> ->
> /media/my_mount_point 10.10.20.0/24(rw,async,no_root_squash,no_subtree_check,fsid=1)
> 
> NFS should not be started as a service at boot, but by Heartbeat, so:
> update-rc.d -f nfs-common remove
> update-rc.d -f nfs-kernel-server remove
> 
> apt-get install heartbeat
> nano /etc/ha.d/ha.cf
> ->
> autojoin none
> # auto_failback on : smb1 is the preferred server, so Heartbeat will
> # fail back to it whenever it is available
> auto_failback on
> keepalive 2
> warntime 5
> deadtime 10
> initdead 20
> bcast xenbr1
> node smb1.domain.lan
> node smb2.domain.lan
> logfile /var/log/heartbeat-log
> debugfile /var/log/heartbeat-debug
> 
> Heartbeat nodes have to authenticate to each other with a password.
> nano /etc/ha.d/authkeys
> ->
> auth 3
> 3 md5 my_heartbeat_password
> 
> chmod 600 /etc/ha.d/authkeys
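> Rather than a guessable password, you can generate a random key
> (assuming openssl is installed), e.g.:
> dd if=/dev/urandom bs=512 count=1 2>/dev/null | openssl md5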
> 
> Heartbeat actions configuration:
> nano /etc/ha.d/haresources
> eth1 : your LAN interface
> 10.10.20.1 : the virtual IP to be created; it has to be a free IP on
> the same LAN subnet as smb1 and smb2
> drbddisk::raid1 : the name of the resource for the RAID array defined
> in /etc/drbd.d/raid1.res (resource raid1 { ...)
> ->
> smb1.domain.lan \
> IPaddr::10.10.20.1/24/eth1 \
> drbddisk::raid1 \
> Filesystem::/dev/drbd0::/media/my_mount_point::ext4::nosuid,usrquota,noatime \
> nfs-kernel-server
> 
> service heartbeat restart
> 
> on smb1:
> ifconfig should show an eth1:0 interface with IP 10.10.20.1
> 
> You can test access to the NFS share from a client on the LAN.
> Shut down smb1: the NFS share is still accessible via the virtual IP
> 10.10.20.1; write something to the share.
> Power smb1 back up: the share is still accessible via 10.10.20.1; read
> the data back from the share: it was written correctly while smb2 was
> the NFS server.
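> From the client side, the test boils down to something like (the mount
> point on the client is arbitrary):
> showmount -e 10.10.20.1                     # list exports on the virtual IP
> mount -t nfs 10.10.20.1:/media/my_mount_point /mnt
> touch /mnt/failover-test     # write before the failover, check it after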
> 
> An important file to watch while testing is the Heartbeat debug log:
> /var/log/heartbeat-debug
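> Instead of a full shutdown you can also fail over by hand; heartbeat
> ships helper scripts for this (the path may vary by distro):
> /usr/share/heartbeat/hb_standby      # on the active node: release resources
> /usr/share/heartbeat/hb_takeover     # on the passive node: take them over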
> > 4. ...CTDB do that for us?
> >
> > Thanks,
> > Steve
> >
> >
> I haven't had time yet to apply this to the Samba file server, but
> here is my working config.
> Hope this helps.
> Nicolas

Yes, that's great. Our drbd looks very similar. Our main question: we
_think_ that heartbeat is not necessary if we are to use ctdb.

It seems that our next step toward ctdb must be gfs or some other
clustered file system.
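From what we've read so far, ctdb manages the floating IPs itself (which
is why we think heartbeat would be redundant) and only needs its recovery
lock file on the clustered filesystem. Something like this, though the
paths and IPs here are just our guesses:

/etc/ctdb/nodes (identical on both nodes): the private IPs
10.10.200.1
10.10.200.2

/etc/ctdb/public_addresses: the floating IPs ctdb moves between nodes
10.10.20.1/24 eth1

/etc/sysconfig/ctdb: point the recovery lock at the clustered filesystem
CTDB_RECOVERY_LOCK=/cluster/.ctdb.lock

and in smb.conf:
clustering = yes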

If anyone could confirm, that would be great.

Thanks for your time,
Steve



