Setting up CTDB on OCFS2 and VMs ...

Thu Jan 1 11:51:21 MST 2015

On 01/01/15 18:30, Michael Adam wrote:
> Oops, I replied before realizing that Martin Schwenke
> and Stefan Kania had already replied to your mail,
> and that Martin has nailed your issues to the
> corresponding debian backports packaging issues
> already. So most of my mail can be ignored. :-)
>
> Still: You need to get your clusterfs setup right
> so that you can use a reclock....
>
> Michael
>
> On 2015-01-01 at 19:25 +0100, Michael Adam wrote:
>> Happy new year!
>>
>> On 2014-12-31 at 18:40 +0000, Rowland Penny wrote:
>>> On 31/12/14 17:59, Michael Adam wrote:
>>>> On 2014-12-31 at 15:46 +0000, Rowland Penny wrote:
>>>>> OK, I have been having another attempt at the ctdb cluster, I cannot get
>>>>> both nodes healthy if I use a lockfile in /etc/default/ctdb,
>>>> This can't be expected to work, since a recovery lock file needs
>>>> to be on shared storage (clustered file system, providing posix
>>>> fcntl byte range lock semantics), and /etc/default/ is not
>>>> generally such a place, unless you are building a fake cluster
>>>> with multiple ctdb instances running on one host.
>>> The lockfile *is* on the shared area i.e. I am sharing /cluster and that is
>>> what I have in /etc/default/ctdb:
>>>
>>> CTDB_RECOVERY_LOCK=/cluster/lockfile
>> Oh, ok... I misread your words above.
>>
>>>>> so I have commented it out, both nodes are now showing OK.
>>>> It is possible to run w/o recovery lock, but it is not
>>>> recommended in a production setup at least.
>>> I am aware of this, but it seems to be the only way of getting ctdb to
>>> start.
>> Which probably means that your clustered fs setup is still
>> not correct. Or there is still a flaw in your ctdb setup.
>> Could you (re-)post your /etc/default/ctdb and /etc/ctdb/nodes
>> and also your network config (ip a l) on the nodes?
>>
>> It is really important to get this right before starting to
>> seriously play with clustered samba on top.
>>
>>>>> Why have very similar data in 3 places ? why have the conf (which
>>>>> incidentaly isn't called a conf file) in a different place from the other
>>>>> ctdb files in /etc ?
>>>> That's essentially two places, one hierarchy under /var/ctdb
>>>> (old ctdb versions) and one hierarchy under /var/lib/ctdb (new
>>>> ctb versions) so my guess is that this stems from earlier
>>>> installs of older versions.
>>>>
>>>> If you stop ctdb, remove both these directory trees, and then
>>>> restart ctdb, do both trees reappear?
>>> No idea, I have only installed ctdb *once*, there is no earlier version.
>> Ok. Does that mean that you performed the above steps and
>> both directory trees reappeared?
>>
>>>>> More to the point, Why, oh why doesn't it work.
>>>> Has the samba version been compiled against the used ctdb
>>>> version. One possible source of such problems is that
>>>> samba might have beenm compiled against an older version
>>>> of ctdb and then you install the latest version of ctdb.
>>> Again, no idea, I am using the samba4 & ctdb packages from backports,
>>> versions 4.1.9 & 2.5.3
>>>> The problem that could explain the "Could not initialize ..."
>>>> message would be that samba tries to access CTDB under the
>>>> socket file /tmp/ctdbd.socket (default in old ctdb versions)
>>>> and the new ctdbd uses /var/run/ctdb/ctdbd.socket by default.
>>> Now that is interesting, because if I do not put a line in smb.conf saying
>>> where ctdbd.socket is, it tries to use /tmp.
>> That confirms my theory. And it means that
>> the samba and ctdv packages from backports simply don't match.
>> Samba has apparently been compiled against an oder version
>> of ctdb that still used /tmp.
>>
>>> With the line in smb.conf, it
>>> just errors with: connect(/var/lib/ctdb/ctdb.socket) failed: No such file or
>>> directory
>> Er strange, and you have entered "/var/run/ctdb/..." into
>> smb.conf and not by accient "/var/lib/ctdb/... ?
>>
>>>> So you could (without needing to recompile) test if things
>>>> work out more nicely if you set:
>>>>
>>>> "ctdbd socket = /var/run/ctdb/ctdbd.socket"
>>>> in smb.conf
>>> No, but finding out where the socket is and altering the line to: ctdbd
>>> socket = /var/lib/run/ctdb/ctdbd.socket
>>>
>>> and running: net ads join -U Administrator at EXAMPLE.COM -d5
>>>
>>> Gets me (after a lot of output)
>>>
>>> Using short domain name -- EXAMPLE
>>> Joined 'SMBCLUSTER' to dns domain 'example.com'
>>> Not doing automatic DNS update in a clustered setup.
>>> return code = 0
>>>
>>> Good Grief!!!! It actually seems to have worked =-O
>> Yay!
>>
>> That path is strange.
>> Are there some symlinks involved or maybe missing?
>> Or does the debian ctdb package have an altered
>> path for the socket? That may well be.
>>
>> Which debian version are you using? (I could inspect it locally.)
>>
>>> Now to try altering the conf file to get it start smbd, nmbd and winbind.
>>>
>>>> and (for the sake of explicitness):
>>>> "CTDB_SOCKET=/var/run/ctdb/ctdbd.socket"
>>>> in /etc/default/ctdb.
>>> I have tried similar lines in /etc/default/ctdb, but whatever I tried, it
>>> just wouldn't let ctdb start.
>> Er, that should not be. Could you post the exact
>> /etc/default/ctdb file used and the error messages
>> that ctdbd prints?
>>
>> Cheers - Michael
>

OK, I understand that I don't really know how to setup a cluster and in 
a lot of ways I don't know how they are supposed to work, but this is 
how I have setup my test cluster, it is based on the instructions that 
Richard Sharp posted, but installed on Debian instead of Centos.

Could someone look at it and tell me where I am going wrong :-)

Can anybody confirm that the ctdb package in Debian backports isn't 
built against the samba package available from backports.

1. Create two VirtualBox VMs with enough memory and disk for your Linux 
Distro. I used Debian 7.7 with 512MB and 8GB. You will also need an 
extra interface on each VM for the clustering private network. I set 
them to an internal type.

2. Because you will need a shared disk, create one:

vboxmanage createhd --filename ~/VirtualBox\ VMs/SharedHD1 --size 10240 
--variant Fixed --format VDI # Creates a 10GB fixed sized disk

vboxmanage modifyhd ~/VirtualBox\ VMs/SharedHD1.vdi --type shareable

Also, use the GUI to add the shared disk to both VMs.

3. Install the OS on each of the VM's.

4. Install clustering packages:

apt-get install openais corosync pacemaker ocfs2-tools-pacemaker dlm-pcmk

5. Configure corosync

nano /etc/corosync/corosync.conf

Make sure that bindnetaddr is defined and points to your private 
interface. I set it to 192.168.1.0

# Copy the file to the other node.
scp /etc/corosync/corosync.conf root at 192.168.0.7:/etc/corosync/

# ON BOTH NODES
service pacemaker stop  # Stop these in case they were running
service corosync stop # Same here

nano /etc/default/corosync
Change:
START=no

To:

START=yes

# now start the cluster
service corosync start

# Also start it on the other node(s).

# Now check the status:
root at cluster1:~# crm_mon -1
============
Last updated: Mon Dec 15 10:46:20 2014
Last change: Mon Dec 15 10:44:18 2014 via crmd on cluster1
Stack: openais
Current DC: cluster1 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ cluster1 cluster2 ]

If you do not see all the other nodes online, then you have to debug the 
problem.

6. Configure the Oracle cluster

dpkg-reconfigure ocfs2-tools
Configuring ocfs2-tools
# Would you like to start an OCFS2 cluster (O2CB) at boot time?
#
# <Yes>
#
# Name of the cluster to start at boot time:
#
# ctdbdemo

# Create the ocfs2 cluster conf file
o2cb_ctl -C -n ctdbdemo -t cluster
o2cb_ctl -C -n cluster1 -t node -a number=1 -a ip_address=192.168.1.10 
-a ip_port=7777 -a cluster=ctdbdemo
o2cb_ctl -C -n cluster2 -t node -a number=2 -a ip_address=192.168.1.11 
-a ip_port=7777 -a cluster=ctdbdemo

# ON BOTH NODES
service corosync stop

# Copy files to the other node.
scp /etc/default/o2cb  root at 192.168.0.7:/etc/default/
scp /etc/ocfs2/cluster.conf  root at 192.168.0.7:/etc/ocfs2/

service o2cb start
service corosync start

7. Create the shared shared file system on one node:

mkfs.ocfs2 -L CTDBdemocommon -T datafiles -N 4 /dev/sdb

8. Mount it on both and ensure that you can create files/dirs on one 
node and see them on the other node.

mkdir /cluster
mount -t ocfs2 /dev/sdb /cluster
umount /cluster

Create a bash script

nano config.sh

#!/bin/bash

crm configure<<EOF
primitive p_dlm_controld ocf:pacemaker:controld \
   op start interval="0" timeout="90" \
   op stop interval="0" timeout="100" \
   op monitor interval="10"
primitive p_fs_ocfs2 ocf:heartbeat:Filesystem \
   params device="/dev/sdb" \
     directory="/cluster" \
     fstype="ocfs2" \
   meta target-role=Started \
   op monitor interval="10"
group g_ocfs2 p_dlm_controld p_fs_ocfs2
clone cl_ocfs2 g_ocfs2 \
   meta interleave="true"
EOF

exit 0

bash ./config.sh

crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore

9. Install ctdb

apt-get -t wheezy-backports install ctdb

10. Configure ctdb

nano /etc/default/ctdb

# Options to ctdbd, read by ctdbd_wrapper(1)
#
# See ctdbd.conf(5) for more information about CTDB configuration variables.

# Shared recovery lock file to avoid split brain.  No default.
#
# Do NOT run CTDB without a recovery lock file unless you know exactly
# what you are doing.
#CTDB_RECOVERY_LOCK=/some/place/on/shared/storage
#CTDB_RECOVERY_LOCK=/cluster/lockfile

# List of nodes in the cluster.  Default is below.
CTDB_NODES=/etc/ctdb/nodes

# List of public addresses for providing NAS services.  No default.
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses

# What services should CTDB manage?  Default is none.
#CTDB_MANAGES_SAMBA=yes
#CTDB_MANAGES_WINBIND=yes
# CTDB_MANAGES_NFS=yes

# Raise the file descriptor limit for CTDB?
# ulimit -n 10000

# Default is to use the log file below instead of syslog.
CTDB_LOGFILE=/var/log/ctdb/log.ctdb
CTDB_SYSLOG=no

# Default log level is ERR.  NOTICE is a little more verbose.
CTDB_DEBUGLEVEL=NOTICE

# Set some CTDB tunable variables during CTDB startup?
CTDB_VARDIR=/var/lib/ctdb
# CTDB_SET_TraverseTimeout=60
export CTDB_SOCKET

nano /etc/ctdb/nodes
192.168.1.10
192.168.1.11

nano /etc/ctdb/public_addresses # NOTE: These addresses *SHOULD NOT* exist
192.168.0.8/8 eth0
192.168.0.9/8 eth0

# Copy files to the other node.
scp /etc/default/ctdb  root at 192.168.0.7:/etc/default/
scp /etc/ctdb/nodes  root at 192.168.0.7:/etc/ctdb/
scp /etc/ctdb/public_addresses  root at 192.168.0.7:/etc/ctdb/

# Create a missing directory (on both nodes)
mkdif -p /var/lib/run/ctdb

11. Start ctdb on all nodes

# You must have ctdb started so that the secrets file will get distributed
service ctdb start

Check status:

root at cluster1:~# ctdb status
Number of nodes:3 (including 1 deleted nodes)
pnn:1 192.168.1.10     OK (THIS NODE)
pnn:2 192.168.1.11     OK
Generation:1073761636
Size:2
hash:0 lmaster:1
hash:1 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:1

#12. Turn on the lockfile

#nano /etc/default/ctdb
#set the lockfile:

#CTDB_RECOVERY_LOCK=/cluster/lockfile

#restart ctdb on all nodes.

#service ctdb restart

#Wait a short while and then check the status again.

13. Install samba

apt-get -t wheezy-backports install samba attr krb5-config krb5-user ntp 
dnsutils winbind libpam-winbind libpam-krb5 libnss-winbind libsmbclient 
smbclient

service smbd stop
service nmbd stop
service winbind stop

update-rc.d -f smbd remove
update-rc.d -f nmbd remove
update-rc.d -f winbind remove

14. Configure samba for the domain you want to join

[global]
     workgroup = EXAMPLE
     netbios name = SMBCLUSTER
     security = ADS
     realm = EXAMPLE.COM
     dedicated keytab file = /etc/krb5.keytab
     kerberos method = secrets and keytab
     server string = Samba 4 Client %h
     winbind enum users = yes
     winbind enum groups = yes
     winbind use default domain = yes
     winbind expand groups = 4
     winbind nss info = rfc2307
     winbind refresh tickets = Yes
     winbind normalize names = Yes
     idmap config * : backend = tdb
     idmap config * : range = 2000-9999
     idmap config EXAMPLE : backend  = ad
     idmap config EXAMPLE : range = 10000-999999
     idmap config EXAMPLE : schema_mode = rfc2307
     clustering = Yes
     ctdbd socket = /var/lib/run/ctdb/ctdbd.socket
     printcap name = cups
     cups options = raw
     usershare allow guests = yes
     domain master = no
     local master = no
     preferred master = no
     os level = 20
     map to guest = bad user
     username map = /etc/samba/smbmap
     vfs objects = acl_xattr
     map acl inherit = Yes
     store dos attributes = Yes
     log level = 6
     wins server = 192.168.0.2

[users]
     comment = Home Directories
     path = /cluster/users
     browseable = no
     read only = No

[profiles]
     path = /cluster/profiles
     read only = No

[testdir]
         path = /cluster/testdir
         read only = no

15. join the domain

join the domain from node 1 only:
net ads join -UAdministrator

16. Enable samba & winbind in the ctdb config

CTDB_MANAGES_SAMBA=yes
CTDB_MANAGES_WINBIND=yes

17. Restart ctdb on all nodes

These are the hosts & interfaces files from the two cluster machines.

cluster1)

/etc/hosts

127.0.0.1       localhost
192.168.0.6     cluster1.example.com       cluster1

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

/etc/network/interfaces

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
         address 192.168.0.6
         netmask 255.255.255.0
         network 192.168.0.0
         broadcast 192.168.0.255
         gateway 192.168.0.1

auto eth1
iface eth1 inet static
         address 192.168.1.10
         netmask 255.255.255.0
         network 192.168.1.0
         broadcast 192.168.1.255

cluster2)

/etc/hosts

127.0.0.1       localhost
192.168.0.7     cluster2.example.com       cluster2

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

/etc/network/interfaces

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
         address 192.168.0.7
         netmask 255.255.255.0
         network 192.168.0.0
         broadcast 192.168.0.255
         gateway 192.168.0.1

auto eth1
iface eth1 inet static
         address 192.168.1.11
         netmask 255.255.255.0
         network 192.168.1.0
         broadcast 192.168.1.255

Rowland