[Samba] CTDB / Samba4. Nodes don't become healthy on first startup

Dave Lawrence dave at daftdroid.com
Wed Apr 27 01:10:50 MDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I'm trying to bring up a three node cluster using CTDB and SAMBA4 on a
GlusterFS clustered file system.

The filesystem itself seems to be working just fine, but CTDB doesn't
seem happy.

If I start a single node up:
service ctdb start

After about 10 seconds it becomes healthy, starts SAMBA and takes over
all 3 IP addresses.  However, when I start up subsequent nodes, they
simply refuse to become healthy.  None of the nodes appear to agree
about which one is the recovery master.  The recovery lock file is always
zero bytes.  If I start all 3 nodes at once, none of them become healthy.

Somewhere along the line I once managed to get two nodes healthy at
once, and they agreed on who was recmaster.  This was a fluke that I
cannot reproduce.

Typical log messages:

2011/04/27 07:33:23.633970 [25303]: CTDB_WAIT_UNTIL_RECOVERED
2011/04/27 07:33:23.634002 [25303]: server/ctdb_monitor.c:232 generation is INVALID. Wait one more second
2011/04/27 07:33:23.695058 [recoverd:25365]: server/ctdb_recoverd.c:1812 Send election request to all active nodes
2011/04/27 07:33:24.196099 [recoverd:25365]: server/ctdb_recoverd.c:1812 Send election request to all active nodes

So I have a number of questions:

1) What data is CTDB actually managing in the case of SAMBA4?
Presumably the temporary .tdb files that get created under
/usr/local/samba - do I need to tell CTDB about this location?

2) The reclock file is stored on the clustered filesystem, at a location
that is not part of a network share of any sort.  Is this correct?

3) Could this be a problem with GlusterFS?

4) Is it OK that the public and private IP ranges are on the same
physical network?  There's certainly no sign of any problem with the
servers communicating with each other on either range using other
protocols, eg ssh.
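
Regarding questions 2 and 3: as far as I understand it, CTDB relies on
an fcntl() byte-range lock on the recovery lock file, so the clustered
filesystem has to enforce such locks consistently.  The following is
only a minimal sketch of how that could be checked, not CTDB's own
code.  The function names are hypothetical, and the path passed in is
assumed to be a scratch file on the GlusterFS mount.

```python
import errno
import fcntl
import os


def try_lock(path):
    """Try a non-blocking exclusive fcntl lock on the first byte of path.

    Returns an open file descriptor holding the lock, or None if another
    process already holds it.  Other errors (for example a filesystem
    that rejects locking altogether) are raised to the caller.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            return None
        raise
    return fd


def lock_is_exclusive(path):
    """Hold the lock, then check that a forked child is refused it."""
    fd = try_lock(path)
    if fd is None:
        raise RuntimeError("lock already held by another process")
    pid = os.fork()
    if pid == 0:
        # Child: fcntl locks are per process, so this attempt must fail.
        try:
            refused = try_lock(path) is None
        except OSError:
            refused = False
        os._exit(0 if refused else 1)
    _, status = os.waitpid(pid, 0)
    os.close(fd)  # closing the descriptor releases the lock
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

lock_is_exclusive() only exercises locking between two processes on
one node.  For a cross-node check, call try_lock() on one node, keep
that process alive, and call try_lock() on the same path from a second
node; the second call should return None.  The ping_pong utility that
ships with CTDB is the more thorough test of the same property.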

I am using a recent build of CTDB from Git.  I had similar experiences
with Ubuntu and Debian packages.

Our servers are all VMs, as follows:
node0:
 Guest: Ubuntu 10.04
 Host: OpenVZ (linux 2.6.32 x86_64 with Ubuntu config and openvz patch)
 Network: Guest has full control of eth1

node1:
 Guest: Ubuntu 10.04 x86
 Host: VMWare ESXI 4 x86_64
 Network: Guest adapter bridged to host

node2:
 Guest: Ubuntu 10.04 x86_64
 Host: VMWare Player (Windows 7)
 Network: Guest adapter bridged to host

Our config:
/etc/default/ctdb
 CTDB_RECOVERY_LOCK="mnt/data/lockfile"
 CTDB_PUBLIC_INTERFACE=eth1 # varies per server
 CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
 CTDB_MANAGES_SAMBA=yes
 CTDB_SAMBA_CHECK_PORTS="445" # work around grep error in log
 CTDB_MANAGES_WINBIND=yes
 CTDB_SERVICE_SMB=samba4 #name of our init script
 CTDB_NODES=/etc/ctdb/nodes
 CTDB_DBDIR=/var/ctdb
 CTDB_DBDIR_PERSISTENT=/var/ctdb/persistent
 CTDB_LOGFILE=/var/log/log.ctdb
 CTDB_DEBUGLEVEL=3

/etc/ctdb/public_addresses
 192.168.2.119/24 eth1
 192.168.2.120/24 eth1
 192.168.2.121/24 eth1
(network adapter varies)

/etc/ctdb/nodes
 10.0.0.1
 10.0.0.2
 10.0.0.3

excerpt from /etc/network/interfaces
auto eth1
iface eth1 inet static
        address 192.168.2.164
        netmask 255.255.255.0
        gateway 192.168.2.1
        up route add -net 192.168.3.0/24 gw 192.168.2.165
        up route add -net 192.168.4.0/24 gw 192.168.2.156

auto eth1:0
iface eth1:0 inet static
        address 10.0.0.1
        netmask 255.0.0.0

Note that the static address 192.168.2.164 is not one of the takeover
addresses but is in the same range.  I have also tried with eth1
configured for ONLY the 10.x.x.x range.
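
To rule out basic connectivity problems between the private addresses
listed in /etc/ctdb/nodes, a check along the following lines could be
run on each node while ctdbd is up on the others.  This is a rough
sketch under stated assumptions: port 4379 is CTDB's default TCP port
and may differ in a custom build, the helper names are hypothetical,
and skipping lines that start with "#" is a choice made for this
sketch.

```python
import socket

CTDB_PORT = 4379  # CTDB's default TCP port; adjust if configured otherwise


def read_nodes(text):
    """Return the addresses in a nodes file, one per non-empty line."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            nodes.append(line)
    return nodes


def can_connect(host, port=CTDB_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_nodes(nodes_text, port=CTDB_PORT):
    """Return the nodes-file addresses that refuse or time out on port."""
    return [n for n in read_nodes(nodes_text) if not can_connect(n, port)]
```

For example, unreachable_nodes(open("/etc/ctdb/nodes").read()) returns
the addresses that could not be reached; an empty list means every
listed address accepted a TCP connection on that port.  A successful
connection only shows that something is listening there, not that the
CTDB daemons agree on cluster state.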

Thanks for listening!

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJNt8F6AAoJEJjHYqrO/1Xc1m8IAIXA7hAqRSfKYbPEEkdNN3Ms
A5CTbiPwM+HnKfhJSTYYvQzaL90QzhAgLg/lImD4r45V6oGkG1zR9IkaJLrcGfkb
9WhK80x2yId7Lzm3GC3yphQfH+MSFRIL5lJP7Sxglz8rFJXTse0U/FNmsXvJQdvV
gqjnYKlQ8Al/B5PQX6t586YHH+yRb61M/DvRuclLUtcsxrcrFX89bjfkmNkhD26T
/bXvANMtNAGDxVxuwChYOJ05Q2Jt0fQfBslg0U9tR/tqwbzuQZ2W+4TjklC6zz9B
thFH74m+KhCjVEgCJp3oSdTYNAcH7MN1dK8JSsGldfFXuIQhj66jwQxI/tl6978=
=ueFp
-----END PGP SIGNATURE-----