Setting up CTDB on OCFS2 and VMs ...

Stefan Kania stefan at kania-online.de
Mon Dec 15 07:14:37 MST 2014



Hi Rowland,

Am 15.12.14 um 14:25 schrieb Rowland Penny:
> On 13/12/14 13:55, Michael Adam wrote:
>> On 2014-12-13 at 12:31 +0000, Rowland Penny wrote:
>>> OK, I now have a single node up and running as per the 
>>> instructions provided by Ronnie.
>> Yay!
>> 
>>> I just have a few questions:
>>> 
>>> there is this in the ctdb log
>>> 
>>> 2014/12/13 11:52:43.522708 [ 5740]: Set runstate to INIT (1)
>>> 2014/12/13 11:52:43.540992 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
>>> 2014/12/13 11:52:43.543178 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
>>> 2014/12/13 11:52:43.545354 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
>> In Debian, several packages can provide awk: at least mawk, gawk,
>> and original-awk. Which one is chosen when several are installed
>> depends on the alternatives mechanism:
>>
>> update-alternatives --display awk
>> update-alternatives --config awk
>>
>> A quick web search revealed that only the gawk (GNU awk) variant
>> provides the needed gensub function.
>>
>> Maybe we should change "awk" to "gawk" in our scripts, and packages
>> would need to adapt their dependencies.
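A quick sketch of how one could probe for the problem described above: gensub() is a gawk extension, and mawk or original-awk abort with exactly the "function gensub never defined" error seen in the log. The script below is illustrative, not part of CTDB:

```shell
#!/bin/sh
# Check whether the awk that "awk" currently resolves to provides
# gensub(), the gawk extension CTDB's 00.ctdb event script relies on.
# mawk and original-awk exit non-zero with
# "function gensub never defined", as in the log above.
if awk 'BEGIN { print gensub(/a/, "b", "g", "aaa") }' >/dev/null 2>&1; then
    echo "awk provides gensub"
else
    # Fix on Debian: install gawk, then select it via
    #   update-alternatives --config awk
    echo "awk lacks gensub"
fi
```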
>> 
>> 
>>> 2014/12/13 11:52:56.931393 [recoverd: 5887]: We are still serving a public IP '127.0.0.3' that we should not be serving. Removing it
>>> 2014/12/13 11:52:56.931536 [ 5740]: Could not find which interface the ip address is hosted on. can not release it
>>> 2014/12/13 11:52:56.931648 [recoverd: 5887]: We are still serving a public IP '127.0.0.2' that we should not be serving. Removing it
>>> 
>>> The above three lines appear four times
>> I guess this will no longer be the case once you move to a more
>> realistic setup where you don't use loopback for hosting the nodes'
>> internal and public addresses, but for a start that is ok.
>> 
>>> the final log lines are:
>>> 
>>> 2014/12/13 11:53:02.982441 [ 5740]: monitor event OK - node re-enabled
>>> 2014/12/13 11:53:02.982480 [ 5740]: Node became HEALTHY. Ask recovery master 0 to perform ip reallocation
>>> 2014/12/13 11:53:02.982733 [recoverd: 5887]: Node 0 has changed flags - now 0x0  was 0x2
>>> 2014/12/13 11:53:02.983266 [recoverd: 5887]: Takeover run starting
>>> 2014/12/13 11:53:03.046859 [recoverd: 5887]: Takeover run completed successfully
>>> 
>>> ctdb status shows:
>>> 
>>> Number of nodes:1
>>> pnn:0 127.0.0.1        OK (THIS NODE)
>>> Generation:740799152
>>> Size:1
>>> hash:0 lmaster:0
>>> Recovery mode:NORMAL (0)
>>> Recovery master:0
>> Great!
>> 
>>> Now I know it works, I just have to pull it all together.
>> Right. Next step: take a "real" ethernet interface and use that
>> for the nodes address first. You can even start here with a single
>> node.
>>
>> You can also move towards a more realistic cluster in two steps:
>> first no public addresses, only the nodes file. That is the core
>> of a ctdb cluster. Then you can go towards cluster-resource
>> management and add public addresses, and also CTDB_MANAGES_SAMBA
>> and friends.
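A minimal sketch of the two files involved (the IP addresses and the interface name are made up for illustration; on Debian both files live under /etc/ctdb/):

```
# /etc/ctdb/nodes -- one private (internal) node address per line,
# identical on every node
10.0.0.10
10.0.0.11
```

```
# /etc/ctdb/public_addresses -- floating addresses CTDB assigns to
# healthy nodes: address/netmask plus the interface to host them on
192.168.1.200/24 eth0
192.168.1.201/24 eth0
```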
>> 
>> One further note: virtual machines or even containers (lxc or
>> docker) are awesome for setting up such clusters for learning and
>> testing. I use that for development myself.
>> 
>> And here is one (imho) very neat trick: if you use lxc containers
>> (docker can probably do this too), you can take the complexity of
>> having to set up a cluster file system out of the equation
>> entirely: you can simply bind-mount a directory of the host file
>> system into the node containers' root file systems via the lxc
>> fstab file. That gives you a posix file system that is shared
>> between the nodes, and you can use it as the cluster FS.
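A sketch of what such a bind mount could look like in a container's fstab (the container name and both paths are hypothetical; the entry follows fstab syntax, with the mount target given relative to the container's rootfs):

```
# /var/lib/lxc/node1/fstab -- put the same line in every node
# container, so all nodes see the one shared host directory;
# create=dir makes lxc create the target if it is missing
/srv/ctdb-shared  srv/ctdb-shared  none  bind,create=dir 0 0
```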
>> 
>> This way, you can concentrate on ctdb and samba immediately until
>> you are comfortable with that.
>> 
>> I wanted at some point to provide a mechanism to set such a thing
>> up automatically, by just providing some config files. Maybe I'll
>> investigate the vagrant+puppet approach that Ralph Böhme has
>> recently posted in this or a related thread...
>> 
>> Cheers - Michael
>> 
> 
> Getting closer :-)
> 
> I now have two ctdb nodes up and running:
> 
> root at cluster1:~# ctdb status
> Number of nodes:3 (including 1 deleted nodes)
> pnn:1 192.168.1.10     OK (THIS NODE)
> pnn:2 192.168.1.11     OK
> Generation:1073761636
> Size:2
> hash:0 lmaster:1
> hash:1 lmaster:2
> Recovery mode:NORMAL (0)
> Recovery master:1
> 
> This is with CTDB_RECOVERY_LOCK turned off; if I turn it on, the
> nodes go unhealthy. I am putting the lockfile on the shared
> cluster filesystem - should I be putting it somewhere else? It
> says at the top of /etc/default/ctdb:
>
> # Shared recovery lock file to avoid split brain.  No default.
> #
> # Do NOT run CTDB without a recovery lock file unless you know
> # exactly what you are doing.
> #CTDB_RECOVERY_LOCK=/some/place/on/shared/storage
> 
> As I don't know what I am doing, I need to run the recovery
> lockfile :-D
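For reference, enabling the lock means uncommenting that variable and pointing it at a file on storage that every node mounts at the same path - in this thread's setup, the OCFS2 mount. The directory below is made up for illustration:

```
# /etc/default/ctdb -- the path must be identical on all nodes and
# must live on the shared cluster file system (here: the OCFS2
# mount); the exact location shown is hypothetical
CTDB_RECOVERY_LOCK=/mnt/ocfs2/ctdb/reclock
```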
> 
> Rowland

Are there any errors in /var/log/log.ctdb? Which version of ctdb are
you using? And could you post your /etc/sysconfig/ctdb file?


Regards

Stefan




More information about the samba-technical mailing list