Setting up CTDB on OCFS2 and VMs ...
Stefan Kania
stefan at kania-online.de
Mon Dec 15 07:14:37 MST 2014
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Rowland,
On 15.12.14 at 14:25, Rowland Penny wrote:
> On 13/12/14 13:55, Michael Adam wrote:
>> On 2014-12-13 at 12:31 +0000, Rowland Penny wrote:
>>> OK, I now have a single node up and running as per the
>>> instructions provided by Ronnie.
>> Yay!
>>
>>> I just have a few questions:
>>>
>>> there is this in the ctdb log
>>>
>>> 2014/12/13 11:52:43.522708 [ 5740]: Set runstate to INIT (1)
>>> 2014/12/13 11:52:43.540992 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
>>> 2014/12/13 11:52:43.543178 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
>>> 2014/12/13 11:52:43.545354 [ 5740]: 00.ctdb: awk: line 2: function gensub never defined
>> In Debian, several packages can provide awk, at least mawk, gawk
>> and original-awk. Which one is used when several are installed
>> depends on the alternatives mechanism:
>>
>>   update-alternatives --display awk
>>   update-alternatives --config awk
>>
>> A quick web search has revealed that only the gawk (GNU awk)
>> variant seems to provide the needed gensub function.
>>
>> Maybe we should change "awk" to "gawk" in our scripts; packages
>> would then need to adapt their dependencies.
>>
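To see whether the awk selected on a node provides gensub, a small check along these lines may help. This is only a sketch for a POSIX shell; the helper name awk_has_gensub is made up for illustration and is not part of ctdb.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the given awk binary provides gensub().
# gensub() is a GNU awk extension, so an awk without it fails with an error
# such as "function gensub never defined".
awk_has_gensub() {
    "${1:-awk}" 'BEGIN { exit (gensub(/a/, "b", "g", "aa") == "bb" ? 0 : 1) }' \
        >/dev/null 2>&1
}

if awk_has_gensub awk; then
    echo "awk provides gensub"
else
    echo "awk lacks gensub; gawk may need to be installed or selected"
fi
```

On Debian, `update-alternatives --display awk` shows which implementation the plain `awk` name currently points to.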
>>
>>> 2014/12/13 11:52:56.931393 [recoverd: 5887]: We are still serving a public IP '127.0.0.3' that we should not be serving. Removing it
>>> 2014/12/13 11:52:56.931536 [ 5740]: Could not find which interface the ip address is hosted on. can not release it
>>> 2014/12/13 11:52:56.931648 [recoverd: 5887]: We are still serving a public IP '127.0.0.2' that we should not be serving. Removing it
>>>
>>> The above three lines are there 4 times
>> I guess this will no longer happen once you move to a more
>> realistic setup that does not use loopback for the nodes' internal
>> and public addresses, but for a start that is ok.
>>
>>> the final 4 lines are:
>>>
>>> 2014/12/13 11:53:02.982441 [ 5740]: monitor event OK - node re-enabled
>>> 2014/12/13 11:53:02.982480 [ 5740]: Node became HEALTHY. Ask recovery master 0 to perform ip reallocation
>>> 2014/12/13 11:53:02.982733 [recoverd: 5887]: Node 0 has changed flags - now 0x0 was 0x2
>>> 2014/12/13 11:53:02.983266 [recoverd: 5887]: Takeover run starting
>>> 2014/12/13 11:53:03.046859 [recoverd: 5887]: Takeover run completed successfully
>>>
>>> ctdb status shows:
>>>
>>> Number of nodes:1
>>> pnn:0 127.0.0.1        OK (THIS NODE)
>>> Generation:740799152
>>> Size:1
>>> hash:0 lmaster:0
>>> Recovery mode:NORMAL (0)
>>> Recovery master:0
>> Great!
>>
>>> Now I know it works, I just have to pull it all together.
>> Right. Next step: take a "real" ethernet interface and first use
>> that for the node's address. You can even start here with a
>> single node.
>>
>> You can also go towards more realistic clusters in two steps:
>> First no public addresses, only the nodes file. That is the core
>> of a ctdb cluster. Then you can go towards cluster-resource
>> management and add public addresses and also CTDB_MANAGES_SAMBA
>> and friends.
>>
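The two steps could look roughly like this. It is a minimal sketch, not a tested setup: the node addresses are the ones that appear later in this thread, while the public addresses, the interface name eth0, the CTDB_ETC variable and the file name ctdb.defaults.example are assumptions made up for illustration. It writes into a scratch directory rather than /etc/ctdb.

```shell
#!/bin/sh
# Scratch directory standing in for /etc/ctdb (assumption for this sketch).
CTDB_ETC="${CTDB_ETC:-./ctdb-example}"
mkdir -p "$CTDB_ETC"

# Step 1: the nodes file, one internal address per line, identical on all nodes.
cat > "$CTDB_ETC/nodes" <<'EOF'
192.168.1.10
192.168.1.11
EOF

# Step 2: public addresses that CTDB moves between healthy nodes.
# Format: address/netmask-bits interface. These values are hypothetical.
cat > "$CTDB_ETC/public_addresses" <<'EOF'
192.168.1.100/24 eth0
192.168.1.101/24 eth0
EOF

# Settings that would go into the ctdb defaults file
# (/etc/default/ctdb on Debian, /etc/sysconfig/ctdb elsewhere).
cat > "$CTDB_ETC/ctdb.defaults.example" <<'EOF'
CTDB_NODES=/etc/ctdb/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses
CTDB_MANAGES_SAMBA=yes
EOF
```

For the first step, only the nodes file and the CTDB_NODES line are needed; the public addresses and CTDB_MANAGES_SAMBA can be added once the bare cluster reports healthy.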
>> One further note: Virtual machines or even containers (lxc or
>> docker) are awesome for setting up such clusters for learning and
>> testing. I use that for development myself.
>>
>> And here is one (imho) very neat trick: If you use lxc containers
>> (docker can probably do that too), you can completely take the
>> complexity of having to set up a cluster file system out of the
>> equation: You can just bind mount a directory of the host file
>> system into the node containers' root file systems via the lxc
>> fstab file. That gives you a POSIX file system that is shared
>> between the nodes, and you can use it as the cluster FS.
>>
>> This way, you can concentrate on ctdb and samba immediately until
>> you are comfortable with that.
>>
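As an illustration of the bind-mount trick, the following sketch prints an fstab-style line that could go into each container's lxc fstab file. The helper name lxc_bind_entry and both paths are hypothetical; the line assumes the usual lxc convention that a mount point without a leading slash is taken relative to the container's root file system.

```shell
#!/bin/sh
# Hypothetical helper: print an lxc fstab entry that bind mounts a host
# directory into a container. The leading slash of the target is stripped
# so the mount point is relative to the container rootfs.
lxc_bind_entry() {
    host_dir=$1
    target=${2#/}
    printf '%s %s none bind 0 0\n' "$host_dir" "$target"
}

# Example paths (assumptions): share /srv/ctdb-shared as /clusterfs.
lxc_bind_entry /srv/ctdb-shared /clusterfs
```

Using the same host directory in every node container's fstab is what makes the directory shared between the nodes.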
>> I wanted at some point to provide a mechanism to set such a thing
>> up automatically, by just providing some config files. Maybe I'll
>> investigate the vagrant+puppet approach that Ralph Böhme has
>> recently posted in this or a related thread...
>>
>> Cheers - Michael
>>
>
> Getting closer :-)
>
> I now have two ctdb nodes up and running:
>
> root@cluster1:~# ctdb status
> Number of nodes:3 (including 1 deleted nodes)
> pnn:1 192.168.1.10     OK (THIS NODE)
> pnn:2 192.168.1.11     OK
> Generation:1073761636
> Size:2
> hash:0 lmaster:1
> hash:1 lmaster:2
> Recovery mode:NORMAL (0)
> Recovery master:1
>
> This is with CTDB_RECOVERY_LOCK turned off. If I turn it on, the
> nodes go unhealthy. I am putting the lockfile on the shared
> cluster file system. Should I be putting it somewhere else? It
> says at the top of /etc/default/ctdb:
>
> # Shared recovery lock file to avoid split brain.  No default.
> #
> # Do NOT run CTDB without a recovery lock file unless you know exactly
> # what you are doing.
> #CTDB_RECOVERY_LOCK=/some/place/on/shared/storage
>
> As I don't know what I am doing, I need to run the recovery
> lockfile :-D
>
> Rowland
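A first sanity check for the recovery lock setting might look like the sketch below. The helper name check_reclock and the CTDB_DEFAULTS variable are made up for illustration. It only verifies that the configured lock directory exists and is writable on the local node, and it assumes an unquoted value in the defaults file.

```shell
#!/bin/sh
# Hypothetical helper: read CTDB_RECOVERY_LOCK from a ctdb defaults file and
# check that the directory holding the lock file exists and is writable here.
check_reclock() {
    conf=$1
    # Use the last uncommented assignment, as a shell sourcing the file would.
    lock=$(sed -n 's/^CTDB_RECOVERY_LOCK=//p' "$conf" 2>/dev/null | tail -n 1)
    if [ -z "$lock" ]; then
        echo "CTDB_RECOVERY_LOCK is not set in $conf"
        return 1
    fi
    dir=$(dirname "$lock")
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "lock directory $dir is present and writable"
    else
        echo "lock directory $dir is missing or not writable"
        return 1
    fi
}

check_reclock "${CTDB_DEFAULTS:-/etc/default/ctdb}" || true
```

This does not show whether file locks are coherent across the nodes, which is what the recovery lock depends on. The ping_pong utility that ships with ctdb is commonly used to test that on the shared file system.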
Are there any errors in /var/log/log.ctdb? Which version of ctdb are
you using? And could you post your /etc/sysconfig/ctdb file?
Regards
Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.16 (Darwin)
iEYEARECAAYFAlSO7M0ACgkQ2JOGcNAHDTaWwQCgr7wrvbECapJxNNRkI5jKkwdg
63QAoIvRy5V3tsccLYdBsagLqMwy8Mry
=RNHK
-----END PGP SIGNATURE-----