CTDB We are still serving a public address...

patrick medina pgmedinajr at gmail.com
Thu Nov 1 16:26:58 MDT 2012


Not being satisfied, and unable to recreate this issue in my test lab, my
coworker and I kept digging at the problem.

We discovered that one node out of three was reporting "ctdb ip"
information that differed from the other two.  We would stop ctdb and bring
up one node at a time until every node was healthy and all IPs were
distributed evenly.  Then we rebooted a node, tested with "ctdb ip" again,
and discovered node 0 was not reclaiming an IP.

Node 0

192.168.1.1  1
192.168.1.2  2
192.168.1.3  2

Node 1 & Node 2

192.168.1.1  0
192.168.1.2  1
192.168.1.3  2
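
For anyone wanting to spot this kind of disagreement quickly, here is a
minimal sketch in Python that compares the views above.  It assumes the
"ctdb ip" output has been collected from each node by some other means and
reduced to the simple "address node" pairs shown in this mail; the function
names are made up for illustration and are not part of CTDB.

```python
def parse_ctdb_ip(text):
    """Parse 'address node' pairs, one per line, into {address: node}.

    Lines that do not have exactly two fields, or whose second field is
    not an integer, are skipped (for example header lines).
    """
    assignments = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        address, node = fields
        try:
            assignments[address] = int(node)
        except ValueError:
            continue
    return assignments


def find_disagreements(views):
    """Return {address: {reporting_node: owner}} for every address whose
    owner is not the same in all views.

    views maps a reporting node number to that node's {address: owner}.
    """
    addresses = set()
    for view in views.values():
        addresses.update(view)

    disagreements = {}
    for address in sorted(addresses):
        owners = {node: view.get(address) for node, view in views.items()}
        if len(set(owners.values())) > 1:
            disagreements[address] = owners
    return disagreements


# The two views reported in this mail.
NODE0_VIEW = """
192.168.1.1  1
192.168.1.2  2
192.168.1.3  2
"""

OTHER_VIEW = """
192.168.1.1  0
192.168.1.2  1
192.168.1.3  2
"""

views = {
    0: parse_ctdb_ip(NODE0_VIEW),
    1: parse_ctdb_ip(OTHER_VIEW),
    2: parse_ctdb_ip(OTHER_VIEW),
}

for address, owners in find_disagreements(views).items():
    print(address, owners)
```

With the data above, the nodes disagree about who owns 192.168.1.1 and
192.168.1.2 and agree only on 192.168.1.3.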

At times node 0 would become healthy, yet when we ran "ip addr" we
could see that 192.168.1.1 was not in use.  I decided to apt-get purge ctdb
(Ubuntu 12.04), but this did not fix the issue.  Running out of time, I did
a full reinstall and we're back in business.  I used to go only by OK
(THIS NODE), but now I'll also check "ctdb ip" and "ip addr" to verify I'm
really redundant.
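
The cross-check described above could be sketched roughly as follows in
Python.  This is only an illustration under assumptions: it takes the text
of "ctdb ip" and "ip addr" as strings (captured however you like), expects
"ctdb ip" lines in the "address node" form shown earlier, and handles IPv4
only.  The helper names and the sample "ip addr" output are hypothetical.

```python
import re


def addresses_on_interfaces(ip_addr_text):
    """Return the set of IPv4 addresses listed on 'inet' lines."""
    pattern = r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+)"
    return set(re.findall(pattern, ip_addr_text, re.MULTILINE))


def expected_here(ctdb_ip_text, this_node):
    """Return the public addresses that 'ctdb ip' assigns to this_node."""
    expected = set()
    for line in ctdb_ip_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == str(this_node):
            expected.add(fields[0])
    return expected


def missing_addresses(ctdb_ip_text, ip_addr_text, this_node):
    """Addresses CTDB says this node hosts but no interface carries."""
    expected = expected_here(ctdb_ip_text, this_node)
    present = addresses_on_interfaces(ip_addr_text)
    return sorted(expected - present)


# "ctdb ip" as seen from the healthy nodes in this mail.
CTDB_IP = """
192.168.1.1  0
192.168.1.2  1
192.168.1.3  2
"""

# Hypothetical "ip addr" output from node 0; 192.168.1.10 stands in for
# the node's own static address and is made up for this example.
IP_ADDR_NODE0 = """
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
"""

missing = missing_addresses(CTDB_IP, IP_ADDR_NODE0, this_node=0)
if missing:
    print("assigned by ctdb but not configured:", ", ".join(missing))
else:
    print("all assigned public addresses are configured")
```

A non-empty result means the node can report OK while an address it is
supposed to serve is not actually on any interface, which is the situation
we ran into.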

PG



On Sun, Oct 28, 2012 at 1:44 PM, patrick medina <pgmedinajr at gmail.com> wrote:

> Thanks Martin,
>
> Big bummer, but as long as there is a workaround I can deal with it until
> then.  Thanks again!
>
> Gil
>
>
>
> On Sun, Oct 28, 2012 at 4:28 AM, Martin Schwenke <martin at meltin.net> wrote:
>
>> On Sat, 27 Oct 2012 20:26:18 -0600, patrick medina
>> <pgmedinajr at gmail.com> wrote:
>>
>> > I thought I had CTDB down, but it looks like I'm running into another
>> > issue.
>> >
>> > I have 2 nodes with 2 public addresses so that I can use RRDNS.  This
>> > worked fine and was tested for  until I thought it was good to go.  Last
>> > night one of our nodes went down, and after the reboot I'm getting the
>> > following in the logs:  (The 2nd node is up and happy, but this guy
>> > remains unhealthy)
>> >
>> >
>> > 2012/10/27 19:53:00.678119 [ 1510]: server/ctdb_takeover.c:813 release_ip
>> > of IP x.x.x.x is known to the kernel, but we have no interface assigned,
>> > has someone manually configured it? Ignore for now.
>> > [...]
>>
>> This is a bug that we've fixed at a couple of different levels in
>> CTDB.  There should be a public release of CTDB very soon that includes
>> this fix.
>>
>> Right now you should be able to work around this by manually removing
>> the IP shown in the message using "ip addr del ...".
>>
>> If you've built CTDB from sources obtained via git then you could
>> rebuild after cherry-picking the following patches that repair a node
>> when it gets into this state:
>>
>>   c6bf22ba5c01001b7febed73dd16a03bd3fd2bed
>>   f07376309e70f5ccdb7de8453caacc71b451ab48
>>
>> You can use "git show <sha>" to see what the patches do.
>>
>> This problem is often caused by a race between a node taking over an IP
>> and releasing it in quick succession.  We also have some fixes for the
>> race...
>>
>> > Did I configure RRDNS wrong?  On my dev box this worked like a charm, but
>> > once I went to production it's not so happy.  :/
>>
>> Probably bad luck that you've hit the race only after going
>> into production...  :-(
>>
>> peace & happiness,
>> martin
>>
>
>

