Some thoughts on the external recovery lock helper

Richard Sharpe realrichardsharpe at gmail.com
Fri Jun 17 14:56:56 UTC 2016


On Thu, Jun 16, 2016 at 9:00 PM, Amitay Isaacs <amitay at gmail.com> wrote:
> Hi Richard,
>
> On Fri, Jun 17, 2016 at 9:54 AM, Richard Sharpe
> <realrichardsharpe at gmail.com> wrote:
>>
>> Hi Martin,
>>
>> We have just discovered that we really need to use this since NFSv4
>> lease timeouts, at least in our environment, are 90 seconds.
>>
>> So, I sat down and thought about how to use etcd to provide this function.
>>
>> Basically, what you want is to maintain a key in the KV store that has a
>> TTL.
>
> Sorry WE don't want to do anything. :-)

Yeah, I used 'you' in the general sense there :-)

>>
>>
>> E.g.,
>>
>> curl http://127.0.0.1:2379/v2/keys/ctdb-recovery-lock -XPUT \
>>     -d value=<my-node-name> -d ttl=5
>>
>> Then, every 2 to 2.5 seconds, refresh the TTL:
>>
>> curl http://127.0.0.1:2379/v2/keys/ctdb-recovery-lock -XPUT \
>>     -d ttl=5 -d refresh=true -d prevExist=true
>>
>> And then you somehow have to communicate from ctdb to the helper
>> daemon to say lock and unlock. However, if ctdb dies while the helper
>> has taken the lock then it will never unlock and ...
>>
>> So, what would be real nice is if this was incorporated into a
>> plug-in API for CTDB so that if ctdb dies, refreshing stops and
>> recovery can occur on another node.
>>
>> So, essentially, what I am thinking of is a plugin API that perhaps
>> has four calls:
>>
>> recovery_lock_init
>> recovery_lock_take
>> recovery_lock_release
>> recovery_lock_cleanup
>>
>> Now, recovery_lock_init would do any initialization needed. Perhaps it
>> could be passed the config info. It might create a thread.
>>
>> recovery_lock_take would take the lock, and in the context of what I
>> mentioned above about etcd, it would use libcurl to make the
>> appropriate REST calls to etcd and would either start a thread to
>> refresh the key or would tell the thread to refresh the key every 2.5
>> seconds.
>>
>> recovery_lock_release would stop the refreshing process and delete the
>> key.
>>
>> recovery_lock_cleanup would do any cleanup, like deleting any threads
>> that have been started.
>>
>> By making this processing part of ctdb we can solve one of the HA
>> issues, which is how we kill the refreshing daemon when CTDB dies.
>> If it is part of CTDB, the kernel does that for us.
>
>
> You can do all this in the current helper framework.  I have not seen any
> evidence that you cannot do anything specific in the current mutex helper
> model.

Given the clarification below, it does seem that you can; it is just
not as clean as I would like :-)
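
For reference, the four-call interface quoted above might look roughly
like the following. This is only a sketch under stated assumptions: it
is written in Python rather than as a C shared library, the class name
RecoveryLock and the put/delete callables are hypothetical stand-ins
for the libcurl REST calls, prevExist=false on the initial PUT is my
addition, and none of this is an existing CTDB API.

```python
import threading


class RecoveryLock:
    """Hypothetical four-call interface; not an existing CTDB API."""

    def __init__(self, put, delete, ttl=5, interval=2.5):
        # recovery_lock_init: keep the config; no thread is started yet.
        # put(fields) -> bool and delete() are assumed REST wrappers.
        self._put = put
        self._delete = delete
        self._ttl = ttl
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None

    def take(self, node_name):
        # recovery_lock_take: create the key, then start refreshing it.
        fields = {"value": node_name, "ttl": self._ttl, "prevExist": "false"}
        if not self._put(fields):
            return False
        self._stop.clear()
        # A daemon thread dies with the process, so refreshing stops
        # automatically if the hosting daemon goes away.
        self._thread = threading.Thread(target=self._refresh_loop, daemon=True)
        self._thread.start()
        return True

    def _refresh_loop(self):
        # Event.wait() returns False on timeout and True once stopped.
        while not self._stop.wait(self._interval):
            fields = {"ttl": self._ttl, "refresh": "true", "prevExist": "true"}
            if not self._put(fields):
                break  # lock lost; a real plug-in would have to tell ctdb

    def release(self):
        # recovery_lock_release: stop refreshing, then delete the key.
        self._stop_thread()
        self._delete()

    def cleanup(self):
        # recovery_lock_cleanup: make sure no thread is left behind.
        self._stop_thread()

    def _stop_thread(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None
```

The point of the design is visible in the daemon thread: there is no
separate process to kill, so the TTL simply runs out when the host
process dies.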

> When the mutex helper is run, it should get the lock, output 0 on stdout and
> keep holding the lock.  To hold a lock, if it requires sending a message to
> etcd (or whatever daemon you are talking to) every 2-3 seconds, then it can
> be done from within the helper.  If the helper loses the lock for some
> reason, it should exit.  That will inform ctdb.

OK, that does seem to handle one of the error cases I was thinking about.
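
A minimal sketch of that helper behaviour against the etcd v2 calls
from earlier in the thread, written in Python for brevity. The function
names are hypothetical, prevExist=false on the initial PUT is my
assumption (so that only one node can create the key), and what a
helper should print when it fails to get the lock is not covered in
this thread, so the sketch prints nothing in that case.

```python
import sys
import time
import urllib.error
import urllib.parse
import urllib.request

ETCD_KEY_URL = "http://127.0.0.1:2379/v2/keys/ctdb-recovery-lock"
TTL_SECONDS = 5
REFRESH_INTERVAL = 2.5


def etcd_put(fields, url=ETCD_KEY_URL, timeout=2.0):
    """PUT form fields to etcd; True on a 2xx reply, False otherwise."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    req = urllib.request.Request(url, data=data, method="PUT")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # HTTP errors, refused connections and timeouts all count as failure.
        return False


def take(node_name, put=etcd_put):
    # Assumption: prevExist=false makes the create fail if the key exists.
    return put({"value": node_name, "ttl": TTL_SECONDS, "prevExist": "false"})


def refresh(put=etcd_put):
    # Same parameters as the refresh curl command in the thread.
    return put({"ttl": TTL_SECONDS, "refresh": "true", "prevExist": "true"})


def hold(node_name, put=etcd_put, sleep=time.sleep, out=sys.stdout):
    """Take the lock, report 0 on stdout, refresh until the lock is lost."""
    if not take(node_name, put):
        return 1  # failure reporting is left out; see the helper docs
    out.write("0")
    out.flush()
    while True:
        sleep(REFRESH_INTERVAL)
        if not refresh(put):
            return 1  # lock lost: the caller should exit so ctdb notices
```

A script would end with something like sys.exit(hold("my-node-name")).
Note that the refresh only checks that the key exists, not that it
still carries this node's name.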

> Also, the helper process must check for the parent process.  (Looks like
> it's not documented in doc/cluster_mutex_helper.txt even though it's in the
> code server/ctdb_mutex_fcntl_helper.c)  If the parent process has gone away,
> then it should release the lock and terminate.

OK, so this part is the harder part. If the parent crashes then the
helper needs to exit. A design where the helper is called as a (shared)
library function gets this for free, and we have lots of experience
with this sort of thing in Samba.

However, it is not impossible with a separate helper process.
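
One common way to do the parent check is to remember the parent PID at
startup and poll getppid(), since an orphaned process gets a new parent.
The sketch below assumes that approach; it is not a description of what
server/ctdb_mutex_fcntl_helper.c actually does, and the function names
and the on_orphaned callback are hypothetical.

```python
import os
import time


def parent_alive(original_ppid, getppid=os.getppid):
    """True while the process that started us is still our parent."""
    return getppid() == original_ppid


def watch_parent(original_ppid, on_orphaned, interval=1.0,
                 getppid=os.getppid, sleep=time.sleep):
    """Poll until the parent changes, then run on_orphaned() once."""
    while parent_alive(original_ppid, getppid):
        sleep(interval)
    # The parent has gone away: release the lock and terminate here.
    on_orphaned()
```

In a helper this would be called with os.getppid() captured as early as
possible, and on_orphaned would delete the key and exit.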

> Releasing the lock is synonymous with the mutex helper process going
> away.  CTDB always sends SIGTERM to the helper process, so it can release
> the lock and then terminate.

It appears that we (the nefarious we) will be embarking on this soon, so:

1. We should be in the position to provide an example implementation, and

2. Thanks for the clarification, and

3. If I get time I will try to work up the library approach.
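
On the SIGTERM point quoted above, a helper can release the lock from
a signal handler before it exits. A small sketch in Python, where
install_sigterm_release and the release callable (for example, one that
deletes the etcd key) are hypothetical names:

```python
import signal
import sys


def install_sigterm_release(release, exit_fn=sys.exit):
    """On SIGTERM, call release() and then exit with status 0."""
    def handler(signum, frame):
        release()   # e.g. delete the key so another node can take it
        exit_fn(0)

    signal.signal(signal.SIGTERM, handler)
    return handler
```

If the delete fails, the TTL still expires the key after a few seconds.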

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)