[PATCH] CTDB locking changes

Tue Aug 5 01:30:45 MDT 2014

On Tue, Aug 5, 2014 at 12:28 AM, Volker Lendecke <Volker.Lendecke at sernet.de>
wrote:

> On Mon, Jul 28, 2014 at 12:22:56PM +1000, Amitay Isaacs wrote:
> > On Wed, Jul 23, 2014 at 1:11 PM, Amitay Isaacs <amitay at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Here are the patches that clean up locking in CTDB.
> > >
> > > The major changes include:
> > >
> > > 1. Removal of multiple lock requests per context feature.
> > >
> > > This was a really bad idea and caused out-of-order scheduling of lock
> > > requests.  The code was already commented and not in use.
> > >
> > > 2. Immediate scheduling of lock request provided there aren't max
> number
> > > of requests active.
> > >
> > > The scheme to avoid scheduling record lock requests which are already
> > > active, caused high CPU usage when there were lots of pending requests.
> > > The implementation of pending and active queues using linked list is a
> bad
> > > choice of data structure for this purpose.  In future, it might be
> possible
> > > to improve the scheduling using per database queues rather than a
> single
> > > queue.
> > >
> >
> > > Minor changes include:
> > >
> > > 1. Removing unused elements of data structures, unused function
> > >
> > > 2. Add a new tunable LockProcessesPerDB.  Default is 200.
> > >
> > >  3. Update num_pending statistics when lock is scheduled and not when
> lock
> > > is obtained to give the correct picture of how many lock requests are
> still
> > > in the pending queue.
> > >
> > > Please review and  push if ok.
> > >
> > > Amitay.
> > >
> >
> > Single queue for pending requests is not good enough.  On a loaded system
> > there can thousands of lock requests in the pending queue.  In that case,
> > per database queues definitely improves the cpu utilization for
> > scheduling.  Here are all the locking changes with additional patch for
> per
> > database queues.
> >
> > I would appreciate review and comments/suggestions.
>
> Pushed. Attached find two cosmetic patches.
>

Pushed.

>
> There's one question that I did not really fully understand:
> Why do we have lock_ctx->request as a separate entity? Is
> this legacy from when there was a queue of request and can
> be removed at some point?
>
>
The main reason for keeping the request separate from lock_ctx was to allow
callers to free request without any side effects.  The request can be
reparented using talloc_steal() as part of some state/handle and when that
structure goes away, it will automatically free up the request and perform
the proper clean up in locking code.  However, currently that's not the
case. That's why there is ctdb_lock_free_request_context().  This function
needs to go away and the callers should just talloc_free the request.

Amitay.