CTDB woes

Amitay Isaacs amitay at gmail.com
Fri Apr 12 09:35:31 MDT 2013


On Fri, Apr 12, 2013 at 10:39 PM, Orlando Richards <
orlando.richards at ed.ac.uk> wrote:

>
> Hi folks,
>
> We've long been using CTDB and Samba for our NAS service, servicing ~500
> users. We've been suffering from some problems with the CTDB performance
> over the last few weeks, likely triggered either by an upgrade of samba
> from 3.5 to 3.6 (and enabling of SMB2 as a result), or possibly by
> additional users coming on with a new workload.
>
> We run CTDB 1.0.114.4-1 (from sernet) and samba3-3.6.12-44 (again, from
> sernet). Before we roll back, we'd like to make sure we can't fix the
> problem and stick with Samba 3.6 (and we don't even know that a roll back
> would fix the issue).
>
> The symptoms are a complete freeze of the service for CIFS users for 10-60
> seconds, and on the servers a corresponding spawning of large numbers of
> CTDB processes, which seem to be created in a "big bang", and then do what
> they do and exit in the subsequent 10-60 seconds.
>
> We also serve up NFS from the same ctdb-managed frontends, and GPFS from
> the cluster - and these are both fine throughout.
>
> This was happening 5-10 times per hour, not at exact intervals though.
> When we added a third node to the CTDB cluster, it "got worse", and when we
> dropped the CTDB cluster down to a single node, everything started
> behaving fine - which is where we are now.
>
> So, I've got a bunch of questions!
>
>  - does anyone know why ctdb would be spawning these processes, and if
> there's anything we can do to stop it needing to do it? Also - any idea how
> we might reproduce this kind of behaviour in a dev/test lab?
>

It looks like there is contention for one or more records, which results in
CTDB creating lockwait child processes that wait for the lock on each
contended record. I would suggest you try CTDB 1.2.61.
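
One rough way to confirm that the freezes line up with a burst of forked
CTDB processes is to sample the process table and log the count over time.
The following is only a minimal Python sketch, not part of CTDB. It assumes
a Linux /proc filesystem, and it assumes the lockwait children show up under
the same process name as the daemon (ctdbd). The function names are made up
for illustration.

```python
import os
import time


def name_from_stat(text):
    """Return the command name from the text of /proc/<pid>/stat."""
    # The name sits between the first '(' and the last ')'; it may itself
    # contain spaces or parentheses, so do not split on whitespace.
    return text[text.find("(") + 1:text.rfind(")")]


def count_named(name, proc_root="/proc"):
    """Count the processes whose command name equals `name`."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return 0  # no /proc here (not Linux, or not mounted)
    count = 0
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if name_from_stat(f.read()) == name:
                    count += 1
        except OSError:
            continue  # the process exited while we were scanning
    return count


def watch(name="ctdbd", samples=60, interval=1.0, proc_root="/proc"):
    """Print a timestamped process count once per interval."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), count_named(name, proc_root))
        time.sleep(interval)


if __name__ == "__main__":
    # Single sample; call watch() instead to log continuously.
    print(count_named("ctdbd"))
```

Running watch() on each node during a freeze, and comparing the timestamps
against the reported stalls, should show whether the process count jumps at
the same moment.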


>  - has anyone done any more general performance / config optimisation of
> CTDB/Samba/GPFS/Linux?
>

For general performance tracking you will have to check whether there is
heavy CPU load, high memory pressure, or lots of processes in a wait state.
That will give you clues as to what the next bottleneck is.
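
As a starting point for those three checks, here is a small Python sketch
that reads the load average, the memory counters, and the per-state process
counts straight from /proc. It assumes Linux; the helper names are
illustrative only, and tools such as top or vmstat report the same numbers.
A large count of processes in state "D" (uninterruptible sleep) usually
points at processes blocked on I/O or locks.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Return /proc/meminfo as a dict of integers (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def state_from_stat(text):
    """Return the one-letter process state from /proc/<pid>/stat text."""
    # The state is the first field after the last ')' of the command name.
    return text[text.rfind(")") + 1:].split()[0]


def count_states(proc_root="/proc"):
    """Return a dict mapping process state letter to number of processes."""
    counts = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = state_from_stat(f.read())
        except (OSError, IndexError):
            continue  # process went away or the file was unreadable
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            print("load averages:", parse_loadavg(f.read()))
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        print("MemFree kB:", mem.get("MemFree"), "of", mem.get("MemTotal"))
        print("process states:", count_states())
```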


>
> And - more generally - does anyone else actually use ctdb/samba/gpfs on
> the scale of ~500 users or higher? If so - how do you find it?
>
>
> --
>             --
>    Dr Orlando Richards
>   Information Services
> IT Infrastructure Division
>        Unix Section
>     Tel: 0131 650 4994
>
> The University of Edinburgh is a charitable body, registered in Scotland,
> with registration number SC005336.
>
>
Amitay

