CTDB woes

Orlando Richards orlando.richards at ed.ac.uk
Fri Apr 12 09:41:03 MDT 2013


On 12/04/13 16:35, Amitay Isaacs wrote:
>
> On Fri, Apr 12, 2013 at 10:39 PM, Orlando Richards
> <orlando.richards at ed.ac.uk> wrote:
>
>
>     Hi folks,
>
>     We've long been using CTDB and Samba for our NAS service, servicing
>     ~500 users. We've been suffering from some problems with the CTDB
>     performance over the last few weeks, likely triggered either by an
>     upgrade of samba from 3.5 to 3.6 (and enabling of SMB2 as a result),
>     or possibly by additional users coming on with a new workload.
>
>     We run CTDB 1.0.114.4-1 (from sernet) and samba3-3.6.12-44 (again,
>     from sernet). Before we roll back, we'd like to make sure we can't
>     fix the problem and stick with Samba 3.6 (and we don't even know
>     that a roll back would fix the issue).
>
>     The symptoms are a complete freeze of the service for CIFS users for
>     10-60 seconds, and on the servers a corresponding spawning of large
>     numbers of CTDB processes, which seem to be created in a "big bang",
>     and then do what they do and exit in the subsequent 10-60 seconds.
>
>     We also serve up NFS from the same ctdb-managed frontends, and GPFS
>     from the cluster - and these are both fine throughout.
>
>     This was happening 5-10 times per hour, though not at exact
>     intervals. When we added a third node to the CTDB cluster, it "got
>     worse", and when we dropped the CTDB cluster down to a single node,
>     everything started behaving fine - which is where we are now.
>
>     So, I've got a bunch of questions!
>
>       - does anyone know why ctdb would be spawning these processes, and
>     whether there's anything we can do to stop it needing to? Also -
>     any idea how we might reproduce this kind of behaviour in a dev/test
>     lab?
>

Hi Amitay,


>
> It looks like there is contention for some record(s) which results in
> CTDB creating lockwait child processes to wait for the record. I would
> suggest you try CTDB 1.2.61.

Is that the current "stable" release? I must admit to getting a bit 
confused by the release numbers for ctdb! The sernet release we're on 
has proved to be very stable for us (it can't be said enough - thanks 
Sernet!).
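One way to check the lockwait explanation is to log how many ctdbd processes exist on each node over time, so the "big bang" bursts can be lined up against the client freezes. A minimal Python sketch, assuming the lockwait children show up in /proc with the same command name as the daemon (ctdbd) because they are forked from it; the helper names count_processes and watch, and the threshold and interval values, are hypothetical placeholders:

```python
import os
import time
from datetime import datetime


def count_processes(name, proc_root="/proc"):
    """Count processes whose command name (/proc/<pid>/comm) equals `name`."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:
            continue  # the process exited between listdir() and open()
    return count


def watch(name="ctdbd", interval=1.0, threshold=10):
    """Poll forever, printing a timestamped line when the count reaches `threshold`."""
    while True:
        n = count_processes(name)
        if n >= threshold:
            print(f"{datetime.now().isoformat()} {name} processes: {n}", flush=True)
        time.sleep(interval)
```

Running `watch("ctdbd", interval=1.0, threshold=10)` on each node prints a line whenever the count spikes; those timestamps can then be compared with the times users report a freeze.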


>
>       - has anyone done any more general performance / config
>     optimisation of CTDB/Samba/GPFS/Linux?
>
>
> For general performance tracking you will have to check if there is
> heavy CPU load, high memory pressure, or lots of processes in wait
> state. That will give you clues as to what the next bottleneck is.
>

From what we could see at the time, I'd have characterised it as 
typical of "lots of processes in wait state", but I couldn't figure out 
what they were waiting for.
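To see what blocked processes are waiting on, one option on Linux is to scan /proc for tasks in state D (uninterruptible sleep) and read each one's wchan, the kernel function it is sleeping in. Such tasks also count toward the Linux load average, so a high load with mostly idle CPUs points the same way. A minimal sketch; the helper names are hypothetical, and wchan may read as 0 or be unreadable depending on kernel version and permissions:

```python
import os
from collections import Counter


def read_stat(pid_dir):
    """Return (comm, state) parsed from /proc/<pid>/stat."""
    with open(os.path.join(pid_dir, "stat")) as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')'.
    left, right = data.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for processes in state 'D'."""
    result = []
    pids = sorted((e for e in os.listdir(proc_root) if e.isdigit()), key=int)
    for pid in pids:
        pid_dir = os.path.join(proc_root, pid)
        try:
            comm, state = read_stat(pid_dir)
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state != "D":
            continue
        try:
            with open(os.path.join(pid_dir, "wchan")) as f:
                wchan = f.read().strip() or "?"
        except OSError:
            wchan = "?"
        result.append((int(pid), comm, wchan))
    return result


def summarise(proc_root="/proc"):
    """Return the 1/5/15-minute load averages and blocked-process counts per (comm, wchan)."""
    with open(os.path.join(proc_root, "loadavg")) as f:
        load1, load5, load15 = (float(x) for x in f.read().split()[:3])
    blocked = Counter((comm, wchan) for _, comm, wchan in blocked_processes(proc_root))
    return (load1, load5, load15), blocked
```

Calling `summarise()` during a freeze and printing `blocked.most_common(10)` shows which programs are stuck and in which kernel function, which may narrow down whether the wait is in ctdbd, smbd, or the underlying filesystem.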

Cheers,
Orlando



>
>     And - more generally - does anyone else actually use ctdb/samba/gpfs
>     on the scale of ~500 users or higher? If so - how do you find it?
>
>
>     --
>                  --
>         Dr Orlando Richards
>        Information Services
>     IT Infrastructure Division
>             Unix Section
>          Tel: 0131 650 4994
>
>     The University of Edinburgh is a charitable body, registered in
>     Scotland, with registration number SC005336.
>
>
> Amitay




More information about the samba-technical mailing list