CTDB woes
Orlando Richards
orlando.richards at ed.ac.uk
Fri Apr 12 09:41:03 MDT 2013
On 12/04/13 16:35, Amitay Isaacs wrote:
>
> On Fri, Apr 12, 2013 at 10:39 PM, Orlando Richards
> <orlando.richards at ed.ac.uk> wrote:
>
>
> Hi folks,
>
> We've long been using CTDB and Samba for our NAS service, servicing
> ~500 users. We've been suffering from some problems with the CTDB
> performance over the last few weeks, likely triggered either by an
> upgrade of samba from 3.5 to 3.6 (and enabling of SMB2 as a result),
> or possibly by additional users coming on with a new workload.
>
> We run CTDB 1.0.114.4-1 (from sernet) and samba3-3.6.12-44 (again,
> from sernet). Before we roll back, we'd like to make sure we can't
> fix the problem and stick with Samba 3.6 (and we don't even know
> that a roll back would fix the issue).
>
> The symptoms are a complete freeze of the service for CIFS users for
> 10-60 seconds, and on the servers a corresponding spawning of large
> numbers of CTDB processes, which seem to be created in a "big bang",
> and then do what they do and exit in the subsequent 10-60 seconds.
>
> We also serve up NFS from the same ctdb-managed frontends, and GPFS
> from the cluster - and these are both fine throughout.
>
> This was happening 5-10 times per hour, not at exact intervals
> though. When we added a third node to the CTDB cluster, it "got
> worse", and when we dropped the CTDB cluster down to a single node,
> everything started behaving fine - which is where we are now.
>
> So, I've got a bunch of questions!
>
> - does anyone know why ctdb would be spawning these processes, and
> if there's anything we can do to stop it needing to do it? Also -
> any idea how we might reproduce this kind of behaviour in a dev/test
> lab?
>
Hi Amitay,
>
> It looks like there is contention for some record(s) which results in
> CTDB creating lockwait child processes to wait for the record. I would
> suggest you try CTDB 1.2.61.
Is that the current "stable" release? I must admit to getting a bit
confused around release numbers for ctdb! The sernet release we're on
has proved to be very stable for us (it can't be said enough - thanks
Sernet!).
>
> - has anyone done any more general performance / config
> optimisation of CTDB/Samba/GPFS/Linux?
>
>
> For general performance tracking you will have to check if there is
> heavy CPU load, high memory pressure, or lots of processes in wait
> state. That will give you clues as to what the next bottleneck is.
>
From what we could see at the time, I'd have characterised it as
typical of "lots of processes in wait state", but I couldn't figure out
what they were waiting for.
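
For anyone wanting to capture the "processes in wait state" picture during one of these freezes, here is a minimal sketch, assuming a Linux /proc filesystem. It counts processes by scheduler state (D is uninterruptible sleep, typically waiting on I/O or a lock) and can restrict the count to commands whose name starts with a given prefix. The "ctdb" prefix and the function names are assumptions for illustration, not anything taken from the CTDB tooling.

```python
import os
from collections import Counter


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def count_states(records, name_prefix=""):
    """Count process states, optionally only for commands starting with name_prefix."""
    counts = Counter()
    for text in records:
        _pid, comm, state = parse_stat(text)
        if comm.startswith(name_prefix):
            counts[state] += 1
    return counts


def read_proc_stats(proc_root="/proc"):
    """Yield the raw stat record of every process visible under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                yield handle.read()
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue


if __name__ == "__main__":
    records = list(read_proc_stats())
    print("all processes:", dict(count_states(records)))
    # Assumed prefix: the CTDB daemon and its forked children show up as ctdb*.
    print("ctdb* processes:", dict(count_states(records, "ctdb")))
```

Run from cron or a loop every second or so, with a timestamp on each line, this would show whether the burst of CTDB processes lines up with a jump in D-state counts. It does not say what the processes are waiting on.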
Cheers,
Orlando
>
> And - more generally - does anyone else actually use ctdb/samba/gpfs
> on the scale of ~500 users or higher? If so - how do you find it?
>
>
> --
> Dr Orlando Richards
> Information Services
> IT Infrastructure Division
> Unix Section
> Tel: 0131 650 4994
>
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> Amitay
--
Dr Orlando Richards
Information Services
IT Infrastructure Division
Unix Section
Tel: 0131 650 4994
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.