orlando.richards at ed.ac.uk
Fri Apr 12 09:41:03 MDT 2013
On 12/04/13 16:35, Amitay Isaacs wrote:
> On Fri, Apr 12, 2013 at 10:39 PM, Orlando Richards
> <orlando.richards at ed.ac.uk> wrote:
>> Hi folks,
>> We've long been using CTDB and Samba for our NAS service, servicing
>> ~500 users. We've been suffering from some problems with the CTDB
>> performance over the last few weeks, likely triggered either by an
>> upgrade of samba from 3.5 to 3.6 (and enabling of SMB2 as a result),
>> or possibly by additional users coming on with a new workload.
>> We run CTDB 18.104.22.168-1 (from sernet) and samba3-3.6.12-44 (again,
>> from sernet). Before we roll back, we'd like to make sure we can't
>> fix the problem and stick with Samba 3.6 (and we don't even know
>> that a rollback would fix the issue).
>> The symptoms are a complete freeze of the service for CIFS users for
>> 10-60 seconds, and on the servers a corresponding spawning of large
>> numbers of CTDB processes, which seem to be created in a "big bang",
>> and then do what they do and exit in the subsequent 10-60 seconds.
>> We also serve up NFS from the same ctdb-managed frontends, and GPFS
>> from the cluster - and these are both fine throughout.
>> This was happening 5-10 times per hour, not at exact intervals
>> though. When we added a third node to the CTDB cluster, it "got
>> worse", and when we dropped the CTDB cluster down to a single node,
>> everything started behaving fine - which is where we are now.
>> So, I've got a bunch of questions!
>> - does anyone know why ctdb would be spawning these processes, and
>> if there's anything we can do to stop it needing to do it? Also -
>> any idea how we might reproduce this kind of behaviour in a dev/test
>> environment?
> It looks like there is contention for some record(s) which results in
> CTDB creating lockwait child processes to wait for the record. I would
> suggest you try CTDB 1.2.61.
Is that the current "stable" release? I must admit to getting a bit
confused around release numbers for ctdb! The sernet release we're on
has proved to be very stable for us (it can't be said enough - thanks
>> - has anyone done any more general performance / config
>> optimisation of CTDB/Samba/GPFS/Linux?
> For general performance tracking you will have to check if there is
> heavy CPU load, high memory pressure, or lots of processes in wait
> state. That will give you clues as to what the next bottleneck is.
From what we could see at the time, I'd have characterised it as
typical of "lots of processes in wait state", but I couldn't figure out
what they were waiting for.
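Here is a minimal sketch for catching this the next time it happens. It needs Python 3 on Linux, reads /proc directly, and is not part of CTDB or Samba. It counts processes per state and per command name. For anything in uninterruptible sleep (D), or named ctdbd/smbd, it also reports the kernel wait channel from /proc/<pid>/wchan, which is the cheapest answer to "what are they waiting for". The watch list is my own choice. So is the assumption that the lockwait children Amitay mentions are forked from ctdbd and keep its name. wchan may also read as 0, or be restricted, on some kernels.

```python
#!/usr/bin/env python3
"""Snapshot process states on a Linux node (sketch; reads /proc directly)."""
import os
from collections import Counter

# Assumption: CTDB lockwait helpers are forked from ctdbd and keep its name.
WATCH = {"ctdbd", "smbd"}


def parse_stat(data):
    """Return (comm, state) from the text of /proc/<pid>/stat.

    comm sits in parentheses and may itself contain spaces or ')', so we
    split on the last ')'; the one-letter state follows a single space.
    """
    lpar = data.index("(")
    rpar = data.rindex(")")
    return data[lpar + 1:rpar], data[rpar + 2]


def read_file(path):
    """Return the stripped contents of path, or None if unreadable or gone."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def snapshot(watch=WATCH):
    """Count processes per state and per command, and group waiters by wchan."""
    states, comms, waits = Counter(), Counter(), Counter()
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        stat = read_file(f"/proc/{pid}/stat")
        if stat is None:  # process exited between listdir() and open()
            continue
        comm, state = parse_stat(stat)
        states[state] += 1
        comms[comm] += 1
        # D = uninterruptible sleep (usually I/O). A blocking fcntl() lock
        # wait is typically an interruptible sleep (S), so also inspect the
        # watched daemons whatever their state.
        if state == "D" or comm in watch:
            wchan = read_file(f"/proc/{pid}/wchan") or "?"
            waits[(comm, state, wchan)] += 1
    return states, comms, waits


if __name__ == "__main__":
    states, comms, waits = snapshot()
    print("states:", dict(states))
    print("top commands:", comms.most_common(5))
    for (comm, state, wchan), n in waits.most_common(10):
        print(f"{n:5d}  {comm:<16} {state}  {wchan}")
```

Run it a few times during a freeze and once during a quiet period. Comparing the ctdbd count and the wchan column between the two might show whether the burst really is lock waiters piling up behind one contended record.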
>> And - more generally - does anyone else actually use ctdb/samba/gpfs
>> on the scale of ~500 users or higher? If so - how do you find it?
>> Dr Orlando Richards
>> Information Services
>> IT Infrastructure Division
>> Unix Section
>> Tel: 0131 650 4994
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
More information about the samba-technical mailing list