CTDB woes

Amitay Isaacs amitay at gmail.com
Fri Apr 12 18:21:22 MDT 2013


On Sat, Apr 13, 2013 at 1:41 AM, Orlando Richards <orlando.richards at ed.ac.uk> wrote:

> On 12/04/13 16:35, Amitay Isaacs wrote:
>
>>
>> On Fri, Apr 12, 2013 at 10:39 PM, Orlando Richards
>> <orlando.richards at ed.ac.uk> wrote:
>>
>>
>>     Hi folks,
>>
>>     We've long been using CTDB and Samba for our NAS service, servicing
>>     ~500 users. We've been suffering from some problems with the CTDB
>>     performance over the last few weeks, likely triggered either by an
>>     upgrade of samba from 3.5 to 3.6 (and enabling of SMB2 as a result),
>>     or possibly by additional users coming on with a new workload.
>>
>>     We run CTDB 1.0.114.4-1 (from sernet) and samba3-3.6.12-44 (again,
>>     from sernet). Before we roll back, we'd like to make sure we can't
>>     fix the problem and stick with Samba 3.6 (and we don't even know
>>     that a roll back would fix the issue).
>>
>>     The symptoms are a complete freeze of the service for CIFS users for
>>     10-60 seconds, and on the servers a corresponding spawning of large
>>     numbers of CTDB processes, which seem to be created in a "big bang",
>>     and then do what they do and exit in the subsequent 10-60 seconds.
>>
>>     We also serve up NFS from the same ctdb-managed frontends, and GPFS
>>     from the cluster - and these are both fine throughout.
>>
>>     This was happening 5-10 times per hour, not at exact intervals
>>     though. When we added a third node to the CTDB cluster, it "got
>>     worse", and when we dropped the CTDB cluster down to a single node,
>>     everything started behaving fine, which is where we are now.
>>
>>     So, I've got a bunch of questions!
>>
>>       - does anyone know why ctdb would be spawning these processes, and
>>     if there's anything we can do to stop it needing to do it? Also -
>>     any idea how we might reproduce this kind of behaviour in a dev/test
>>     lab?
>>
>>
> Hi Amitay,
>
>> It looks like there is contention for some record(s) which results in
>> CTDB creating lockwait child processes to wait for the record. I would
>> suggest you try CTDB 1.2.61.
>>
>
> Is that the current "stable" release? I must admit to getting a bit
> confused by release numbers for ctdb! The sernet release we're on has
> proved to be very stable for us (it can't be said enough - thanks Sernet!).

Yes, 1.2.61 is the current stable release. The current development release
is 2.1.


>>       - has anyone done any more general performance / config
>>     optimisation of CTDB/Samba/GPFS/Linux?
>>
>>
>> For general performance tracking you will have to check if there is
>> heavy CPU load, high memory pressure, or lots of processes in wait
>> state. That will give you clues as to what the next bottleneck is.
>>
>>
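As a rough, Linux-only illustration of looking for "lots of processes in wait state", the Python sketch below counts processes by the single-letter state field in /proc/<pid>/stat (R running, S interruptible sleep, D uninterruptible sleep), optionally restricted to one command name. The function names and the "ctdbd" filter are hypothetical choices for this example, not part of CTDB or Samba. A process blocked waiting for an fcntl lock typically shows up as S rather than D, and the filter assumes the waiting children keep the ctdbd name, as forked children do unless they exec another binary.

```python
import os
from collections import Counter


def parse_stat(line):
    """Return (comm, state) from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything up to the *last* ')'.
    start = line.index("(")
    end = line.rindex(")")
    comm = line[start + 1:end]
    # The single-letter process state follows ") ".
    state = line[end + 2]
    return comm, state


def tally_states(proc_root="/proc", name_filter=None):
    """Count processes per state letter, optionally only those named name_filter."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric entries are processes
            continue
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited (or is unreadable) between listdir() and open()
        if name_filter is None or comm == name_filter:
            counts[state] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: run repeatedly (e.g. once a second) during a freeze.
    print("all processes:", dict(tally_states()))
    print("ctdbd processes:", dict(tally_states(name_filter="ctdbd")))
```

Sampling this during a freeze and comparing it with a quiet period gives a cheap first indication of whether the spike is made up of ctdbd processes sleeping on something, which narrows down where to look next.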
> From what we could see at the time, I'd have characterised it as typical
> of "lots of processes in wait state", but I couldn't figure out what they
> were waiting for.
>

Yes. Those are lockwait processes waiting for fcntl locks.
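The lockwait mechanism can be sketched in Python as below. This is a simplified illustration of the general pattern, not CTDB's actual code: instead of blocking the main daemon on a contended record, the parent forks a child that makes a blocking fcntl request (F_SETLKW, which is what fcntl.lockf issues for LOCK_EX without LOCK_NB) on the record's byte range, and reports back through a pipe once the lock is granted. The names start_lockwait, wait_readable and run_demo, and the use of a plain temporary file in place of a TDB database, are assumptions made for the example.

```python
import fcntl
import os
import select
import tempfile


def start_lockwait(path, offset, length=1):
    """Fork a child that blocks until it holds an exclusive fcntl lock on
    bytes [offset, offset + length) of `path`, then writes one byte to a pipe.

    Returns (child_pid, read_end_of_pipe). The caller never blocks on the lock.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the only process that waits on the contended record
        os.close(r)
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            # LOCK_EX without LOCK_NB issues a blocking F_SETLKW request.
            fcntl.lockf(fd, fcntl.LOCK_EX, length, offset)
            os.write(w, b"x")  # tell the parent the lock has been granted
            status = 0
        finally:
            # Exiting releases the child's lock. In this simplified sketch the
            # parent would then retry its own request (not shown).
            os._exit(status)
    os.close(w)
    return pid, r


def wait_readable(fd, timeout):
    """True if fd becomes readable within `timeout` seconds."""
    ready, _, _ = select.select([fd], [], [], timeout)
    return bool(ready)


def run_demo(offset=4):
    """Parent holds one record while a lockwait child queues behind it.

    Returns (child_signalled_while_held, child_signalled_after_release).
    """
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"\0" * 16)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDWR)
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset)  # parent takes the record
            pid, r = start_lockwait(tmp.name, offset)
            while_held = wait_readable(r, 0.5)         # child is still blocked
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset)  # release the record
            after_release = wait_readable(r, 5.0)      # child now gets the lock
            os.waitpid(pid, 0)
            os.close(r)
        finally:
            os.close(fd)
    return while_held, after_release


if __name__ == "__main__":
    print(run_demo())
```

run_demo() should return (False, True): the child cannot signal while the parent holds the record, and it signals promptly once the record is released. With one waiting child per contended request, a burst of contention on hot records could produce many short-lived processes at once, which would be consistent with the "big bang" described at the start of the thread.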


>
> Cheers,
> Orlando
>
>
>
Amitay.


More information about the samba-technical mailing list