CTDB woes
Amitay Isaacs
amitay at gmail.com
Fri Apr 12 18:21:22 MDT 2013
On Sat, Apr 13, 2013 at 1:41 AM, Orlando Richards
<orlando.richards at ed.ac.uk> wrote:
> On 12/04/13 16:35, Amitay Isaacs wrote:
>
>>
>> On Fri, Apr 12, 2013 at 10:39 PM, Orlando Richards
>> <orlando.richards at ed.ac.uk>
>> wrote:
>>
>>
>> Hi folks,
>>
>> We've long been using CTDB and Samba for our NAS service, servicing
>> ~500 users. We've been suffering from some problems with the CTDB
>> performance over the last few weeks, likely triggered either by an
>> upgrade of samba from 3.5 to 3.6 (and enabling of SMB2 as a result),
>> or possibly by additional users coming on with a new workload.
>>
>> We run CTDB 1.0.114.4-1 (from sernet) and samba3-3.6.12-44 (again,
>> from sernet). Before we roll back, we'd like to make sure we can't
>> fix the problem and stick with Samba 3.6 (and we don't even know
>> that a roll back would fix the issue).
>>
>> The symptoms are a complete freeze of the service for CIFS users for
>> 10-60 seconds, and on the servers a corresponding spawning of large
>> numbers of CTDB processes, which seem to be created in a "big bang",
>> and then do what they do and exit in the subsequent 10-60 seconds.
>>
>> We also serve up NFS from the same ctdb-managed frontends, and GPFS
>> from the cluster - and these are both fine throughout.
>>
>> This was happening 5-10 times per hour, not at exact intervals
>> though. When we added a third node to the CTDB cluster, it "got
>> worse", and when we dropped the CTDB cluster down to a single node,
>> everything started behaving fine - which is where we are now.
>>
>> So, I've got a bunch of questions!
>>
>> - does anyone know why ctdb would be spawning these processes, and
>> if there's anything we can do to stop it needing to do it? Also -
>> any idea how we might reproduce this kind of behaviour in a dev/test
>> lab?
>>
>>
> Hi Amitay,
>
>
>
>
>> It looks like there is contention for some record(s) which results in
>> CTDB creating lockwait child processes to wait for the record. I would
>> suggest you try CTDB 1.2.61.
>>
>
> Is that the current "stable" release? I must admit to getting a bit
> confused around release numbers for ctdb! The sernet release we're on has
> proved to be very stable for us (it can't be said enough - thanks Sernet!).
>
Yes. The current development release is 2.1.
>
>> - has anyone done any more general performance / config
>> optimisation of CTDB/Samba/GPFS/Linux?
>>
>>
>> For general performance tracking you will have to check if there is
>> heavy CPU load, high memory pressure, or lots of processes in wait
>> state. That will give you clues as to what the next bottleneck is.
>>
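
The check for "lots of processes in wait state" can be sketched roughly as
below. This is only an illustration, assuming a Linux /proc filesystem; the
helper names parse_stat and tally_states are made up for this example and are
not part of CTDB or Samba. It counts processes by command name and scheduler
state (S = interruptible sleep, D = uninterruptible sleep, R = running).

```python
import os
from collections import Counter


def parse_stat(line):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so locate it via the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    comm = line[left + 1:right]
    # The single-letter process state is the first field after the name.
    state = line[right + 1:].split()[0]
    return comm, state


def tally_states(proc="/proc"):
    """Count running processes grouped by (command name, state)."""
    counts = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                counts[parse_stat(f.read())] += 1
        except OSError:
            continue  # process exited while we were scanning
    return counts
```

Calling tally_states().most_common(10) during a freeze and again afterwards
should show whether one command name dominates. Whether the extra CTDB
children appear under the daemon's own command name is an assumption that
would need checking on the affected system.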
>>
> From what we could see at the time, I'd have characterised it as typical
> of "lots of processes in wait state", but I couldn't figure out what they
> were waiting for.
>
Yes. Those are lockwait processes waiting for fcntl locks.
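
To illustrate what waiting on an fcntl lock looks like from a child process,
here is a minimal sketch. It is not CTDB's code: demo_lock_wait, the temporary
file and the timings are all invented for the example. The parent takes an
exclusive byte-range lock, forks a child that blocks on the same byte, and
then releases the lock after hold_seconds.

```python
import fcntl
import os
import tempfile
import time


def demo_lock_wait(hold_seconds=0.5):
    """Return how long a child process blocked on a lock held by the parent."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 16)
        # Parent takes an exclusive fcntl lock on the first byte.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        ready_r, ready_w = os.pipe()
        result_r, result_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: fcntl locks are not inherited across fork, so it has
            # to wait for the same byte like any other process would.
            try:
                child_fd = os.open(path, os.O_RDWR)
                os.write(ready_w, b"r")  # tell the parent we are about to wait
                start = time.monotonic()
                # No LOCK_NB flag, so this blocks until the lock is free.
                fcntl.lockf(child_fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
                waited = time.monotonic() - start
                os.write(result_w, f"{waited:.6f}".encode())
            finally:
                os._exit(0)
        # Parent: close the write ends so reads fail fast if the child dies.
        os.close(ready_w)
        os.close(result_w)
        os.read(ready_r, 1)
        time.sleep(hold_seconds)  # keep the child waiting
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
        waited = float(os.read(result_r, 64).decode())
        os.waitpid(pid, 0)
        os.close(ready_r)
        os.close(result_r)
        return waited
    finally:
        os.close(fd)
        os.unlink(path)
```

The value returned by demo_lock_wait(0.5) should be close to half a second.
Forking many such children against one locked byte might be a starting point
for reproducing a burst of waiting processes in a test lab, though it says
nothing about which record is actually contended in production.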
>
> Cheers,
> Orlando
>
>
>
Amitay.