orlando.richards at ed.ac.uk
Fri Apr 12 06:39:31 MDT 2013
We've long been using CTDB and Samba for our NAS service, servicing ~500
users. We've been suffering from some problems with the CTDB performance
over the last few weeks, likely triggered either by an upgrade of Samba
from 3.5 to 3.6 (and enabling of SMB2 as a result), or possibly by
additional users coming on with a new workload.
We run CTDB 220.127.116.11-1 (from sernet) and samba3-3.6.12-44 (again, from
sernet). Before we roll back, we'd like to make sure we can't fix the
problem and stick with Samba 3.6 (and we don't even know that a roll
back would fix the issue).
The symptoms are a complete freeze of the service for CIFS users for
10-60 seconds, and on the servers a corresponding spawning of large
numbers of CTDB processes, which seem to be created in a "big bang", and
then do what they do and exit in the subsequent 10-60 seconds.
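One way to put timestamps on these "big bang" events would be to sample the
process count once a second and log the spikes. A minimal sketch, assuming
Linux /proc and assuming the spawned processes appear under the command name
"ctdbd" (not confirmed here); the spike factor of 5 is an arbitrary choice:

```python
import glob
import time


def read_comms(proc_root="/proc"):
    """Return the command name of every process visible under proc_root (Linux only)."""
    names = []
    for path in glob.glob(f"{proc_root}/[0-9]*/comm"):
        try:
            with open(path) as fh:
                names.append(fh.read().strip())
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return names


def count_named(names, target="ctdbd"):
    """Count processes whose command name equals target (assumed name: ctdbd)."""
    return sum(1 for name in names if name == target)


def find_spikes(samples, baseline, factor=5):
    """Return the (timestamp, count) samples whose count exceeds baseline * factor."""
    return [(ts, count) for ts, count in samples if count > baseline * factor]


if __name__ == "__main__":
    # Take the count at start-up as the "quiet" baseline (at least 1).
    baseline = max(1, count_named(read_comms()))
    while True:
        count = count_named(read_comms())
        if find_spikes([(time.time(), count)], baseline):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "ctdbd processes:", count)
        time.sleep(1)
```

The logged timestamps could then be lined up against the client-side freezes
and the CTDB and Samba logs.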
We also serve up NFS from the same ctdb-managed frontends, and GPFS from
the cluster - and these are both fine throughout.
This was happening 5-10 times per hour, not at exact intervals though.
When we added a third node to the CTDB cluster, it "got worse", and when
we dropped the CTDB cluster down to a single node, everything started
behaving fine - which is where we are now.
So, I've got a bunch of questions!
- does anyone know why ctdb would be spawning these processes, and if
there's anything we can do to stop it needing to do it? Also - any idea
how we might reproduce this kind of behaviour in a dev/test lab?
- has anyone done any more general performance / config optimisation?
And - more generally - does anyone else actually use ctdb/samba/gpfs on
the scale of ~500 users or higher? If so - how do you find it?
Dr Orlando Richards
IT Infrastructure Division
Tel: 0131 650 4994
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.