[Samba] ctdb: Strange behaviour after upgrade

Michael Adam obnox at samba.org
Thu Nov 18 13:44:10 MST 2010

Moin Eisofen!

eisofen at eisofen.de wrote:
> Hi,
> last weekend I updated samba and ctdb on my 2-node cluster. Samba is
> now on 3.5.6 (from 3.3.4), ctdb on 1.0.114 (from 1.0.84). Both installed
> from repo via yum and ctdb-packages.
> After restarting both nodes everything was fine, we could access files on
> the cluster.
> On Monday I noticed that the nodes didn't have their initial addresses:
> Node 1:
> hostname dscln01, public IP, now
> /etc/sysconfig/network-scripts/ifcfg-bond0:
> DEVICE=bond0
> ONBOOT=yes
> Node 2:
> hostname dscln02, public IP, now
> /etc/sysconfig/network-scripts/ifcfg-bond0:
> DEVICE=bond0
> ONBOOT=yes
> Yesterday it fell over, so we had to reboot both nodes, and the IPs were
> still mixed up.

That is actually merely cosmetic.
When using public addresses with ctdb, you should not rely on a
specific node having a specific IP address.
It seems that in some release between 1.0.84 and 1.0.114
(I do not currently know exactly which one) the algorithm for
distributing IPs across nodes was reversed.
I think this was also discussed on the #ctdb IRC channel
some weeks or even months ago.

Your clients should only ever access the cluster by its name, to
which the whole pool of public IP addresses is assigned, so it
really should not matter which node an address is assigned to.
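To illustrate why the node-to-address mapping is cosmetic, here is a toy model in Python. It is not ctdb's actual IP takeover algorithm: the round-robin assignment, the function name `assign_round_robin`, and the example addresses (from the 192.0.2.0/24 documentation range) are all assumptions for illustration. Reversing the order in which nodes are considered changes which node holds which address, but the cluster as a whole still serves exactly the same pool.

```python
from typing import Dict, List


def assign_round_robin(public_ips: List[str], nodes: List[str]) -> Dict[str, str]:
    """Toy model: hand out pool addresses to nodes in round-robin order.

    Hypothetical illustration only; ctdb's real takeover logic differs.
    Returns a mapping of public IP -> node name.
    """
    if not nodes:
        raise ValueError("need at least one node")
    return {ip: nodes[i % len(nodes)] for i, ip in enumerate(public_ips)}


# Placeholder pool from the RFC 5737 documentation range, not the poster's.
pool = ["192.0.2.10", "192.0.2.11"]
nodes = ["dscln01", "dscln02"]

old = assign_round_robin(pool, nodes)                  # one node order
new = assign_round_robin(pool, list(reversed(nodes)))  # "reversed" order

print(old)  # {'192.0.2.10': 'dscln01', '192.0.2.11': 'dscln02'}
print(new)  # {'192.0.2.10': 'dscln02', '192.0.2.11': 'dscln01'}

# Per-node placement changed, but the set of served addresses did not,
# so a client resolving the cluster name still reaches the same pool.
assert old != new
assert set(old) == set(new) == set(pool)
```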

> log.ctdb got some interesting entries after reboot:
> 2010/11/17 09:48:02.613807 [ 4383]: killed 30 TCP connections to released IP
> 2010/11/17 09:48:02.633263 [ 4383]: re-adding secondary address to dev bond0
> 2010/11/17 09:48:02.646140 [ 4383]: /etc/ctdb/interface_modify.sh: line 71: /etc/ctdb/state/interface_modify/bond0.readd.d/*: No such file or directory
> 2010/11/17 09:48:02.646446 [ 4383]: /etc/ctdb/state/interface_modify/bond0.readd.d/* 'bond0' '' '8' - failed - 127
> 2010/11/17 09:48:02.646514 [ 4383]: call /etc/ctdb/state/interface_modify/bond0.readd.d/* 'bond0' '' '8'
> 2010/11/17 09:48:02.647412 [ 4383]: Failed to del on dev bond0
> 2010/11/17 09:48:02.649354 [ 4383]: server/ctdb_daemon.c:688 waitpid() returned error. errno:10

Hmmm. Did you statically assign the public addresses
to the nodes? That is not a good idea. If you need static IP
addresses on the public interfaces (e.g. for logins), you should
use a separate set of addresses that is not part of the public pool.
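One way to check for the overlap asked about here is to compare the statically configured interface addresses against the ctdb public address pool. The sketch below is a hypothetical helper, not part of ctdb. It assumes the pool is listed one entry per line as `ADDRESS/PREFIX INTERFACE` (the usual layout of ctdb's public addresses file), and that the static addresses, for example from an `IPADDR=` line in an ifcfg file, are supplied as a plain list. All addresses are placeholders from the documentation range.

```python
import ipaddress
from typing import Iterable, List, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_public_addresses(text: str) -> List[IPAddress]:
    """Parse lines of the assumed form 'ADDRESS/PREFIX INTERFACE'."""
    pool: List[IPAddress] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # ip_interface() accepts a host address with a prefix, e.g. 192.0.2.10/24;
        # .ip drops the prefix and keeps the host address itself.
        pool.append(ipaddress.ip_interface(line.split()[0]).ip)
    return pool


def static_in_pool(static_ips: Iterable[str], pool_text: str) -> List[str]:
    """Return the statically configured addresses that also appear in the pool."""
    pool = set(parse_public_addresses(pool_text))
    return [ip for ip in static_ips if ipaddress.ip_address(ip) in pool]


# Placeholder pool from the RFC 5737 documentation range, not the poster's addresses.
pool_file = """
# ctdb public address pool
192.0.2.10/24 bond0
192.0.2.11/24 bond0
"""

# Hypothetical static addresses, e.g. taken from IPADDR= in ifcfg-bond0.
print(static_in_pool(["192.0.2.10", "192.0.2.20"], pool_file))  # ['192.0.2.10']
print(static_in_pool(["192.0.2.20", "192.0.2.21"], pool_file))  # []
```

Any address reported by the first call would be managed by both the OS network scripts and ctdb, which is the situation to avoid.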

Anyway, the log above points to a bug in the interface_modify.sh
script, though I am not sure it is very harmful.
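The log lines are consistent with a common shell pitfall, although this reading is an interpretation of the log alone, not of the script source: when a glob such as `bond0.readd.d/*` matches nothing, the shell passes the pattern through literally, so the script tries to run a file literally named `*` and gets exit status 127, the shell's "command not found" code. A hook runner has to treat a missing or empty directory as "nothing to do". The Python sketch below shows that guard; the function name `run_hooks` is hypothetical, and this is not the patch from the master branch.

```python
import os
import subprocess
import tempfile
from typing import List


def run_hooks(hook_dir: str, *args: str) -> List[str]:
    """Run every executable file in hook_dir with args; return the names run.

    Hypothetical helper: a missing or empty directory simply means there is
    nothing to run, instead of trying to execute a literal '*'.
    """
    if not os.path.isdir(hook_dir):
        return []
    ran = []
    for name in sorted(os.listdir(hook_dir)):
        path = os.path.join(hook_dir, name)
        # Skip subdirectories and non-executable files.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            subprocess.run([path, *args], check=True)
            ran.append(name)
    return ran


# An empty hook directory is a no-op rather than an error.
with tempfile.TemporaryDirectory() as empty_dir:
    print(run_hooks(empty_dir, "bond0", "", "8"))  # []
```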

There is a patch for this in the master branch, and I think
it should apply to the 1.0.114 version:


But first we need to get clear about the pool vs. static IPs.

> I also notice, or let's say users report, slow performance when shutting
> down their PCs. When it comes to closing time, load climbs to ~70 on both
> nodes, with high CPU load on ctdbd and mmfsd. OK, 220 PCs writing back their
> profiles...

Was it slow before the upgrade?
Has the workload changed, or just the samba+ctdb versions?
Of course the workload also changes when profiles grow...

> Could ctdb be the blocking element when writing to its persistent DB, since
> the local disks are not that fast?

That depends on what the workload really looks like, but I would guess not.

> Both nodes are hooked up to an Infortrend SAN via FC-AL; the FS
> is GPFS, running on CentOS 5.3.
> Did I do something wrong after or before upgrading?

I can't say for sure.
I'd need to look at your configs (ctdb + samba).

Cheers - Michael

> Matthias
