[Samba] ctdb: Strange behaviour after upgrade

Thu Nov 18 13:44:10 MST 2010

Hi Eisofen!

eisofen at eisofen.de wrote:
> Hi,
> 
> last weekend I updated Samba and ctdb on my 2-node cluster. Samba is
> now on 3.5.6 (from 3.3.4), ctdb on 1.0.114 (from 1.0.84). Both were
> installed from the repo via yum and ctdb-packages.
> 
> After restarting both nodes everything was fine, we could access files on
> the cluster.
> 
> On Monday I noticed that the nodes no longer had their initial addresses:
> 
> Node 1:
> hostname dscln01, public IP 10.0.0.41/8, now 10.0.0.42/8
> /etc/sysconfig/network-scripts/ifcfg-bond0:
> 
> DEVICE=bond0
> BOOTPROTO=none
> IPADDR=10.0.0.41
> NETWORK=10.0.0.0
> BROADCAST=10.0.0.255
> NETMASK=255.0.0.0
> ONBOOT=yes
> USERCTL=no
> 
> 
> 
> Node 2:
> hostname dscln02, public IP 10.0.0.42/8, now 10.0.0.41/8
> /etc/sysconfig/network-scripts/ifcfg-bond0:
> 
> DEVICE=bond0
> BOOTPROTO=none
> IPADDR=10.0.0.42
> NETWORK=10.0.0.0
> BROADCAST=10.0.0.255
> NETMASK=255.0.0.0
> ONBOOT=yes
> USERCTL=no
> 
> Yesterday it fell over, so we had to reboot both nodes, and the IPs were
> still mixed up.

That is merely cosmetic actually.
When using public addresses with ctdb, you should not rely on a
specific node having a specific IP address.
It seems that in some release between 1.0.84 and 1.0.114
(I don't currently know exactly which one) the algorithm for
distributing IPs across nodes was reversed.
I think this was also discussed on the #ctdb IRC channel
some weeks or even months ago.

Your clients should only ever access the cluster by its name, to
which the whole pool of public IP addresses is assigned, so it
really should not matter which node an address is assigned to.
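As an illustration only (assuming a BIND-style zone file; the name
"dscluster" is a hypothetical placeholder), the cluster name simply
carries one A record per public address, so clients are spread over
the pool by round-robin DNS regardless of which node currently holds
which address:

```text
dscluster    IN    A    10.0.0.41
dscluster    IN    A    10.0.0.42
```

If ctdb moves 10.0.0.41 to the other node (e.g. on failover or after
the distribution change above), the name still resolves to the same
two addresses and clients are unaffected.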

> log.ctdb got some interesting entries after reboot:
> 
> 2010/11/17 09:48:02.613807 [ 4383]: killed 30 TCP connections to released IP 10.0.0.42
> 2010/11/17 09:48:02.633263 [ 4383]: re-adding secondary address 10.0.0.41/8 to dev bond0
> 2010/11/17 09:48:02.646140 [ 4383]: /etc/ctdb/interface_modify.sh: line 71: /etc/ctdb/state/interface_modify/bond0.readd.d/10.0.0.41.8/*: No such file or directory
> 2010/11/17 09:48:02.646446 [ 4383]: /etc/ctdb/state/interface_modify/bond0.readd.d/10.0.0.41.8/* 'bond0' '10.0.0.41' '8' - failed - 127
> 2010/11/17 09:48:02.646514 [ 4383]: call /etc/ctdb/state/interface_modify/bond0.readd.d/10.0.0.41.8/* 'bond0' '10.0.0.41' '8'
> 2010/11/17 09:48:02.647412 [ 4383]: Failed to del 10.0.0.42 on dev bond0
> 2010/11/17 09:48:02.649354 [ 4383]: server/ctdb_daemon.c:688 waitpid() returned error. errno:10

Hmmm. Did you assign the public addresses 10.0.0.41 and 10.0.0.42
to the nodes statically? That is not good. If you need static IP
addresses on the public interfaces (e.g. for logins etc.), you should
use a different set of addresses.
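A sketch of what that separation could look like, assuming the usual
ctdb 1.0.x file locations on CentOS. The static addresses 10.0.0.51
(dscln01) and 10.0.0.52 (dscln02) are hypothetical placeholders; use
any addresses that are not part of the ctdb pool. The "---" lines only
label which file each part belongs to and are not file content:

```text
--- /etc/sysconfig/network-scripts/ifcfg-bond0 (dscln01) ---
# static, non-floating address (hypothetical placeholder)
DEVICE=bond0
BOOTPROTO=none
IPADDR=10.0.0.51
NETMASK=255.0.0.0
ONBOOT=yes
USERCTL=no

--- /etc/sysconfig/ctdb (both nodes) ---
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses

--- /etc/ctdb/public_addresses (identical on both nodes) ---
10.0.0.41/8 bond0
10.0.0.42/8 bond0
```

dscln02 would get the same ifcfg-bond0 with its own static address
(10.0.0.52 in this example). ctdb then adds and removes 10.0.0.41 and
10.0.0.42 on bond0 as secondary addresses, and never has to touch
the address the interface was brought up with.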

Anyway, the above is a sign of a bug in the interface_modify.sh
script, though I'm not sure it is very serious.

There is a patch for this in the master branch, though, and I think
it should apply to the 1.0.114 version:

http://gitweb.samba.org/?p=sahlberg/ctdb.git;a=commit;h=e665cfde03fc9ec2264e99512ed5470872a2fd04

But we need to get clear about the pool vs. static IPs first.

> I also notice, or let's say users report, slow performance when shutting
> down their PCs. At closing time the load climbs to ~70 on both
> nodes, with high CPU load on ctdbd and mmfsd. OK, 220 PCs writing back their
> profiles...

Has that been slow before?
Has the workload changed or just the samba+ctdb versions?
Workload of course also changes when profiles grow...

> Could ctdb be the blocking element when writing to its persistent DB, since
> the local disks are not that fast?

That depends on what the workload really looks like, but I guess not.

> Both nodes are hooked up to an Infortrend SAN via FC-AL; the FS
> is GPFS, running on CentOS 5.3.
> Did I do something wrong before or after upgrading?

I can't say for sure.
I'd need to look at your configs (ctdb + samba).

Cheers - Michael

> Matthias