CTDB complains about "net serverid" and Samba doesn't bind to public IPs

Alexander forsmbg at googlemail.com
Thu Sep 2 07:55:09 MDT 2010


Hi Samba Team,

I'm trying to set up a simple test cluster with CTDB. The OS is
SLES10SP3, Samba is 3.5.4, installed from the SLES10 RPMs at
enterprisesamba.com. This runs on two VMware Server 2.0.2 VMs with
1 GB RAM each.

I've pulled the CTDB sources both ways listed in the Wiki and on the
CTDB main page (rsync and git pull); the results don't seem to differ.

It looks like there's no "net serverid" command in 3.5.4, yet CTDB's
events.d/50.samba tries to call it.
The second problem is that while CTDB assigns the proper public IPs to
the interface, Samba doesn't bind to them (when started without CTDB it
does).
And the third one is that ctdbd sometimes crashes almost right after
start; the log snippet is below.
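
The first problem can be confirmed without reading the whole usage dump. The following is only a minimal sketch: it assumes the usage text has the "net <subcommand>  <description>" layout seen in the log below, optionally behind a CTDB log prefix, and the helper name net_subcommands is made up for illustration. The same function could presumably be fed the output of "net help".

```python
def net_subcommands(usage_text):
    """Return the set of subcommand names listed in `net` usage output."""
    commands = set()
    for line in usage_text.splitlines():
        # Drop an optional CTDB log prefix such as "2010/... [23208]: ".
        _, sep, rest = line.partition("]: ")
        body = rest if sep else line
        parts = body.split()
        # Usage lines look like "net <subcommand>   <description>".
        if len(parts) >= 2 and parts[0] == "net":
            commands.add(parts[1])
    return commands


# A few lines copied from the CTDB log in this post.
sample_log = """\
2010/09/02 17:09:34.852080 [23208]: Invalid command: net serverid
2010/09/02 17:09:34.853514 [23208]: Usage:
2010/09/02 17:09:34.853633 [23208]: net rpc             Run functions
using RPC transport
2010/09/02 17:09:34.854840 [23208]: net g_lock          Manipulate the
global lock table
2010/09/02 17:09:34.855269 [23208]: net status          Display server status
"""

print("serverid" in net_subcommands(sample_log))  # prints False
```

If "serverid" is absent from the returned set, the installed net binary does not provide the subcommand that 50.samba calls.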

I'm using no lockfile at the moment - just to make things easier at
the beginning and ensure it can start at all.

Could anyone please take a look and suggest something?

=======
public_addresses:
10.125.136.56/24 eth0
10.125.136.57/24 eth0
=======
nodes:
192.168.10.128
192.168.10.129
=======
ip addr show when ctdb is running:
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 127.0.0.2/8 brd 127.255.255.255 scope host secondary lo
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:c6:7f:8f brd ff:ff:ff:ff:ff:ff
    inet 10.125.136.21/24 brd 10.125.136.255 scope global eth0
    inet 10.125.136.56/24 brd 10.125.136.255 scope global secondary eth0
    inet 10.125.136.57/24 brd 10.125.136.255 scope global secondary eth0
3: eth1: <BROADCAST,MULTICAST,NOTRAILERS,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:c6:7f:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.129/24 brd 192.168.10.255 scope global eth1
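
The claim that CTDB assigns the public IPs can be checked mechanically by comparing the public_addresses file with the output above. This is a rough sketch under the assumption that both texts have exactly the layout shown in this post; the function name missing_public_ips is hypothetical. It only tells whether the kernel has the addresses, not whether smbd listens on them.

```python
def missing_public_ips(public_addresses, ip_addr_output):
    """Return (address, interface) pairs that are configured but not assigned."""
    assigned = set()
    current_iface = None
    for line in ip_addr_output.splitlines():
        if line and not line[0].isspace():
            # Interface header, e.g. "2: eth0: <BROADCAST,...> mtu 1500 ..."
            current_iface = line.split(":")[1].strip()
        else:
            fields = line.split()
            # Address lines look like "inet 10.125.136.56/24 brd ... eth0".
            if fields and fields[0] == "inet":
                assigned.add((fields[1], current_iface))
    missing = []
    for line in public_addresses.splitlines():
        fields = line.split()
        # Each entry is "<address>/<prefix> <interface>".
        if len(fields) >= 2 and (fields[0], fields[1]) not in assigned:
            missing.append((fields[0], fields[1]))
    return missing


# Data copied from this post (eth0 part only).
public_addresses = """\
10.125.136.56/24 eth0
10.125.136.57/24 eth0
"""
ip_addr_output = """\
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0c:29:c6:7f:8f brd ff:ff:ff:ff:ff:ff
    inet 10.125.136.21/24 brd 10.125.136.255 scope global eth0
    inet 10.125.136.56/24 brd 10.125.136.255 scope global secondary eth0
    inet 10.125.136.57/24 brd 10.125.136.255 scope global secondary eth0
"""

print(missing_public_ips(public_addresses, ip_addr_output))  # prints []
```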
=======
I have the following in the CTDB log (default ERR verbosity) when it does run:

2010/09/02 17:09:15.229176 [23205]: Recovery lock file set to "".
Disabling recovery lock checking
2010/09/02 17:09:15.382620 [23208]: Starting CTDBD as pid : 23208
2010/09/02 17:09:15.965742 [23208]: Freeze priority 1
2010/09/02 17:09:15.966016 [23208]: Freeze priority 2
2010/09/02 17:09:15.966166 [23208]: Freeze priority 3
2010/09/02 17:09:18.972098 [recoverd:23271]: Trigger takeoverrun
2010/09/02 17:09:18.973317 [23208]: Freeze priority 1
2010/09/02 17:09:18.973908 [23208]: Freeze priority 2
2010/09/02 17:09:18.974167 [23208]: Freeze priority 3
2010/09/02 17:09:19.657790 [23208]: Thawing priority 1
2010/09/02 17:09:19.658025 [23208]: Release freeze handler for prio 1
2010/09/02 17:09:19.658210 [23208]: Thawing priority 2
2010/09/02 17:09:19.658267 [23208]: Release freeze handler for prio 2
2010/09/02 17:09:19.658333 [23208]: Thawing priority 3
2010/09/02 17:09:19.658379 [23208]: Release freeze handler for prio 3
2010/09/02 17:09:21.001663 [recoverd:23271]: Resetting ban count to 0
for all nodes
2010/09/02 17:09:34.104006 [23208]: 2010/09/02 17:09:34.103349
[23534]: Database 'config.tdb' does not exist
2010/09/02 17:09:34.852080 [23208]: Invalid command: net serverid
2010/09/02 17:09:34.853514 [23208]: Usage:
2010/09/02 17:09:34.853633 [23208]: net rpc             Run functions
using RPC transport
2010/09/02 17:09:34.853687 [23208]: net rap             Run functions
using RAP transport
2010/09/02 17:09:34.853689 [23208]: net ads             Run functions
using ADS transport
2010/09/02 17:09:34.853691 [23208]: net file            Functions on
remote opened files
2010/09/02 17:09:34.853693 [23208]: net share           Functions on shares
2010/09/02 17:09:34.853694 [23208]: net session         Manage sessions
2010/09/02 17:09:34.853978 [23208]: net server          List servers
in workgroup
2010/09/02 17:09:34.854017 [23208]: net domain          List
domains/workgroups on network
2010/09/02 17:09:34.854054 [23208]: net printq          Modify printer queue
2010/09/02 17:09:34.854089 [23208]: net user            Manage users
2010/09/02 17:09:34.854123 [23208]: net group           Manage groups
2010/09/02 17:09:34.854158 [23208]: net groupmap        Manage group mappings
2010/09/02 17:09:34.854192 [23208]: net sam             Functions on
the SAM database
2010/09/02 17:09:34.854245 [23208]: net validate        Validate
username and password
2010/09/02 17:09:34.854280 [23208]: net groupmember     Modify group memberships
2010/09/02 17:09:34.854315 [23208]: net admin           Execute remote
command on a remote OS/2 server
2010/09/02 17:09:34.854391 [23208]: net service         List/modify
running services
2010/09/02 17:09:34.854429 [23208]: net password        Change user
password on target server
2010/09/02 17:09:34.854469 [23208]: net changetrustpw   Change the
trust password
2010/09/02 17:09:34.854505 [23208]: net changesecretpw  Change the
secret password
2010/09/02 17:09:34.854544 [23208]: net setauthuser     Set the
winbind auth user
2010/09/02 17:09:34.854623 [23208]: net getauthuser     Get the
winbind auth user settings
2010/09/02 17:09:34.854773 [23208]: net time            Show/set time
2010/09/02 17:09:34.854807 [23208]: net lookup          Look up host
names/IP addresses
2010/09/02 17:09:34.854840 [23208]: net g_lock          Manipulate the
global lock table
2010/09/02 17:09:34.854874 [23208]: net join            Join a domain/AD
2010/09/02 17:09:34.854908 [23208]: net dom             Join/unjoin
(remote) machines to/from a domain/AD
2010/09/02 17:09:34.854943 [23208]: net cache           Operate on the
cache tdb file
2010/09/02 17:09:34.855049 [23208]: net getlocalsid     Get the SID
for the local domain
2010/09/02 17:09:34.855085 [23208]: net setlocalsid     Set the SID
for the local domain
2010/09/02 17:09:34.855131 [23208]: net setdomainsid    Set domain SID
on member servers
2010/09/02 17:09:34.855166 [23208]: net getdomainsid    Get domain SID
on member servers
2010/09/02 17:09:34.855200 [23208]: net maxrid          Display the
maximul RID currently used
2010/09/02 17:09:34.855234 [23208]: net idmap           IDmap functions
2010/09/02 17:09:34.855269 [23208]: net status          Display server status
2010/09/02 17:09:34.855275 [23208]: net usershare       Manage
user-modifiable shares
2010/09/02 17:09:34.855280 [23208]: net usersidlist     Display list
of all users with SID
2010/09/02 17:09:34.855284 [23208]: net conf            Manage Samba
registry based configuration
2010/09/02 17:09:34.855288 [23208]: net registry        Manage the
Samba registry
2010/09/02 17:09:34.855437 [23208]: net eventlog        Process Win32
*.evt eventlog files
2010/09/02 17:09:34.855472 [23208]: net help            Print usage information
2010/09/02 17:09:34.970420 [23208]: Unable to allocate transport
packet for operation 7 of length 1852731295
2010/09/02 17:09:34.970491 [23208]: Out of memory for c at
server/ctdb_control.c:788
2010/09/02 17:09:34.970529 [23208]: ctdb error: Out of memory at
server/ctdb_control.c:788
2010/09/02 17:09:34.970563 [23208]: server/ctdb_daemon.c:1029 Failed
to send control to remote node 1
2010/09/02 17:09:34.983807 [23208]: Starting SAMBA nmbd :..done
2010/09/02 17:09:35.020839 [recoverd:23271]: Trigger takeoverrun
2010/09/02 17:09:35.104193 [23208]: Unable to allocate transport
packet for operation 7 of length 1919116719
2010/09/02 17:09:35.104373 [23208]: Out of memory for c at
server/ctdb_control.c:788
2010/09/02 17:09:35.104417 [23208]: ctdb error: Out of memory at
server/ctdb_control.c:788
2010/09/02 17:09:35.104452 [23208]: server/ctdb_daemon.c:1029 Failed
to send control to remote node 1
2010/09/02 17:09:35.199770 [23208]: Starting SAMBA smbd :..done
2010/09/02 17:09:37.503404 [recoverd:23271]: Trigger takeoverrun
2010/09/02 17:09:43.321626 [23208]: ERROR: samba tcp port 445 is not responding
2010/09/02 17:09:49.109093 [23208]: ERROR: samba tcp port 445 is not responding
2010/09/02 17:09:59.850374 [23208]: ERROR: samba tcp port 445 is not responding
<the last line keeps coming>
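
The "port 445 is not responding" errors can be reproduced by hand with a plain TCP connect against each address. This is a minimal sketch, not what the CTDB event script actually runs; the function name port_responds and the 3 second timeout are assumptions chosen for illustration.

```python
import socket


def port_responds(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_responds("10.125.136.56") and port_responds("10.125.136.21") on the node would show whether smbd listens on the public address, on the static one, or on neither.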
=======
And the following when it crashes:

2010/09/02 16:47:12.633279 [20763]: Recovery lock file set to "".
Disabling recovery lock checking
2010/09/02 16:47:12.770414 [20765]: Starting CTDBD as pid : 20765
2010/09/02 16:47:13.457077 [20765]: Freeze priority 1
2010/09/02 16:47:13.457496 [20765]: Freeze priority 2
2010/09/02 16:47:13.457680 [20765]: Freeze priority 3
2010/09/02 16:47:16.461269 [recoverd:20828]: Trigger takeoverrun
2010/09/02 16:47:16.462253 [20765]: Freeze priority 1
2010/09/02 16:47:16.462483 [20765]: Freeze priority 2
2010/09/02 16:47:16.462609 [20765]: Freeze priority 3
2010/09/02 16:47:17.191078 [20765]: Thawing priority 1
2010/09/02 16:47:17.191259 [20765]: Release freeze handler for prio 1
2010/09/02 16:47:17.191586 [20765]: Thawing priority 2
2010/09/02 16:47:17.191589 [20765]: Release freeze handler for prio 2
2010/09/02 16:47:17.191793 [20765]: Thawing priority 3
2010/09/02 16:47:17.191841 [20765]: Release freeze handler for prio 3
2010/09/02 16:47:18.572036 [recoverd:20828]: Resetting ban count to 0
for all nodes
2010/09/02 16:47:32.563885 [20765]: 2010/09/02 16:47:32.563316
[21109]: Database 'config.tdb' does not exist
2010/09/02 16:47:33.449435 [20765]: Unable to allocate transport
packet for operation 7 of length 3086770620
2010/09/02 16:47:33.449570 [20765]: Out of memory for c at
server/ctdb_control.c:788
2010/09/02 16:47:33.449609 [20765]: ctdb error: Out of memory at
server/ctdb_control.c:788
2010/09/02 16:47:33.449644 [20765]: server/ctdb_daemon.c:1029 Failed
to send control to remote node 1
2010/09/02 16:47:33.554365 [20765]: Starting SAMBA nmbd :..done
2010/09/02 16:47:33.595716 [recoverd:20828]: Trigger takeoverrun
2010/09/02 16:47:33.676324 [20765]:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
2010/09/02 16:47:33.676459 [20765]: INTERNAL ERROR: Signal 11 in ctdbd
pid 20765
2010/09/02 16:47:33.676494 [20765]:
Please read the file BUGS.txt in the distribution
2010/09/02 16:47:33.676526 [20765]:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
2010/09/02 16:47:33.676559 [20765]: PANIC: internal error
2010/09/02 16:47:33.677444 [20765]: BACKTRACE: 19 stack frames:
 #0 /usr/sbin/ctdbd [0x8092e87]
 #1 /usr/sbin/ctdbd [0x8093147]
 #2 /usr/sbin/ctdbd [0x8093267]
 #3 /usr/sbin/ctdbd [0x809329b]
 #4 [0xffffe420]
 #5 /usr/sbin/ctdbd [0x804e3d7]
 #6 /usr/sbin/ctdbd [0x804ccfe]
 #7 /usr/sbin/ctdbd [0x804cebc]
 #8 /usr/sbin/ctdbd [0x808af04]
 #9 /usr/sbin/ctdbd [0x808b561]
 #10 /usr/sbin/ctdbd [0x80a7c3b]
 #11 /usr/sbin/ctdbd [0x80a8236]
 #12 /usr/sbin/ctdbd [0x80a4584]
 #13 /usr/sbin/ctdbd [0x80a478b]
 #14 /usr/sbin/ctdbd [0x80a483d]
 #15 /usr/sbin/ctdbd [0x804dbe8]
 #16 /usr/sbin/ctdbd [0x804b91f]
 #17 /lib/libc.so.6(__libc_start_main+0xdc) [0xb7eb189c]
 #18 /usr/sbin/ctdbd [0x804a5f1]
2010/09/02 16:47:33.735159 [recoverd:20828]: recovery daemon parent
died - exiting
=======

cheers,
Alexander

