CTDB complains about net serverid and Samba doesn't bind to public addresses

Michael Adam obnox at samba.org
Tue Sep 7 00:58:10 MDT 2010


Hi Alexander,

Alexander wrote:
> Hi Samba Team,
> 
> I'm trying to set up a simple test cluster with CTDB. The OS is
> SLES10SP3, Samba is 3.5.4, installed with RPMs for SLES10 from
> enterprisesamba.com. This is on two VMware Server 2.0.2 VMs with 1 GB
> of RAM each.
> 
> I've tried pulling the CTDB sources both via rsync and via git, as
> listed in the Wiki and on the CTDB main page; the results don't seem
> to differ.

What version/repository/branch did you use?
The current official ctdb repository is Ronnie Sahlberg's repo:
git://git.samba.org/sahlberg/ctdb.git

> Looks like there's no "net serverid" command in 3.5.4 and CTDB's
> events.d/50.samba tries to call it.

Uh, it seems that command has not been backported to 3.5.
Could you file a bug for this one?

> Second problem is that while CTDB assigns proper public IPs to the
> interface, Samba doesn't bind to them (when started without CTDB it
> does).

Could you paste your smb.conf?
For Samba to work with CTDB's public IP distribution mechanism,
Samba has to listen on the wildcard address, i.e. your Samba config
must not use "interfaces = ..." or "bind interfaces only = yes".
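
As a rough sketch of what that means for the [global] section: the
fragment below is only an illustration, not Alexander's actual
smb.conf (which has not been posted). The "clustering = yes" line is
an assumption about a typical CTDB setup; the point is that the two
commented-out options are absent.

```
[global]
    # Assumption: clustered Samba running on top of CTDB
    clustering = yes

    # Deliberately absent, so that smbd listens on the wildcard
    # address and can serve whichever public IPs CTDB assigns
    # to this node:
    #   interfaces = ...
    #   bind interfaces only = yes
```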

> And the third one is that it sometimes crashes almost right after
> start, the log snippet is below.

I need to look into that. I'll get back to you later...

Cheers - Michael


> I'm using no lockfile at the moment - just to make things easier at
> the beginning and ensure it can start at all.
> 
> Could anyone please take a look and suggest something?
> 
> =======
> public_addresses:
> 10.125.136.56/24 eth0
> 10.125.136.57/24 eth0
> =======
> nodes:
> 192.168.10.128
> 192.168.10.129
> =======
> ip addr show when ctdb is running:
> 1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
>     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>     inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
>     inet 127.0.0.2/8 brd 127.255.255.255 scope host secondary lo
> 2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
>     link/ether 00:0c:29:c6:7f:8f brd ff:ff:ff:ff:ff:ff
>     inet 10.125.136.21/24 brd 10.125.136.255 scope global eth0
>     inet 10.125.136.56/24 brd 10.125.136.255 scope global secondary eth0
>     inet 10.125.136.57/24 brd 10.125.136.255 scope global secondary eth0
> 3: eth1: <BROADCAST,MULTICAST,NOTRAILERS,UP> mtu 1500 qdisc pfifo_fast qlen 1000
>     link/ether 00:0c:29:c6:7f:99 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.10.129/24 brd 192.168.10.255 scope global eth1
> =======
> I have the following in CTDB log (default ERR verbosity) when it does run:
> 
> 2010/09/02 17:09:15.229176 [23205]: Recovery lock file set to "".
> Disabling recovery lock checking
> 2010/09/02 17:09:15.382620 [23208]: Starting CTDBD as pid : 23208
> 2010/09/02 17:09:15.965742 [23208]: Freeze priority 1
> 2010/09/02 17:09:15.966016 [23208]: Freeze priority 2
> 2010/09/02 17:09:15.966166 [23208]: Freeze priority 3
> 2010/09/02 17:09:18.972098 [recoverd:23271]: Trigger takeoverrun
> 2010/09/02 17:09:18.973317 [23208]: Freeze priority 1
> 2010/09/02 17:09:18.973908 [23208]: Freeze priority 2
> 2010/09/02 17:09:18.974167 [23208]: Freeze priority 3
> 2010/09/02 17:09:19.657790 [23208]: Thawing priority 1
> 2010/09/02 17:09:19.658025 [23208]: Release freeze handler for prio 1
> 2010/09/02 17:09:19.658210 [23208]: Thawing priority 2
> 2010/09/02 17:09:19.658267 [23208]: Release freeze handler for prio 2
> 2010/09/02 17:09:19.658333 [23208]: Thawing priority 3
> 2010/09/02 17:09:19.658379 [23208]: Release freeze handler for prio 3
> 2010/09/02 17:09:21.001663 [recoverd:23271]: Resetting ban count to 0
> for all nodes
> 2010/09/02 17:09:34.104006 [23208]: 2010/09/02 17:09:34.103349
> [23534]: Database 'config.tdb' does not exist
> 2010/09/02 17:09:34.852080 [23208]: Invalid command: net serverid
> 2010/09/02 17:09:34.853514 [23208]: Usage:
> 2010/09/02 17:09:34.853633 [23208]: net rpc             Run functions
> using RPC transport
> 2010/09/02 17:09:34.853687 [23208]: net rap             Run functions
> using RAP transport
> 2010/09/02 17:09:34.853689 [23208]: net ads             Run functions
> using ADS transport
> 2010/09/02 17:09:34.853691 [23208]: net file            Functions on
> remote opened files
> 2010/09/02 17:09:34.853693 [23208]: net share           Functions on shares
> 2010/09/02 17:09:34.853694 [23208]: net session         Manage sessions
> 2010/09/02 17:09:34.853978 [23208]: net server          List servers
> in workgroup
> 2010/09/02 17:09:34.854017 [23208]: net domain          List
> domains/workgroups on network
> 2010/09/02 17:09:34.854054 [23208]: net printq          Modify printer queue
> 2010/09/02 17:09:34.854089 [23208]: net user            Manage users
> 2010/09/02 17:09:34.854123 [23208]: net group           Manage groups
> 2010/09/02 17:09:34.854158 [23208]: net groupmap        Manage group mappings
> 2010/09/02 17:09:34.854192 [23208]: net sam             Functions on
> the SAM database
> 2010/09/02 17:09:34.854245 [23208]: net validate        Validate
> username and password
> 2010/09/02 17:09:34.854280 [23208]: net groupmember     Modify group memberships
> 2010/09/02 17:09:34.854315 [23208]: net admin           Execute remote
> command on a remote OS/2 server
> 2010/09/02 17:09:34.854391 [23208]: net service         List/modify
> running services
> 2010/09/02 17:09:34.854429 [23208]: net password        Change user
> password on target server
> 2010/09/02 17:09:34.854469 [23208]: net changetrustpw   Change the
> trust password
> 2010/09/02 17:09:34.854505 [23208]: net changesecretpw  Change the
> secret password
> 2010/09/02 17:09:34.854544 [23208]: net setauthuser     Set the
> winbind auth user
> 2010/09/02 17:09:34.854623 [23208]: net getauthuser     Get the
> winbind auth user settings
> 2010/09/02 17:09:34.854773 [23208]: net time            Show/set time
> 2010/09/02 17:09:34.854807 [23208]: net lookup          Look up host
> names/IP addresses
> 2010/09/02 17:09:34.854840 [23208]: net g_lock          Manipulate the
> global lock table
> 2010/09/02 17:09:34.854874 [23208]: net join            Join a domain/AD
> 2010/09/02 17:09:34.854908 [23208]: net dom             Join/unjoin
> (remote) machines to/from a domain/AD
> 2010/09/02 17:09:34.854943 [23208]: net cache           Operate on the
> cache tdb file
> 2010/09/02 17:09:34.855049 [23208]: net getlocalsid     Get the SID
> for the local domain
> 2010/09/02 17:09:34.855085 [23208]: net setlocalsid     Set the SID
> for the local domain
> 2010/09/02 17:09:34.855131 [23208]: net setdomainsid    Set domain SID
> on member servers
> 2010/09/02 17:09:34.855166 [23208]: net getdomainsid    Get domain SID
> on member servers
> 2010/09/02 17:09:34.855200 [23208]: net maxrid          Display the
> maximul RID currently used
> 2010/09/02 17:09:34.855234 [23208]: net idmap           IDmap functions
> 2010/09/02 17:09:34.855269 [23208]: net status          Display server status
> 2010/09/02 17:09:34.855275 [23208]: net usershare       Manage
> user-modifiable shares
> 2010/09/02 17:09:34.855280 [23208]: net usersidlist     Display list
> of all users with SID
> 2010/09/02 17:09:34.855284 [23208]: net conf            Manage Samba
> registry based configuration
> 2010/09/02 17:09:34.855288 [23208]: net registry        Manage the
> Samba registry
> 2010/09/02 17:09:34.855437 [23208]: net eventlog        Process Win32
> *.evt eventlog files
> 2010/09/02 17:09:34.855472 [23208]: net help            Print usage information
> 2010/09/02 17:09:34.970420 [23208]: Unable to allocate transport
> packet for operation 7 of length 1852731295
> 2010/09/02 17:09:34.970491 [23208]: Out of memory for c at
> server/ctdb_control.c:788
> 2010/09/02 17:09:34.970529 [23208]: ctdb error: Out of memory at
> server/ctdb_control.c:788
> 2010/09/02 17:09:34.970563 [23208]: server/ctdb_daemon.c:1029 Failed
> to send control to remote node 1
> 2010/09/02 17:09:34.983807 [23208]: Starting SAMBA nmbd :..done
> 2010/09/02 17:09:35.020839 [recoverd:23271]: Trigger takeoverrun
> 2010/09/02 17:09:35.104193 [23208]: Unable to allocate transport
> packet for operation 7 of length 1919116719
> 2010/09/02 17:09:35.104373 [23208]: Out of memory for c at
> server/ctdb_control.c:788
> 2010/09/02 17:09:35.104417 [23208]: ctdb error: Out of memory at
> server/ctdb_control.c:788
> 2010/09/02 17:09:35.104452 [23208]: server/ctdb_daemon.c:1029 Failed
> to send control to remote node 1
> 2010/09/02 17:09:35.199770 [23208]: Starting SAMBA smbd :..done
> 2010/09/02 17:09:37.503404 [recoverd:23271]: Trigger takeoverrun
> 2010/09/02 17:09:43.321626 [23208]: ERROR: samba tcp port 445 is not responding
> 2010/09/02 17:09:49.109093 [23208]: ERROR: samba tcp port 445 is not responding
> 2010/09/02 17:09:59.850374 [23208]: ERROR: samba tcp port 445 is not responding
> <the last line keeps coming>
> =======
> And the following when it crashes:
> 
> 2010/09/02 16:47:12.633279 [20763]: Recovery lock file set to "".
> Disabling recovery lock checking
> 2010/09/02 16:47:12.770414 [20765]: Starting CTDBD as pid : 20765
> 2010/09/02 16:47:13.457077 [20765]: Freeze priority 1
> 2010/09/02 16:47:13.457496 [20765]: Freeze priority 2
> 2010/09/02 16:47:13.457680 [20765]: Freeze priority 3
> 2010/09/02 16:47:16.461269 [recoverd:20828]: Trigger takeoverrun
> 2010/09/02 16:47:16.462253 [20765]: Freeze priority 1
> 2010/09/02 16:47:16.462483 [20765]: Freeze priority 2
> 2010/09/02 16:47:16.462609 [20765]: Freeze priority 3
> 2010/09/02 16:47:17.191078 [20765]: Thawing priority 1
> 2010/09/02 16:47:17.191259 [20765]: Release freeze handler for prio 1
> 2010/09/02 16:47:17.191586 [20765]: Thawing priority 2
> 2010/09/02 16:47:17.191589 [20765]: Release freeze handler for prio 2
> 2010/09/02 16:47:17.191793 [20765]: Thawing priority 3
> 2010/09/02 16:47:17.191841 [20765]: Release freeze handler for prio 3
> 2010/09/02 16:47:18.572036 [recoverd:20828]: Resetting ban count to 0
> for all nodes
> 2010/09/02 16:47:32.563885 [20765]: 2010/09/02 16:47:32.563316
> [21109]: Database 'config.tdb' does not exist
> 2010/09/02 16:47:33.449435 [20765]: Unable to allocate transport
> packet for operation 7 of length 3086770620
> 2010/09/02 16:47:33.449570 [20765]: Out of memory for c at
> server/ctdb_control.c:788
> 2010/09/02 16:47:33.449609 [20765]: ctdb error: Out of memory at
> server/ctdb_control.c:788
> 2010/09/02 16:47:33.449644 [20765]: server/ctdb_daemon.c:1029 Failed
> to send control to remote node 1
> 2010/09/02 16:47:33.554365 [20765]: Starting SAMBA nmbd :..done
> 2010/09/02 16:47:33.595716 [recoverd:20828]: Trigger takeoverrun
> 2010/09/02 16:47:33.676324 [20765]:
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> 2010/09/02 16:47:33.676459 [20765]: INTERNAL ERROR: Signal 11 in ctdbd
> pid 20765
> 2010/09/02 16:47:33.676494 [20765]:
> Please read the file BUGS.txt in the distribution
> 2010/09/02 16:47:33.676526 [20765]:
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> 2010/09/02 16:47:33.676559 [20765]: PANIC: internal error
> 2010/09/02 16:47:33.677444 [20765]: BACKTRACE: 19 stack frames:
>  #0 /usr/sbin/ctdbd [0x8092e87]
>  #1 /usr/sbin/ctdbd [0x8093147]
>  #2 /usr/sbin/ctdbd [0x8093267]
>  #3 /usr/sbin/ctdbd [0x809329b]
>  #4 [0xffffe420]
>  #5 /usr/sbin/ctdbd [0x804e3d7]
>  #6 /usr/sbin/ctdbd [0x804ccfe]
>  #7 /usr/sbin/ctdbd [0x804cebc]
>  #8 /usr/sbin/ctdbd [0x808af04]
>  #9 /usr/sbin/ctdbd [0x808b561]
>  #10 /usr/sbin/ctdbd [0x80a7c3b]
>  #11 /usr/sbin/ctdbd [0x80a8236]
>  #12 /usr/sbin/ctdbd [0x80a4584]
>  #13 /usr/sbin/ctdbd [0x80a478b]
>  #14 /usr/sbin/ctdbd [0x80a483d]
>  #15 /usr/sbin/ctdbd [0x804dbe8]
>  #16 /usr/sbin/ctdbd [0x804b91f]
>  #17 /lib/libc.so.6(__libc_start_main+0xdc) [0xb7eb189c]
>  #18 /usr/sbin/ctdbd [0x804a5f1]
> 2010/09/02 16:47:33.735159 [recoverd:20828]: recovery daemon parent
> died - exiting
> =======
> 
> cheers,
> Alexander