[SCM] CTDB repository - annotated tag ctdb-1.0.57 created - ctdb-1.0.57
Ronnie Sahlberg
sahlberg at samba.org
Sun Aug 24 23:21:06 GMT 2008
The annotated tag, ctdb-1.0.57 has been created
at 5da71c0e493474cd88b92c14b73f9286d58f7817 (tag)
tagging 7da0c65c8526d66d4f2a788bd646d39237befa54 (commit)
tagged by Ronnie Sahlberg
on Mon Aug 25 09:14:28 2008 +1000
- Log -----------------------------------------------------------------
Tag for the 1.0.57 release
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
iD8DBQBIseto2aJ36aon/y8RAreBAJ4qxL2cKZB7/O3Ed/84eeSs0MB+DACeLcPU
L/IfdzGtiPT2YrVPK/U//Gs=
=8rQF
-----END PGP SIGNATURE-----
Alexander Bokovoy (12):
Fix popt handling, this fixes segfault while asking for --help
Merge from tridge
Ignore configure
Ignore configure
Fix ctdb_call() fetching data and ltdb backend flags
rely on ctdb_test.c for the moment
change test a bit -- work on whole array of ints and process it in the loop (locally)
regenerate configure after changing configure.ac
Merge from tridge
Provide an alternative CTDB_NO_MEMORY_NULL() for functions which return a pointer
Fix memory handling
Fix cleaning targets to delete proper files
Andrew Tridgell (857):
initial version
right include file path
added a README
get rid of some .svn files
added ctdb_set_address and broke out parsing
added incoming setup
stub for ctdb_call
added ignore
added event context
added a test event loop
don't talk to ourselves
example node list
merge from ab
started splitting out transport code
this file is not needed yet
- split up tcp functions into more logical parts
- setup a convenience name field for nodes
- added simple (fake) vnn system
- added in idtree for efficient reqid handling
merge parts of the changes from ab. Don't merge ctdb_test changes yet
added ctdb_connect_wait()
need the header changes too
- fixed the sort function to include the existing data
- added a 1 node test
- added ctdb_set_flags() call
queue up packets to nodes that aren't connected yet. This avoids a
merge from ab
merged from Peter
merge IB code from peter
merge from peter
added storage of extended ltdb header information
next step towards dmaster/lmaster code
added error reply packets
added redirect handling
expanded some comments
added logic for keeping track of the lacount
added request_dmaster and reply_dmaster logic
added a 4 node test
wrap the packet queue call
added handling of partial packet reads
enforce the tcp memory alignment in packet queue
simple ctdb benchmark
merge from ab
merge from Peter
merged from peter
merge fixes from samba4
added copies of libs so can be built standalone
merge db wrap code from samba4
use ctdb_call_info, so struct ctdb_call can be used for top level call
simplified ctdb_call() interface, and made it easier to expand with more parameters later
fix a bug in new structure handling
merge status code changes from samba4 ctdb
merged from samba4 ctdb
merged ctdb messaging code from samba4
ignore config.h*
merged ib work from peter
changed ctdb_bench.c to use messages instead of calls
added a simple benchmark script
merged peters IB work
fixed incr initialisation
support hostnames for node names
added a benchmark script that launches via ssh
ignored some files
added rest of tdb (missed in earlier commit)
added --num-msgs option
merged changes from peter
merge multi-database support from ronnie
merge back some changes from Samba4
merge fetch code from ronnie, and add a simple fetch test
merge from ronnies branch
merge from ronnie
added a magic header for wireshark and packet version info
put test code in tests/ directory
fixed a bunch of memory leaks
remove talloc debug code
another memory leak
fix configure for new test location
fix a possible free after use
merge from ronnie
added daemon mode to ctdb_bench
merge tcp changes from ronnie
made all sockets handle partial IO
merge from ronnie
make some functions static, and remove an unused structure
fix the queueing for partially connected tcp sockets
- add --daemon flag to ctdb_fetch test code
forgot to add ctdb_client.c
merge from ronnie
partially completed work towards full messaging system which will work in both daemon and standalone mode. Does not compile! committing so ronnie can continue while I'm out
merge from ronnie, plus complete the client side of inter-node messaging
fixed sending messages to ourselves in non-daemon mode
add proper support for ctdb_connect_wait in daemon mode
use the new connect_wait code in the ctdb_messaging test
added --num-clients option to ctdb_messaging test
merge from ronnie
merge from volker
use lib/replace for signal.h
use event_loop_wait instead of while(1)
merge from ronnie
merge from vl
merge from ronnie
merge from ronnie
merge store_unlock code from ronnie
private -> private_data for samba3
merge from ronnie
merge from ronnie
added --dblist option to ctdbd, to allow list of databases to be specified on the command line
- fix includes to work in both samba4 and ctdb standalone
merge CTDB_SRVID_ALL patch from Samba4
pull out common command line code for tests into tests/cmdline.c
merged from samba4
this is a demonstration of an idea for handling locks in ctdb.
fixed a fd bug (thanks volker)
merge local copy of tdb from samba4 tdb
added a tdb_chainlock_nonblock() call to tdb
added a ctdb_ltdb_lock_fetch_requeue() function
wait on the right fd ....
darn, forgot this
fixed crash bug - thanks volker
add an explanation of how to use ctdb_lockwait()
add an explanation of ctdb_ltdb_lock_fetch_requeue()
better error handling in ctdb_ltdb_lock_fetch_requeue()
partial merge from volker (some overlaps removed)
merge from ronnie
merge from ronnie
merge from ronnie
- removed the non-daemon mode from ctdb, in order to simplify the
tidyups in test code
now that both daemon and client access the database, it needs to be a real disk file
block SIGPIPE in the daemon to prevent a SIGPIPE on write to a dead socket
make sure we unlock
- send the record header from the client to the daemon when doing a
we should not lock in a normal ctdb_call(), as we want them to run concurrently
when we get a lmaster request, skip updating the header when we are also the new dmaster
start using ctdb_ltdb_lock_fetch_requeue()
stop the client looping (temporary measure)
update destination in a redirect reply
merge from volker and ronnie
make sure we notify ctdb when a node dies
fixed a missing idr remove, and check the types after idr_find()
use the common cmdline code in ctdbd
moved cmdline.c to common code
more DEBUG() calls
add debug tracing to fetch_lock
- merge volkers debug changes
bit less verbose when client exits
started adding a cleaner daemon finish method
merge fetch1 changes from ronnie
- merge from ronnie, and use wait instead of sleep in test scripts
make sure we don't double free in the async lock handler
use shutdown in more tests
simpler shutdown process. The reply is not actually needed, and
validate dmaster on a client fetch request
merged cleanup from ronnie
this fixes a timeout error spotted by volker
merged the db_dir changes from volker. Changed them slightly,
- use separate directories for the tdb files by default
avoid a deadlock in the fetch_lock code. The deadlock could happen when
- make the packet allocation routines take a mem_ctx, which allows
- fully separate the client version of ctdb_call from the daemon
much simpler fetch code!
don't need these structures any more
merge from ronnie
merge from ronnie
- added a --torture option to all ctdb tools. This sets
- split out ctdb_ltdb_lock_fetch_requeue() into a simpler
fixed a bug found by volker - initialise the record on disk when initialised in memory
minor debug changes
merged fix from volker (thanks!)
- fixed a problem with packets to ourselves. The packets were being
added ctdb_status tool
- expanded status to include count of each call type
merge tdb updates from samba4
merge fixes from samba4
- prevent sending dmaster requests to ourselves
merge from samba4
update the vnn as well when getting the connection information
added a useful tool for dumping a ctdb
- when handling a record migration in the lmaster, bypass the usual
debug changes
mark authoritative records
fixed the reverse of the last bug - handle the case when the new dmaster is the lmaster
added max_redirect_count status field
popt not needed in lockwait code
fit some more windows across a screen
add version printout
merge from ronnie
added a ctdb control message, and tool
moved status to ctdb_control
merge from peter
fixed typo
validate the vnn
ignore generated nodes.txt
merge from peter
removed some bogus debug lines
added a ctdb_get_config call
got rid of the getdbpath call
null terminate a string
merge vnn_map code from ronnie
merge from ronnie
nicer testing of control data size
debug level controls
merged broadcast messages from ronnie
some debug code
merge from ronnie
always use allocated packets to avoid alignment errors
added install target
merge from ronnie
factor out the packet allocation code
added make test and make valgrindtest targets
fixed some warnings
allow ctdbd_allocate_pkt to be used in client code
merged from ronnie
report number of clients in ping
use rsync to avoid text busy on install
added status all and debug all control operations
use ctdb_get_connected_nodes for node listing
better name for this hack
much simpler redirect logic
removed unnecessary variable
added reset status control
yay! finally fixed the bug that volker, ronnie and I have been chasing
saner logfile code
don't use stderr here - rely on logging
changed the way set_call and attach are done so that you can safely
added attach command in ctdb_control
auto-determine listen address by attempting to bind to each address in the cluster in turn
added a hopcount in ctdb_call
fixed a lib/events bug found by volker
merge latest versions of lib/replace, lib/talloc, lib/tdb and lib/events into ctdb bzr tree
new files for updated events system
merged from ronnie
nicer string handling in usage
nicer command parsing in ctdb_control
added a builtin fetch function to support samba3 unlocked fetch
enabled built in popt if system doesn't have it
fixed a memory leak in the ctdb_control code
merge from ronnie
merge from ronnie
merged cleanup from ronnie
first stage of efficient non-blocking ctdb traverse
don't zero beyond packet header unnecessarily
merged from ronnie
first version of traverse is working
merge from ronnie
- changed the REQ_REGISTER PDU to be a control
nicer interface to ctdb traverse
make catdb take a dbname instead of an id
added a tdb_enable_seqnum() function
added a ctdb control for enabling the tdb seqnum
added seqnum propagation code to ctdb
merged from ronnie
- added counters for controls in ctdb_control status
- fixed a crash bug after client disconnect in ctdb_control
added a dumpmemory control, used to find memory leaks
show number of connected clients in status output
added tdb_chainlock_mark() call, which can be used to mark a chain locked without actually locking it. This will be used to guarantee forward progress in the ctdb non-blocking lockwait code
- added a EVENT_FD_AUTOCLOSE flag that allows you to tell the event system to close the fd automatically when a fd_event is freed. This prevents races which can lead to epoll missing events
use the new lib/events autoconf code
allow the events system to be chosen on the command line
- take advantage of the new EVENT_FD_AUTOCLOSE flag
merged vnn map broadcast from ronnie
merge relevant lib code from samba4
merged from ronnie
fixed a problem with the number of timed events growing without bound with the new seqnum code
merge from ronnie
merged ronnies code to delay client requests when in recovery mode
moved the vnn_map initialisation out of the cmdline code
separate the wire format and internal format for the vnn_map
fixed setvnnmap to use wire structures too
remove old s3 recovery code
setup the random number generator a bit better
merge from ronnie
better timeout handling for calls, controls and traverses
added nonblocking variants of the two lockall functions to tdb
- got rid of the complex hand marshalling in the recovery controls
- merge from ronnie
fixed debug message
added _mark calls for tdb_lockall
added lockwait child code for entering recovery mode. A child processes holds lockall locks for the entire recovery process
separate out the freeze/thaw handling from recovery
more robust freeze/thaw logic
watch for the freeze child exiting
report number of frozen/thawed nodes
show total frozen/recovering in status
- nicer message if freeze child dies
added -t option to ctdb_control
make sure we ignore requeued ctdb_call packets of older generations except for packets from the client
simplify the generation checking on incoming call packets
ensure we propagate the correct rsn for a request dmaster
the retry client code is no longer needed now that we use a freeze on recovery
the invalid dmaster is no longer needed in recovery
prioritise the dmaster in case of matching rsn
added error messages in ctdb_control replies
make sure the ctdb control socket is secure
don't allow setrecmaster while not frozen
don't allow setvnnmap while not frozen
kill the lockwait child if the pipe goes away
we must not free the fde until after we no longer need the lock child
AIX needs sin_len field for bind()
reading on the write side of a pipe isn't allowed - this caused us to run without locking in the lockwait code
added a -i switch to run ctdbd without forking
check for error on ctdb_ltdb_store
added a control to get the local vnn
fixed a fd close error on reconnect
fixed two more places where we don't correctly handle write errors on sockets
moved the recovery daemon into the main ctdbd and enable it by default
enable TCP keepalives
- merge from ronnie
merge shutdown control from ronnie
merged events changes from samba4
merged debug changes from samba4
merge from ronnie
removed the CTDB_CTRL_FLAG_NOREQUEUE flag
merged from samba4
- don't try to send controls to dead nodes
merge from samba4
merge from samba4
merge keepalive code from ronnie
- up rx_cnt on all packet types
timeout pending controls immediately when a node becomes disconnected
a better way to resend calls after recovery
merge tx_cnt code from ronnie
make sure we don't increment rx_cnt for redirected packets, or for packets that have been requeued after a lockwait
nicer date formatting
don't count packets received from before the transport told us the node was dead
removed obsolete ctdb_dump tool
merge from ronnie
show ctdb control timeout
global lock should imply the transaction lock
start ctdb frozen, and let the election sort things out. This prevents a race on startup
- startup frozen, and do an initial recovery
merge from ronnie
- get rid of ctdb_ctrl_get_config
added automatic vacuuming of empty records during recovery
fixed some memory leaks on the traverse code
fixed %d which should be %u
merge from ronnie
raise the control timeout in recovery
make ctdbd realtime if possible
merge from ronnie
added IP takeover logic for public IPs to ctdb
new files for IP takeover
paranoid check for empty db on attach
consider a node dead after 6 seconds, not 15
keep sending ARPs for 2 minutes, every 5 seconds
make sure we find out about new nodes as fast as possible
send a message to clients when an IP has been released
send the message from daemon context
moved system specific ip code to system.c
handle corrupt ctdb packets better
paranoid checks for bad packets in tcp layer. Close the socket if it gets a bad packet
show op type of badly aligned packets in tcp layer
drop any partially sent packets when we get a socket write error
removed bogus alignment check
tweak timeouts
added function to send a raw tcp ack packet
added code to ctdb to send a tcp 'tickle' ack when we takeover an
send on the right socket!
fixed tcp data offset and checksum
remove experimental code
use a window size that is obvious in sniffs
fixed error reporting in tickle ack code
automatic cleanup of tcp tickle records
when handing over an IP to another node, also tell them of any tcp connections we were handling, so they can send tickle acks for those connections
another place where we could send a partial packet
merged packaging from jim
rename ctdb_control utility to ctdb
- renamed ctdb_control utility to ctdb
fix sense of inet_aton() call
- moved ctdbd specific options to ctdbd.c from cmdline.c
call the event script on recovery too
added an example ctdb event script
made events script executable
fixed syntax of /sbin/ip
don't need maskbits to ip addr del
clean shutdown in ctdb - release all our IPs
fixed some debug messages
fixed more warnings on 64 bit boxes
merge from jim
use /etc/services for ctdb
use autoconf for more paths
default log file to reasonable location
update packaging for new defaults
more build tweaks
- make more options configurable
- ignore blank lines at end of lists
fixed shell syntax in events script
fixed broadcast controls from the command line
fixed system() return handling
don't block SIGCHLD, or we lose return values from system() !
flush any local arp entries for the given ip on add/del
samba3 needs ctdb_private.h installed to build
auto-restart NFS if its running when we release an IP
moved onnode into ctdb from s3 examples/ctdb
support ctdb status -n all
fixed onnode symlink install
wait for local tcp services like smbd to come up before allowing ctdb to start talking to other nodes
- nice messages while waiting for tcp services to come up
don't start the transport connecting to the other nodes until after the startup event script has run
- use a CTDB_BROADCAST_ALL for the attach message so it goes to currently disconnected nodes
we need to listen at transport initialise stage to find our own node number
close sockets when we exec scripts
use our own netmask when deciding if we should takeover an IP, not the other node's
tell newly connected nodes about any tcp tickle records that we have that they don't have
merge lib/replace from samba4
added hooks to make nfs statd behave correctly on failover
better location for statd-callout
if there is no node available to take an IP, don't consider that an error
ctdb is GPL not LGPL
merged from ronnie
added CTDB_WAIT_DIRECTORIES support
log dates/time in event startup messages
merge initial web site from ronnie
added package download
better download instructions
convert ctdbd.sh tests to use an event script
make the running of the takeover and release event scripts async, to prevent outages due to slow scripts
use a subdirectory for ctdb state files
split out events for each subsystem separately
tidy up the install somewhat
- make symlink relative in install
make the packaging much more portable - tested on SLES9 and RHEL4
don't strictly need netcat
added nfs event script
put nfs events in spec and Makefile.in
- make calling of recovered event script async
disable realtime scheduler in event scripts
another place we need to cope with the strange epoll fork semantics
- moved cmdline options that are only relevant to ctdbd into ctdbd.c
- make specification of a recovery lock file compulsory
first step towards fixing "make test" with the new daemon system
make test now works again
merge from ronnie
add an easy way to setup ctdb to start/stop samba
ctdb_test.c is gone
removed some old cruft
move config files to config/ directory
docs on how to use statd-callout
fixed a race condition in the handling of the recovery lock
do a full restart in init cron call
test commit
don't start nfs services unless the relevant directories are available
merge from ronnie
web page tidy ups
doc updates
fixed location of init.d directory to work on SLES and RHEL
more portability tweaks in the init script
merged from ronnie
make the init scripts more portable about location of system config files
handle NETWORKING var not existing
merged from ronnie
automatically bring up interfaces that we manage. This allows ctdb to work without requiring two IPs per public interface
split out the basic interface handling, and run event scripts in a deterministic order
make sure we don't have any namespace collision problems with config variables
remove some cruft thats not needed any more
- start moving tunable variables into their own structure
added tunables settable using ctdb command line tool
allow setting of variables at startup in config file
make recovery daemon values tunable
merge from ronnie
don't crash doing ctdb ip when not doing takeover
use the right IP from the passed structure in takeip/releaseip calls
ignore commented out entries in /etc/exports
explain event types
use the right IP from the passed structure in takeip/releaseip calls
remove an unused function
more unused code
set close on exec on pipe in event scripts, so long running scripts don't hold the pipe
first step in health monitoring of cluster nodes. When not healthy they will be marked disabled
clean out some more cruft
added health monitoring logic to ctdb, so a node loses its public IP address if one of the subsystem event scripts reports a problem
merged vsftpd event script from ronnie
added 40.vsftpd to Makefile.in
fixed exit code in makerpms.sh
send the right sort of message on monitoring failure
- fixed flags display in logs
fixed error handling in event scripts
- added monitoring of rpc ports for nfs, and of Samba ports and directories
increase release number of ctdb
added timeouts in all event scripts
handle the case of all nodes being sick for one service
disable a node if testparm thinks there is an error, or warning, or an unrecognised option
ensure all nodes display disabled nodes correctly
update flags in parent daemon too
get parents idea of recmode and recmaster when deciding if we should do a takeover run
formatting fix for wider variable names
merged admin enable/disable change from ronnie
implement a scheme where nodes are banned if they continuously caused the cluster
increase rpm release number
added admin commands to ban/unban nodes
handle CTDB_CURRENT_NODE in ban commands
get all the tunables at once in recovery daemon
there are now far too many controls for the controls statistics fields to be useful
validate vnn on node flags change
use a priority time for the election data, not just the vnn
formatting fixes
choose the most connected node first
later times are a lower priority, not a higher priority
start splitting the code into separate client and server pieces
more code rearrangement
some #include cleanups
move more util code to lib/util
update configure.ac for new code layout
remove the test commit
merge from ronnie
new web page layout
web tweaks
web tweaks
added logo
crop logo
convert rest of pages to new format
balance the layout
more web tweaks
merge from ronnie
added documentation page
new logo, fixed links
doc updates
install man page
fixed manual install dir for rpms
merge from ronnie
newer versions of ip need the mask on del
support up takeover in testing when root
ignore arp on loopback
propagate flag changes to all connected nodes
- send tcp info to all connected nodes, not just vnnmap nodes
fixed valgrind error
merge from ronnie
merge from ronnie
use gzip --rsyncable for ctdb packages
layout copyright using a literal
fixed testparm calls
more detail in recovery message
simpler handling of -n all in ctdb tool
raise the default keepalive limit
make sure we start the freeze process quickly on all nodes when we are going to do recovery - this prevents serialisation of freeze, which can take a long time
- tidied up some of the web page text
minor doc updates
make the web site pass the w3c validator
fixed rendering in IE
added onnode manual page
move all the headers into header.html
make the pages scale a bit better
on startup release all IPs, in case we have any left over from a previous run
- wait for winbind on samba start
check winbind in monitoring event too
- merged ctdb_store test from ronnie
merge from ronnie
run smbstatus every 10 minutes to scrub databases
added code to kill registered clients on an IP release
script version of install needs spaces after -m
merge from ronnie
merge from ronnie
merge from ronnie
merge from ronnie
more careful checking of lengths
- neaten up the command line for killtcp
removed unused makefile var
forgot to add this
merge from ronnie
fixed error message on bad IP/port
fixed help layout
merge from ronnie (with spelling fixes)
merge from ronnie - we have an official port number, yay!
increment rpm release number
log the generation numbers to give a hint about this bug
we do tell banned nodes to release IPs
call kill_clients when releasing all IPs, as well as for individual IPs
fixed sense of inet_aton test
merge from ronnie
update lib/replace from samba4
update lib/tdb from samba4
update lib/events from samba4 (If->if)
more merges for GPLv3 update
minor back-merge from samba4
added --nosetsched option to ctdbd
allow extra option override in /etc/sysconfig/ctdb
fixed the sense of do_setsched
fully save/restore scheduler parameters
- merge from ronnie
ensure killtcp structure is initialised
merged from ronnie
make sure we still run events when waiting for ctdb_event_script()
- log registering of tcp clients
up release number
make timed_event structure private to events_timed.c
merge from ronnie
merge from ronnie
merge changes needed for samba4
merge from ronnie
merged new event script calling code from ronnie
removed redundant debug message
merge from volker
merge from volker
merge from ronnie
fixed segv when no public interface is set
merge from ronnie
merge from ronnie
up the release number
added a diagnostics tool for ctdb
add crontab and sysctl output
merge from ronnie
add back in --public-interface as a default
- use struct sockaddr_in more consistently instead of string addresses
added back --public-interface to startup script
fixed a pointer cast warning
get interface right
fixed location of arp_filter
changed some debug levels
- don't allow the registration of clients with IPs we don't hold
- set arp_ignore to prevent replying to arp requests for addresses on loopback
handle hung or slow ctdb daemons on shutdown
fixed return code
remove clutter from ctdb log file
new approach for killing TCP connections on IP release
remove more cruft from the logs
we don't need the is_loopback logic in ctdb any more
fixed script errors in 10.interface
force recovery if unable to tell a node to release an IP
more shell scripting fixes in 10.interface
prevent recursion in the calling of ctdb_takeover_run
ensure smbd and winbindd do die in 50.samba
nicer use of testparm
wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done
- merge from ronnie
make sure all public IPs are removed at startup
fix pkill args
cope with non-standard install dirs in event scripts
merge from ronnie
increase release number
expanded ctdb_diagnostics a bit
separate out the various fs display ops
make sure we set close on exec on any possibly inherited fds
added support for persistent databases in ctdbd
merge from ronnie
merge bugfix from ronnie
avoid using connected nodes that aren't in the vnn map yet
make the persistent dbdir configurable
fixed a valgrind error, and some warnings
no longer wait at startup for services to become available, instead
run monitoring more quickly when unhealthy and at startup
fixed a fd leak on the recovery lock
merge from ronnie
upped version number
we are the culprit if we can't get the reclock
- catch ESTALE in the recovery lock by trying a read()
fixed several places where we set the recovery culprit incorrectly
make sure reconnected nodes start off as unhealthy so they don't get a public IP
add config option for disabling bans
disable optimisation for now, until we find an occasional segv
merge from ronnie
sync flags between nodes in monitor loop in recmaster
disable ipmux code until we have a configure test
improved handling of systems without libipq.h
only link to -lipq if needed
more detail on multipath config
increase release number
merge from ronnie
merge from ronnie
added some debug lines to help track down a problem
remove an incorrectly added file
merge from ronnie
increase release number
merge from ronnie
prevent a double free
fixed a valgrind uninitialised memory error due to pad bytes
another place where we need to mark connect_fde as freed
fixed a double close of a socket, leading to an EPOLL error
fixed a problem with backgrounding onnode
merge from ronnie
update release number
added monitoring of ftp ports
merge from ronnie
added bonding info to ctdb_diagnostics
increase release number
patch from michael adam
prevent a deadly embrace between smbd and ctdbd by moving the calling
don't do the first startup event until we are out of recovery
make election handling much more scalable
make it easier to test starting large numbers of virtual nodes
need public_addresses for test suite
- merge from ronnie
increase release number
merge from ronnie
make DeterministicIPs the default
merge from ronnie
update release number and changelog
fixed segv on failed ctdb_ctrl_getnodemap
updated release info
fixed order of changelog
quick fix for timeout in recovery
make this a custom build
make this a custom build
more optimisations to recovery
add randrec to Makefile
added ctdb_randrec test tool
prevent a re-ban loop for single node clusters
prevent O(n^2) behaviour for traverse after large numbers of deletes
make sure vars are set at startup before recovery
fixed a warning
update revnumber for custom tree
expand tdb by minimum of 25% at a time
fixed a warning
added async pull, push and rsn handling functions
make some specific cases of the non-dmaster bug non-fatal
avoid write locks during delete checks in traversals
this fixes the non-dmaster bug that has plagued us for months
convert much of the recovery logic to be async and parallel across all nodes
a useful hack for checking correct behaviour of recovery
a new tunable DatabaseMaxDead that enables the tdb max dead cache logic
update version
ensure we always build the right version
added tdb_wipe_all() function
cleanup the new freelist code
fixed the bug that made "onnode N service ctdb start" hang
fixed data offset definition
fixed excludes in tar ball creation for src rpm
Rewrote the tdb transaction code to be O(N) instead of O(N^2)
convert tdb from u32 to uint32_t to match the current Samba trees
update from Samba4
merge from Samba4
this is needed with merged tdb
- added tdb_add_flags() and tdb_remove_flags()
ensure tdb log messages appear in ctdbd logs
non-persistent databases don't need sync transactions
change default tunables to cope with larger dbs
new simpler and much faster recovery code based on tdb transactions
added paranoid transaction ids
new rpm version
don't restart statd when we don't need to
more efficient traversal in pulldb control
catch internal traversal errors
nicer onnode output
merge from ronnie
background the smbstatus -n command
show start/stop time of recovery on all nodes
updated docs from ronnie
added two new ctdb commands:
ensure the recovery daemon is not clagged up by vacuum calls
ensure the main daemon doesn't use a blocking lock on the freelist
this is not an error - it just means the record was busy
nicer output from repack and vacuum
changed default vacuum limit
increase version number
forgot this file
needs to be in Makefile.in too
only match vacuum list if on the same database
allow remote variable expansion in onnode, so you can use wildcards that expand on the remote nodes
allow delete percentage to be specified on the command line
a compromise for freelist scanning - we now will look for other than the first fit, but get exponentially more desperate as we get deeper into the freelist
report the store rate in ctdb_randrec
add a max runtime switch to ctdb tool
tdb_freelist_size was reporting 1 more than correct size
block alarm signals during critical sections of vacuum
auto-run the vacuum and repack ops every 5 minutes by default
increase version number
exponential backoff in health monitoring for faster startup
get rid of monitor_retry as well
fixed the bug that caused tdbtorture to fail
merge from ronnie
minor fix to transaction_write_existing
fixed a memory leak in the recovery daemon
- catch a case where the client disconnects during a call
update for release
merge from ronnie
added syslog support, and use a pipe to catch logging from child processes to the ctdbd logging functions
the event scripts no longer need to show a date, as it's done by the main ctdbd logging function
The recovery daemon does not need to be a realtime task
fixed two 64bit warnings
update for release
fixed handling of \r from stdout of subprocesses
cope better with large debug dumps
merge from ronnie
more efficient freelist allocation
merge from samba4
merged 60.nfs changes from ronnie
make ctdb dumpmemory work remotely, and dump the talloc
fixed egrep pattern to use more compatible expression for spaces
partial merge from ronnie
added an ignore file
useful for building with equivalent options to the spec file
merge async recovery changes from Ronnie
ignore some autogenerated test files
update download instructions for new git tree
update for release 1.0.25
removed dependence on dprintf
fixed a crash bug in the new transaction code
added debug constants to allow for better mapping to syslog levels
merge from ronnie
nicer use of structures and use isalpha()
fixed a problem with tdb growing after each recovery
don't ship the .git directory in the srpm
carefully step around the recovery area when doing a tdb_wipe_all. This prevents
- accept an optional set of tdb_flags from clients on open a database,
fixed permissions on configure.rpm
Merge branch 'master' of git://git.samba.org/sahlberg/ctdb
Merge commit 'sofs1/tridge'
fixed realloc bug
use git archive to create tarball
need to specify tree to git archive
Merge commit 'ronnie-ctdb/master' into tridge
Fix the chicken and egg problem with ctdb/samba and a registry smb.conf
put the return in the right place
zero out the ctdb->freeze_handle when we free it
added option to start ctdb under valgrind
CTDB_NO_MEMORY_VOID() needs to return on error
fixed some incorrect CTDB_NO_MEMORY*() calls found after fixing the
fixed a warning
prevent valgrind errors where we print uninitialised values on control errors
don't use mmap in tdb if --nosetsched is set. That makes valgrind
ensure pad bytes in the ltdb_header are initialised
an extraordinarily ugly patch!
fixed a case statement
Merge commit 'ronnie/master'
fixed up exit status for onnode
fixed postun script to prevent corrupting RPM database
Merge commit 'ronnie/master'
Merge commit 'ronnie/master'
Merge commit 'ronnie/master'
fixed a bug where we would look for a signal past the end of the
fixed buffering in ctdb logging code to handle multiple lines
allow for probing of directories without raising an error
run the testparm commands in 50.samba in the background, only running
- show pids during test
- cleanup persistent db at start
cleanup on SIGINT
rename the structure we use for marshalling multiple records
added a new persistent transaction test program
added new multi-record transaction commit code
added client side functions for new transaction code
we don't need ctdb_ltdb_persistent_store() any more
added marshalling helper functions
new prototypes
make sure we honor the TDB_NOSYNC flag from clients in the server
renamed the pulldb structure to a ctdb_marshall_buffer
cleanup of the old persistent db test
fixed a warning
fixed some warnings
ensure we use killtcp on non-NFS/non-CIFS ports for faster failover of
we need an additional gratuitous arp before the NFS tickles
implemented replayable transactions in ctdb to prevent deadlock
ensure we use killtcp on non-NFS/non-CIFS ports for faster failover of
we need an additional gratuitous arp before the NFS tickles
cover some corner cases where the persistent database could become
fixed a looping error bug with the new transactions code
Merge commit 'ronnie/1.0.53'
return a more detailed error code from a trans2 commit error
up release number
save writing the same data twice
imported failure handling from dbwrap_ctdb.c
added a new control CTDB_CONTROL_TRANS2_COMMIT_RETRY so we can tell
added retry handling in client
fixed send of release IP message
fixed a memory leak in the recovery daemon
Merge commit 'ronnie/master'
up release version
fixed merge
Martin Schwenke (5):
Complete rewrite of tools/onnode. Remove old tools/onnode.ssh,
Update Makefile.in for new version of onnode.
When in verbose mode with -p, each line is prefixed with the node
Yip yip yip!
Signed-off-by: Martin Schwenke <martin at meltin.net>
Peter Somogyi (58):
Added infiniband transport implementation(incomplete) and interface.
Just testing the bzr e-mail plugin...
Implementing basic data structure handling...
Using samba DLIST helper macro set.
Testing e-mail notification...
bzr email plugin test
Added some event handling (incomplete)...
Implemented cm usage.
Rough implementation of buffer handling.
Raw implementation done.
Using struct <type> instead of typedefs.
Made ibwrapper compilable.
Raw impl. of ibwrapper test tool.
Made infiniband support configurable.
Added checks for ib libs and headers
Modified send logic to allow large messages.
Made receiver handle partial packets.
Added send queue.
Added trace messages + several fixes
Joining ctdb and ibwrapper (incomplete).
+1 ibw function +1 bugfix
bugfix in ibw_send
ibw: modified tridge's code - in my point of view
1st "working" ib version.
Merged tridge's branch.
Fixed a side effect of previous revert.
Adjusted ib test tool #1.
Adjusted debug level and test scenario.
Added overload test scenario + fixed 1 send queue bug.
merging tridge's code...
2 bugfixes
Added variable msg size scenario.
ib: fragment sent buf + many bugfixes
merged tridge's code
ib: adjustment of a test scenario
Merged tridge's code.
Some minor changes before integrating ib...
ib: a trivial approach of integration
1st working ib integrated ctdb
merged tridge's fix
workaround proposal for the initialization-problem
merged tridge's code
ib: added external send queue to workaround downtime
ib: test scenario was wrong
merged tridge's branch
Merged tridge's ctdb branch.
- ctdb/ib minor bugfixes (error case)
Simplified code in ctdb_init_transport.
merged tridge's branch
removing my dirt from tridge's code
use talloc_vasprintf
fixed ctdb/ib bug at reject event
made ofed-1.0 (and 1.1) compatible + fixed warnings
merged tridge's branch
merged tridge's branch
ctdb/ib: reduce debug output; allow not only ip
ctdb/ib: swapped ibwrapper_test options (-d, -a, -g)
fixed prev. ibwrapper_test options
Ronnie Sahlberg (785):
merge from tridge
merge from tridges tree
merge from tridge
merge from tridge
merge from tridge
merge from tridge
add a comment that sometimes sending remote calls straight to the
split the 32bit idr field into two.
add pdu's that the client can use to query the ctdb daemon of the path
add a mapping table from a hash value to a lmaster vnn number
merge from tridge
add a control to read the vnnmap configuration from a node
add a new control : SETVNNMAP to set the generation id and also the vnn
add a special VNN that means "all" nodes so that a message can be
merge from tridge
make srvid 64 bits instead of 32 bits
add a generation field to the pdu header.
clients should not fill in "generation" nor be aware of what generation
ctdb will now verify that the generation id for all CTDB_REQ_CALLs that
merge from tridge
you cant dereference ctdb->vnnmap in the client since it is null in the
add a control to pull the database list from a remote node
add a few more controls that are useful for debugging a cluster
print vnn as decimal instead of hex
merge with tridge
add a control to read an entire tdb from a node including
add a new control to set all records in a database to a new dmaster
control to delete all records in a database
implement a control to pull a database from a remote node
add a new "recovery mode" field to ctdb.
merge from tridge
add push/pull of tdb and a control to copy a tdb from one node to
add an initial recovery control to perform samba3 style recovery
fix a bug in pushdb control.
specify which node to perform recovery to when using the recovery
merge with tridges tree to resolve all conflicts
update some calls to ctdb_control() that were still using the old
change the getnodemap control to a more consistent output for whether a
remove test code in the fetch test to keep the daemons running forever
add a control to create a database
remove sleep from the fetch test
discard REQ/REPLY DMASTER when generation id is wrong or when in
add a recover test change alignment for the pull/push db structures
recover.sh test script that build a few database and populates them with
do a real recovery by killing a node and then calling the recover
merge from tridge
merge from tridge
merge from tridge
break set/get vnn map out from ctdb_control and put it in ctdb_recover.c
fixup getdbmap control so it looks a bit nicer
cleanup getnodemap
merge from tridge
cleanup the control "write record"
merge from tridge
start working on a recovery daemon
ctdb_control should use the provided timeout and not hardcode to 1.0
change the signature for ctdb_ctrl_getnodemap() so that a timeout
update getvnnmap control to take a timeout parameter
merge from tridge
also verify that the generation id is the same on all the nodes and if
recovery daemon
remove a exit from the test script
merge from tridge
merge from tridge
merge from tridge
split the vnn broadcast address into two
merge from tridge
add a control to get the pid of a daemon.
merge from tridge
add support in catdb to dump the content of a specific nodes tdb instead
in the recover test
update to the recovery daemon
the timed_out variable needs to be static and can not be on the stack
add a ifdeffed out block to the call.
dont use arrays where a uint32_t works just as well
dont allocate arrays where we can just return a single integer
break out the setting/clearing of recovery mode into a dedicated helper
add a helper function to create all missing remote databases detected
create a helper function to make sure the local node that does recovery
create a helper function for recovery that pulls and merges all remote
break the code that repoints dmaster for all local and remote records
add an extra blank line
create a helper function for recovery to push all local databases out
break out the code to update all nodes to the new vnnmap into a helper
change a lot of printf into debug statements
update a comment to be more descriptive
add a test in the function that checks whether the cluster needs
add new controls to get and set the recovery master node of a daemon
recovery daemon with recovery master election
it now works to talloc_free() the timed event if we no longer want it to
hang the timeout event off state and thus we dont need to explicitly
merge from tridge
fix alignment bug for pulldb
we must repoint dmaster to an invalid node during recovery to stop the
add a small tool to monitor recovery
change the name of the recovery daemon to ctdb_recoverd
add a command line flag to ctdbd to start a recovery daemon.
when we are in recovery mode and we get a REQ_CALL from a client,
hang the event from the retry structure instead of the hdr structure
merge from tridge
actually check the remote nodes and not just the local node
merge from tridge
when starting recovery repoint dmaster to an invalid node and not the
when starting a new election, also force all nodes into recovery mode so
update ctdb_control to create a correct ctdb_vnn_map->map array
create a correct vnnmap structure to prevent a segv
merge from tridge
make ctdb_control catdb work again
we must bump the rsn everytime we do a REQ_DMASTER or a REPLY_DMASTER
add a control to bump the rsn number for all records in a database
merge from tridge
we have to get a NEW generation id after completing recovery
merge from tridge
add a missing parameter to the new signature for ctdb_control
remove the control to bump the rsn since we dont need it anymore
merge from tridge
merge from tridge
merge from tridge
merge from tridge
merge from tridge
if a caller specifies a timeout when calling a control, it makes no
remove a prototype we no longer need
merge from tridge
add a control to shutdown/kill a node
we no longer pass lmaster across during pulldb so dont print it from
merge from tridge
add dead node detection so that if a node does not generate any
add a missing file :-)
merge from tridge
add a node->tx_cnt counter
increase the tx_cnt everytime we send a packet to a node
use ctdb_dead_node() instead of reimplementing the same code again
merge from tridge
add controls to enable/disable the monitoring of dead nodes
merge from tridge
add a new command for ctdb_control to trigger a recovery
merge from tridge
add controls to take over and release an ip address
new branch from tridges tree
- create /etc/ctdb/taken_ips and /etc/ctdb/changed_ips analog to the
it is -f not -x to check if a file exists
initial webpage
add a developers section
STATD_SHARED_DIRECTORY should be defined in the nfs sysconfig file and
update the event scripts for nfs and nfslock to honour CTDB_MANAGES_NFS
fix broken link to the CTDB setup page
merge from tridge
when we get a dmaster error, show the database id in the log so we can
merged with tridge
mention that ctdb offers cross cluster messaging to applications
merge from tridge
merge from tridge
print an error message to stdout if we failed to open the logfile for
ubuntu uses a different style of init scripts than redhat and suse
add a -Y option to generate machine readable output.
merge from tridge
merge from tridge
add the ip address to the nodemap structure we pull from a server and
show the second column in the machinereadable output for ctdb status as
merge from tridge
change the takoverip/releaseip controls to pass a structure containing
merge from tridge
add a control that lists all public ip addresses and which node that
merge from tridge
dont use CTDB_MANAGES_NFS for controlling the lockmanager
add a simple events script to manage vsftpd
merge from tridge
provide machinereadable output for ctdb ip
add some text about CTDB and in which scenarios it would be a good
ctdb is only a ha solution when combined with a cluster filesystem
merge from tridge
need to install the vsftpd script in make install
merge from tridge
merge from tridge
merge from tridge
merge from tridge
add a control to permanently enable/disable a node
show the disabled/permanently disabled status in the machinereadable
distribute the takenover nodes more evenly among the surviving nodes
merge from tridge
add a webpage for how to get the code. based on the wikipage
add a page (based on the wiki) on how to build samba3 and ctdb
add a page on how to configure CTDB based on the wiki
update the names of envvars to use the CTDB_ prefix
remove CTDB_MANAGES_SAMBA from the config page. this should be in
add a page for starting and (basic) testing of ctdb based on the
merge from tridge
update the blurb for the setmonmode control it takes 0 or 1 as a
add an initial manpage for the ctdb tool
put the text in "generation" inside a para block
add a page for configuring samba
add a page on how to configure clustered nfs
add the generated manpage for ctdb so that it is available also for
fix typo
add links to how to configure samba/nfs in the samba/nfs sections
add instructions on how to set up HA-FTP using vsftpd and ctdb
show how to start the newly configured vsftpd service by disabling and
typo
merge from tridge
merge from tridge
replace the list of documentation links on the front page with a link to
create a separate list of links for the manpages
add a tiny prerequisites page stating that you need a cluster filesystem
capitalize some links
add code to unban when we become/unbecome recmaster
remove the unban code from when we take recmaster role. we can not
unban all nodes when we release recmaster role or when we win an
should be sufficient to unban nodes when we unbecome recmaster
merge from tridge
merge from tridge
merge from tridge
initial ctdbd man page
add a link to the ctdbd man page
merge from tridge
add descriptions of the options for the ctdb command
merge from tridge
minor man page update
when public interface is not set, print this to the logfile before
merge from tridge
merge from tridge
note that there is no link on the PUBLIC interface
add a small test tool that can be used to create a massive amount of
add a mechanism to the samba event script to do periodic cleanup of the
merge from tridge
merge from tridge
merge from tridge
merge from tridge
when accepting an incoming connection, verify that the source address is
rename tnode->queue to tnode->out_queue to indicate this queue is for
add incomplete code fragments to perform SCSI PERSISTENT RESERVATION
get rid of some compiler warnings for the scsi tool
start implementing command line parsing to scsi_io to make it take
add GPL comment to scsi_io.c
merge from tridge
add more command line parsing
add a tuneable to control how long we wait after a successful recovery
initial version of a socketkiller tool
merge from tridge
ETH_P_IP does not work on my ubuntu system so changing it back to the
change the signature for ctdb_sys_send_ack() to ctdb_sys_send_tcp()
add a new ctdb_sys_kill_tcp() function that kills (RST) the specified
add a killtcp command to the ctdb tool
we dont need socketkiller anymore now that the
merge from tridge
merge from tridge
merge from tridge
add a command to ctdb to send tickle-ack's
merge from tridge
update the manpage for ctdb to describe killtcp and tickle
break the tickle description into two paragraphs
merge from tridge
remove 59.nfslock and fold this into 60.nfs
use 'ctdb tickle' instead of sendip to tickle nfs clients.
use the official iana number for ctdb and not 9001
merge from tridge
when checking the nodemap flags for consistency while monitoring the
a better way to fix the DISCONNECT|BANNED vs DISCONNECT bug
when a remote node has sent us a message to update the flags for a node,
nicer handling of DISCONNECTED flag when we update the node flags from
dont restart the tcp service after a ip takeover, it is more efficient
run the ctdb killtcp in the background
make it possible to specify how many times ctdb killtcp will try to RST
update the documentation for NFS to mention that the lock manager must
use the socketkiller to kill off all lock manager sessions as well
merge from tridge
merge from tridge
regenerated ctdbd manpage
print the operation code in the debug message when we discard a packet
pass the header to ctdb_become_dmaster instead of just the reqid
add a ctdb_kill_tcp_callback() that will perform a kill tcp using a
first cut at a better and more scalable socketkiller
add a ctdb_ prefix to two public functions
add daemon code for the new kill_tcp control
make the ctdb tool use the killtcp control in the daemon instead of
ctdb killtcp no longer takes a <numrst> argument to control how many
rename killtcp->fd to killtcp->capture_fd
as an optimization for when we want to send multiple tickles at a time
the posix.4 name for the priority field is sched_priority
netinet/if_ether.h is more portable than net/ethernet.h
merge from tridge
add a private_data field to the killtcp structure and let the system
update the comment at the top of file to reflect the purpose of the file
add an initial system_aix.c to manage raw sockets under aix
add some support for controlling Linux or AIX in the makefile
add some configure magic to make it configure and build properly on
when we have found that /etc/rc.d/init.d/SERVICE exists, then run that
if we dont have /etc/sysconfig and we dont have /etc/default
if we dont have nc or netcat, try using netstat as a final attempt to
try netstat as a last attempt to check a tcp port in
there is no point in doing anything in 10.interfaces unless we have a
fix bug introduced in previous commit
we dont do nfstickles unless ctdb manages nfs
change the way we pick/find a new node to takeover for a failed node
add a check if start_node is beyond the end of the nodemap and reset it
merge from tridge
merge from tridge
change the tickle list from one global list into an array per public
updated ctdb tickle management
when a client connects with TCP_CLIENT we should look at the
set the tcp tickle update flag to true once we have done a takeover and
when we build the arp structure for sending gratuitous arp (and tcp
initial version of talloc based red-black trees
no need to have a separate assignment of the tcparray pointer followed
if sibling is NULL it is a leaf node and thus black.
there were situations where we were not guaranteed that a sibling had 2
fix some remaining bugs with deleting nodes
remove dead code
fix the remaining bugs with tree delete that testing found.
add a small tool to compare rb tree with a timeval_compare()+add an
after we have checked dest address that it is a public address
update the manpage for the -n option to make it clear we are referring
add a small tool that can send smnotify packets
dont wait for the default rpc timeout when trying to bing to a client.
install smnotify in $(bindir)
update the specfile to install smnotify
do not restart lockd/statd when we takeover an ip address this is
we dont use sm-notify any more
move scsi/scsi_io.c to utils/scsi/scsi_io.c
merge from tridge
add a ctdb command to print the default public ip of a host.
change error output in ctdb and in ctdb_cmdline_client to print to
dont wait indefinitely for the initial getvnn to complete
/etc/sysconfig/nfs can now discover the public ipaddress automagically
when inserting data in the tree, if there was already a node with the
compile rb_tree.c by default.
merge from tridge
change fprintf(stderr to DEBUG(0, now that client DEBUGs are redirected
add a tree insert function that takes a callback function to populate
add helpers to add/lookup/delete nodes in a tree where the key is an
add helpers to traverse a tree where the key is an array of uint32
when we want to kill a tcp connection we stored the connection
add more extensive test cases and verify that we are not losing any
run the test for 60 seconds if that is what we claim
remove an unused function
change the mem hierarchy for trees. let the node be owned by the data
remove an extra blankline
enhanced tests to verify the tree integrity when adding/removing nodes
from Chris Cowan
add a wrapper function to create the key used to insert/lookup a certain
comment that ctdb_event_script_v() is called from a forked childs
add a function to return the first entry that is stored in a tree where
add a description on how the event scripts works to the README and make
fix typo
zero out the sa struct to suppress a valgrind error
call the service specific event scripts directly from the forked child
add a comment that the talloc_free also removes the script from the tree
change the now rather small /etc/ctdb/events script into a service
add text to the event script timeout log on how to find out which script
we should start winbindd before we start smb
start winbind before smbd
merge from tridge
if a public address has already been taken over by a node, then let that
dont pollute the log with 'Registered PID XXX for client YYY' at log
make sure that the event script is executable and just ignore it
setup the logfile much earlier in the startup procedure for ctdbd
add an atexit() that will print "CTDB daemon shutting down" in the log
when we shutdown the service due to receiving a 'ctdb shutdown' command
change the structure used for node flag change messages so that we can
if lockwait takes an excessive time to complete. log the time it took to
when a node becomes banned its databases are no longer part of ctdb
if the node is inactive i.e. banned or disconnected then that node is
create a define to represent the 'invalid' generation id we used in two
when we receive a packet from the network, check explicitly that the
merge from tridge
create an enum to describe the state of a control in flight instead of
in ctdb_call_recv() we must check that state is non-NULL since
hang the ctdb_req_control structure off the ctdb_client_control_state
break checking that the recoverymode on all nodes are ok out into its
try out a slightly different api for controls where you provide a
get rid of the explicit global timeout used in the previous example and
comment why we do a talloc_steal
change the api for managing callbacks to controls so that instead of
cleanup invoke_control_callback. we dont need to pass some of these
add an initial implementation of a service_id structure and three
add a control to pull the server id list off a node
change the monitoring of recmode in the recovery daemon to use a fully
add async versions of the freeze node control and freeze all nodes in
make the ctdb shutdown command use the async _send() function to send
add an extra debug statement when we send a SIGTERM to a process
merge from tridge
when we start 60.nfs we must make sure that the shared storage
merge from tridge
change how we do public addresses and takeover so that we can have
change ctdb->vnn to ctdb->pnn
change ctdb_validate_vnn to ctdb_validate_pnn
change vnn to pnn in the ctdb tool
change ctdb_get_vnn to ctdb_get_pnn
change server_id.vnn to server_id.pnn
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn
change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn
change ctdb_send_message to take pnn as parameter instead of vnn
change debug output from vnn to pnn
change debug output from vnn to pnn
change vnn to pnn in the traverse structure
dont just always return 0 from the killtcp control.
fix typo in debug output
we cant have takeover_ctx hanging off ctdb since it is freed/recreated
get rid of the ctdb_vnn_list structure and just use a single list of
allow different nodes in the cluster to use different public_addresses
dont dereference vnn before we have assigned it a pointer value
we should always get data back from getnodemap
we dont use 'sendip' any more so dont check for it and exit from the
document NFS_TICKLE_SHARED_DIRECTORY on our web page
the event scripts for nfs are called 60.nfs and 61.nfstickle
specify the additional ports for nfs
improve the handling of hosts to notify with statd
we dont need the rpc.statd on shared directory neither do we need
60.nfs:
merge from tridge
add a short delay after stopping nfslock to make it less likely that
update web nfs with the new NFS_HOSTNAME variable we need to be able to
remove the ctdb publicip command
ctdb ip must loop over all connected nodes to pull the public ip list
set /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when
change the signature to ctdb_sys_have_ip() to also return:
update a comment
merged patch from tridge
grab the interface name from tok and not from the uninitialized array
move all ip addresses onto loopback when we startup ctdb
use the public addresses variable instead of hardcoding the path
merge from tridge
when a ctdb_takeover_run has failed we must make sure that
disable nfsv4 in etc/sysconfig/nfs
update the section about event scripts
let each node verify that they have a correct assignment of public ip
during startup make sure to delete any public addresses from any
merge from tridge
documentation updates
update vnn -> pnn in documentation
let ctdb ip only print the ip addresses known to the specified node
merge from tridge
add documentation of additional requirements for FTP so that users can
one more command to run to enable winbind for vsftpd
merge from tridge
merge from tridge
when ctdb attaches to a database it broadcasts the attach to all other
in ctdb_control_persistent_store() we must talloc_steal() the pointer to
merge from tridge
merge from tridge
when we have a public ip address mismatch (i.e. we hold addresses we
merge from tridge
change async.private to async.private_data since private is a reserved
merge from tridge
add a function in the ctdb tool to determine whether the local node is
add an initial test version of an ip multiplex tool that allows us
add a control to send gratuitous arps from the ctdb daemon
send out gratuitous arps when we are starting up serving the "single
remove some debug outputs
add a --single-public-ip argument to ctdbd to specify the ip address
merge from tridge
simplify election handling
first check that recovery master is connected (we know this from our own
move the kill_tcp_connections() function from 10.interfaces to functions
use kill_tcp_connections() to kill off all tcp connections to the
use $CTDB_BASE in 90.ipmux instead of hardcoding it to /etc/ctdb
dont try to lock the file from inside the ctdb daemon.
merge from tridge
include system/network.h so we get the prototype for inet_aton()
add a new tunable : DeterministicIPs that makes the allocation of
add back the test inside the daemon that if someone asks us to drop
merge from tridge
reverse the order in which public ips are listed so it matches the order
use NF_DROP instead of NF_STOLEN when we tell the kernel to not worry
flush the route cache when we have added the single public ip to the
merge from tridge
set the flags explicitly instead of masking them in
add a new transport method so that when a node is marked as dead, we
add a stub restart method for IB
add missing ) in the IB transport (which i dont compile for)
dont close the file, just set the fd to -1
merge from tridge
dont set some of the sysctl variables in statd-callout. these are
dont set parameters in statd-callout if they should be set they
when we are shutting down, we should first shut down the recovery daemon
when shutting down, we should stop monitoring
nfs may take a while to stop so do it in the background
merge from tridge
since service nfs stop/start sometimes fail to bring up the mount daemon on rhel5
merge from tridge
merge from tridge
if bond* interfaces are used as public interfaces we can not rely on ethtool but
merge from tridge
add a new tunable "CheckNodesFile" that when set to 0 will disable the
revert 773
merge from tridge
merge from tridge
merge from tridge
add CTDB_MANAGES_WINBIND to /etc/sysconfig/ctdb to allow ctdb to be used
only check port 21 when monitoring vsftpd
from Christian A
merge from tridge
when we print "Remote node had flags xx local had flags xx
if we get a modflag control but the flags remain unchanged, log this
add an extra log if we get a modflags control but it doesnt change any
when we as the recovery daemon on the recovery master detects that the
If update_local_flags() finds that a node has changed its BANNED status
add log output for when ctdb_ban_node() and ctdb_unban_node() are called
check for recursive bans in ctdb_ban_node() and remove the previous ban
when monitoring the node from the recovery daemon, check that the
Add a --node-ip argument so that one can specify which ip address a
move ctdb_set_culprit higher up in the file
dont manipulate ctdb->monitoring_mode directly from the SET_MON_MODE
->monitor_context is NULL when monitoring is disabled.
get rid of the control to set the monitoring mode.
add ctdb_disable/enable_monitoring() that only modifies the monitoring
always set up a new monitoring event regardless of whether monitoring is
log that monitoring has been "disabled" not that it has been "stopped"
up the loglevel for the enable/disable monitoring to level 1
merge from tridge
for the banned status, we should allocate this structure as a child of
rework banning/unbanning nodes
merge from tridge
merge from tridge
merge from tridge
merge from tridge
merge from tridge
add documentation for the vacuum and repack commands.
typo
merge from tridge
add eventscript for http
split node health monitoring and checking for connected/disconnected
merge from tridge
merge from tridge
ctdb_control_send() does not need to take an outdata parameter
improve documentation of --nosetsched
explain public-interface and single-public-ip better
update --transport
update
update getdbmap docs
doc updates
merge from tridge
merge from tridge
add a ctdb uptime command that prints when ctdb was started and when the
add ctdb_uptime.c
prepare for release
Merge branch 'master' of git://git.samba.org/tridge/ctdb
Merge branch 'master' of git://git.samba.org/tridge/ctdb
Specify and print debuglevels by name and not by number
add an eventscript to start/stop iscsi
update ctdb version
update ctdb revision
add monitoring of iscsi to the eventscript
add documentation on how to set up ha-iscsi with ctdb
change the IF interface is a BOND THEN xxx ELSE assume everything is ethernet
in the 91.lvs event script
dont use an absolute path for the basename command
dont use an absolute pathname for the iptables tool
dont use an absolute pathname for the touch command
dont use absolute pathnames for the netstat tool
Merge git://git.samba.org/tridge/ctdb
update to revision 28
create a startstop_nfs function that can start/stop the nfs service of different platforms
add helpers to stop/start nfs lockmanager on different platforms
From Mathieu PARENT <math.parent at gmail.com>
from Mathieu PARENT <math.parent at gmail.com>
read the current debuglevel in each loop in the recovery daemon so that we
the ctdb structure must make its own copy of the ->address field and not just
to make it easier/less disruptive to add nodes to a running cluster
make the ctdb reloadnodes reload the nodes file on all nodes and restart the transport
update version to 1.0.29
monitor the amount of free memory and if this threshold is crossed, monitoring will log an OOM memory in the ctdb log and shut down ctdb on the node.
Add a new parameter to /etc/sysconfig/ctdb
Add debug output to indicate why a node starts up in DISABLED state
document the --start-as-disabled argument
add a new tunable DisableWhenUnhealthy which when set will cause a node to automatically become DISABLED anytime monitoring fails and the node becomes UNHEALTHY.
add a control to get the name of the reclock file from the daemon
add a new file <reclock>.pnn where each recovery daemon can lock that byte at offset==pnn to offer an alternative way to detect which nodes are active instead of relying on CONNECTED being accurate.
store the num_active variable (number of connected/active nodes) inside the rec
update the reclock pnn count for how many nodes are connected to the current node once every 60 seconds
change recmaster from being a local variable in monitor_cluster() to be a member of the ctdb_recoverd structure
add a new tunable : reclockpingperiod
add a num_connected field to the rec structure that holds the number
when we reallocate the ip addresses for nodes, we must make sure that
add a new tunable 'NoIPFailback'
A new command to 'ctdb'
document some new ctdb command
document some public tunables
make 'ctdb ip' provide machinereadable output using '-Y'
provide machinereadable -Y output for 'ctdb getdebug'
Update ctdb uptime to provide machinereadable output
update to version 1.0.30
Redo the vacuuming process to make it scalable.
change the log level for the message when someone connects to a non-public ip
dont steal reply_data.dptr to ctdb if there is no data, since then we would leak
in ctdb_call_local() we can not talloc_steal() the returned data and hang it off ctdb.
From M Dietz,
update to version 1.0.31
fix a memory leak
Add two new controls to add/delete public ip address from a node at runtime.
update the iscsi support under RHEL5 to allow one iscsi target to be defined for each public address in the cluster.
make sure the iface string is nullterminated in the addip control packet
return 0 if iscsi is disabled
from tridge: decorate dumpmemory output so that packets that are queued show up with a little more information to make memory leak debugging easier
add improvements to tracking memory usage in ctdbd and the recovery daemon
decorate the memdump output with a nice field for ctdb_client structures to show the pid of the client that attached
add a mechanism to force a node to run the eventscripts with arbitrary arguments
bump version to .32
From Chris Cowan
we allocated one byte too little in the blob we need to send as the control to the server.
add a ctdb command to print the ctdb version
add possibility to provide site local modifications to the event system
From Chris Cowan
update to version .33
shell scripts need extra spaces sometime
fix compiler warning during a fatal error failing to lock down the socket
Revert "- accept an optional set of tdb_flags from clients on open a database,"
Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""
Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"""
make ctdb eventscript accept the -n all argument to run the event script on all connected nodes
when a node disagrees with us re who is recmaster
add support for -n all in "ctdb -n all ip"
add support for -n all in "ctdb -n all ip"
when a node disagrees with us re who is recmaster
make ctdb eventscript accept the -n all argument to run the event script on all connected nodes
Revert "Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,"""
Revert "Revert "- accept an optional set of tdb_flags from clients on open a database,""
Revert "- accept an optional set of tdb_flags from clients on open a database,"
when adding a new public ip address to a running node using the 'ctdb addip' command,
make 'ctdb catdb' produce output that resembles the output of tdbdump
when deleting a public ip from a node that is currently hosting this ip, try to move the ip address to a different node first
update version to .34
Use DEBUG_ERR and not DEBUG_WARNING when we get a connection
Add a capabilities field to the ctdb structure
Add ability to disable recmaster and lmaster roles through sysconfig file and
Monitor that the recovery daemon is still running from the main ctdb daemon
close and reopen the reclock pnn file at regular intervals.
make sure we lose all elections for recmaster role if we do not have the recmaster capability.
Expand the client async framework so that it can take a callback function.
update to version .35
From Mathias Dietz
Merge git://git.samba.org/tridge/ctdb
fix merge corruption
fix a bug where the public ip addresses of the cluster would not be redistributed across the cluster after a recovery was performed.
update to new version
ctdb->methods becomes NULL when we shutdown the transport.
when pulling the nfs directories to check during 60.nfs monitor
Update to new release
Try to use tdb transactions when updating a record and record header inside the ctdb daemon.
When we run the init script to start the ctdb service
add a checksum routine for tcp over ipv6
move the function to open a sending socket into the main executable since this function will disappear soon...
Add a missing include
add a new container to hold a socketaddr for either ipv4 or ipv6
Start implementing support for ipv6.
Merge git://git.samba.org/tridge/ctdb
Update some debug statements. Dont say that recovery failed if the failed function was invoked from outside of recovery
remove some unnecessary tests if ->vnn is null or not
dont check whether the "recovered" event was successful or not
dont disable/enable monitoring for each eventscript, instead
add "machinereadable output" support to "ctdb getmonmode"
new version .38
When ctdb has just been installed on a node, there wont be any persistent databases
lower the debug level for the "can not start transaction" since we do expect this to happen a few times inside ctdb (since we cant really block and wait for all locks to disappear before we can write the header, for example when doing a dmaster migration)
dont emit the can not start transaction with locks held at all.
lower the loglevel for when we have "tickles" for an ip address that is not a public address on the local node (it may be a public address on other nodes)
lower the loglevel for the warning that releaseip was called for a non-public address.
move the config option CTDB_MANAGES_VSFTPD from /etc/sysconfig/vsftpd to /etc/sysconfig/ctdb
move the CTDB_MANAGES_ISCSI setting from /etc/sysconfig/iscsi to /etc/sysconfig/ctdb
move CTDB_MANAGES_NFS from /etc/sysconfig/nfs to /etc/sysconfig/ctdb
update version to .39
second try for safe transaction stores into persistent tdb databases
cleanup of the previous patch.
fix some memory hierarchy bugs in allocation of the state structure for persistent writes.
restore a timeout value to the default settings instead of the hardcoded 3 second test value
disable transactions for now, there are more situations where there are conflicting locks and the "net" command is not prepared that the persistent store can fail.
read the samba sysconfig from the samba eventscript
update to .40
do persistent writes in a child process
remove a comment that is no longer relevant
remove another field we dont need in the childwrite_handle structure
dont bother casting to a void* private_data pointer,
update to .41
debuglevels can now be negative so print their value using %d instead of %u
create the nodes file in a 'test' subdirectory and not the current directory
redesign the test of persistent writes
run the persistent write test with 4 nodes by default
add a parameter for the tdb-flags to the client function
convert handling of gratuitous arps and their controls and helpers to
fix a comment
first cut to convert takeover_callback_state{}
add a callback for failed nodes to the async control helper.
make it possible to re-start a recovery without marking the current node as
when a eventscript has timed out, log the event options (i.e. "monitor" "takeip 1.2..." etc)
if the event scripts hangs EventScriptsBanCount consecutive times in a row
ban the node after 3 failed scripts by default
update to 1.0.42
it is 2008 not 2008 right now :-)
the write() from the freeze child process can fail
only loop over the write if the write failed
verify that the recmaster has the correct flags for us and if not tell the recmaster what the flags should be
third attempt for fixing a freeze child writing to the socket
/etc/ctdb/functions should not be executable
force an update of the flags from the recmaster after each monitoring run
reduce loglevel of the info message we are updating the flags on all nodes
test
Revert "test"
test
make /etc/ctdb/functions executable and add a hashbang to it so
initdit/ctdb is not a config file
new version
update a comment to reflect that this is not always a real recovery
print the opcode when an async callback detects an error
track both when we last started and ended a recovery.
we dont need to explicitly thaw the databases from the recovery daemon
in the destructor for the lock-wait child, make sure that we cancel any pending
If a transaction commit fails. Log this error and cancel all pending transactions to the
we need a 'case x:' in our ugly 'encode the control opcode as a linenumber in valgrind output' hack to make it work
zero out the sockaddr_in structure before we store the ipv4 data in it to make sure that all data is initialized. Otherwise valgrind will complain about uninitialized data when we write this structure out on the wire
new version .44
use more liberal handling of event scripts timing out.
waitpid() can block if it takes a long time before the child terminates
update the monitor event for nfs to track how many times in a row it has failed
new version 1.0.45
set sigchild to SIG_IGN instead of SIG_DFL
Revert "set sigchild to SIG_IGN instead of SIG_DFL"
Revert "waitpid() can block if it takes a long time before the child terminates"
Replace \s with [[:space:]] in our regexps we use for egrep.
From Chris Cowan, patch to make aix compile again
mark /etc/ctdb/functions as a config file to keep rpmlint happy
install the readme in /usr/share/doc/ctdb/ instead of under /etc
pull the development files out into their own package
add spec file for development rpm
copy ctdb-dev to the spec directory
Revert "copy ctdb-dev to the spec directory"
Revert "add spec file for development rpm"
Revert "pull the development files out into their own package"
proper waitpid() fix.
if we have enabled LVS but we dont have all the required packages
remove the attempts to restart NFS.
add an option to skip checking that all the samba shares are ok
make LVS a capability so that we can see which nodes are configured with
Add three mode commands to the CTDB tool.
Update to the LVS eventscript.
add documentation for both LVS:single-ip and CAPABILITIES:wan-accelerator
new version 1.0.46
explain why you have to have a real ip address as well as the "virtual"
Revert "Yip yip yip!"
Fix a very subtle race where we could get a double free of a talloced
new version 1.0.47
remove a debugging echo statement
Add two new options
change how we filter out "empty" records in the traversecode
Merge git://git.samba.org/tridge/ctdb
Do not allow "ctdb eventscript" to start new eventscripts while we are in recovery mode
Add two new controls to start and cancel a persistent update.
new version 1.0.48
Only decrement the "number of persistent writes in flight" If/when
Allow the fix-to-make-persistent-writes-safer work with unpatched samba versions
lower a debug message
lower a debug statement
We can not assume that just because we could complete a TCP handshake
if a new node enters the cluster, that node will already be frozen at start
Merge git://git.samba.org/tridge/ctdb
new version 1.0.50
From Michael Adams,
From Alexander Saupp.
new version 1.0.51
New version 1.0.52
Merge git://git.samba.org/tridge/ctdb
remove the reclock file we store pnn counts in.
Merge git://git.samba.org/tridge/ctdb
new version 1.0.53
new version 1.0.54
Merge git://git.samba.org/tridge/ctdb
Merge git://git.samba.org/tridge/ctdb
new version 1.0.55
fix the date so rpmbuild works
new version 1.0.56
Add two new ctdb commands :
Encode a file version number in the database backup header
store the database name, not the backup filename in the database header
only freeze the local node when doing a backup and not the entire cluster
use a local tdb_traverse instead of a ctdb_pulldb to lessen the impact of the system while performing a database backup
initial ipv6 patch
remove a file we dont need
fix the ipv6 checksum calculation for pseudoheader so that it actually works
fix a bug in the tcp socketkiller for ipv6
update the socketkiller in the eventscripts to be able to handle ipv6
when we compare ip addresses in ctdb_same_ip we must first canonicalize the addresses so that we realize that 127.0.0.1:22 is really the same thing as ::ffff:127.0.0.1:22
make the function to canonicalize a sockaddr structure public
we must canonicalize the sockaddr structures in killtcp so that we do the necessary downgrade if required
When we harvest all tcp connections to kill off after a takeip/releaseip event we must also harvest the ipv4 connections which may be presented in ::ff:xxxx:xxxx form by netstat
when we collect all ip addresses and sort them for the "ctdb ip -n all" output we must look at more than just the first 4 bytes of the sockaddr address or ipv6 wont work
Do not fail the takeip event if the "ip addr add ..." command failed.
version 1.0.57 : initial ipv6 support
Ronnie sahlberg (75):
Split CTDB into sub contexts to handle multiple concurrent databases within the same context.
tridge
first test of forced migration of records. compiles but not tested.
When we create a tcp connection to a remote ctdb node do an explicit bind() to set our source side to the same ip address as we use to listen to ctdb traffic.
merge from tridge
dispatcher daemon first try.
make normal/daemon mode controllable by a ctdb flag so that the api looks the same in both modes to a client.
change ctdb_client_read() to use the ctdb_read_pdu() helper
add a atexit() call to remove the domain socket when the daemon exits
add a CONNECT_WAIT flag to replace the call ctdb_connect_wait() since
restore the test script that was updated by mistake in the previous checkin
move the checking of the CONNECT_WAIT flag into the start method for tcp
change the tcp code to call ctdb_read_pdu() instead of doing the partial read thing explicitly
remove old ifdef that remained from when this was a header file
updates from tridges tree
add a call to register the pid for a messaging service
rename client.id to client.messenger_id to make the purpose of the field more obvious
merge from tridge
add a test that sends messages between clients connected to the same ctdb
merge from tridge
create a standalone ctdb daemon and a script ./direct/ctdbd.sh to start two such daemons in a 2 node cluster.
add a vnn field to the ctdb_reply_connect_wait pdu so that we can tell
add call/reply parsing of the cluster connect-wait call to the test daemon.
add an example on how to send a message to the daemon
merge from tridge and vl
add a test message to the messaging test so we can see that the message data is also passed from originator to receiver
do an infinite loop calling event_loop_once() in the ctdbd parent process instead of event_loop_wait() since the latter will return and thus take down the daemon
add an example on how to read a message from the domain socket
merge from volker
initial support for two new pdus for the domain socket to do fetch_lock
dont hardcode gdb in the test script. ooops
merge from tridges tree
when sending back a fetch lock reply to a client
add a beginning of a new test
add the two missing files from the previous commit
merge from tridge
add store_unlock pdu's for the domain socket.
add more elaborate test to fetch1 test
add missing code to store_unlock so that the data that a client writes is stored in ltdb
merge from tridge
merge from tridge
update to fetch1.sh test
add examples for volker on how to do fetchlock/storeunlock
add code to fetch1 test to tell the two child processes one at a time to fetch_lock the same record
merge from tridge
merge from tridge
merge from tridge
merge from tridge
change some error printouts to make it easier to determine whether the error occurred in the client or in the daemon
initial change to remove store_unlock pdu and use tdb chainlock in the client
remaining code to finish lock_fetch/store_unlock clientside helpers
do not use a ctdb_record_handle for client fetch_lock/store_unlock any more
finalize fetch lock changes to get rid of the record handle
create symbols for fetch lock response status
merge from tridge
merge from tridge
merge from tridge
merge from tridge
merge from volker to prevent some valgrind errors
merge from tridge
enhance fetch1 test to verify that a lock is released when a client terminates while holding the lock and the next blocked waiting client is assigned the lock
merge from tridge
initial shutdown function where a client can request an orderly shutdown of a ctdb cluster
add/finish the ctdb_shutdown() function.
merge from tridge
merge from tridge
we dont need the structure ctdb_reply_shutdown since we dont implement that pdu any more
merge from tridge
merge from tridge
ctdbd does no longer take a --daemon parameter since we no longer do non-daemon mode
merge from tridge
merge from tridge
add some tests in the daemon that a REQ_CALL that a client sent us has valid srcnode and destnode
remove a comment that is no longer valid
the checks for srcnode and destnode from the client are redundant since the daemon will sort these out itself before it sends the call off to either the local handler or a remote daemon
Volker Lendecke (14):
Fix uninitialized variable warnings
Handle a client that exited correctly: We need to ignore SIGPIPE and when the
merge
Add a test to read back the message
Merge tridge's tree
Rename "private" to "private_data"
ZERO_STRUCT writes one byte too many here.
Add timestamps to debug output.
typo
Some more debug and two memleak fixes
Be less verbose
Clean up the call_states correctly
Add --dbdir to ctdbd. Necessary for shared operation between ctdbd and smbd.
The remote node needs to get the IMMEDIATE_MIGRATION flag to actually send the
jmcd at samba.org (4):
Initial rpm build files
updates from tridge
updates from tridge
Next round of packaging updates:
root (3):
the while loop in the startup event runs as a subshell so we need an extra || exit 1 at the end
Merge commit 'ronnie-ctdb/master' into tridge
listen_fd is auto-closed
-----------------------------------------------------------------------
--
CTDB repository