Rev 263: merged from ronnie in http://samba.org/~tridge/ctdb

tridge at samba.org
Sun May 6 21:56:39 GMT 2007


------------------------------------------------------------
revno: 263
revision-id: tridge at samba.org-20070506215638-67wvsh0zwt7bbllo
parent: tridge at samba.org-20070505074654-1cs2rodrtlo3dkjk
parent: sahlberg at ronnie-20070506215417-d96e79c116240aad
committer: Andrew Tridgell <tridge at samba.org>
branch nick: tridge
timestamp: Mon 2007-05-07 07:56:38 +1000
message:
  merged from ronnie
modified:
  common/ctdb.c                  ctdb.c-20061127094323-t50f58d65iaao5of-2
  common/ctdb_client.c           ctdb_client.c-20070411010216-3kd8v37k61steeya-1
  common/ctdb_control.c          ctdb_control.c-20070426122724-j6gkpiofhbwdin63-1
  direct/recoverd.c              recoverd.c-20070503213540-bvxuyd9jm1f7ig90-1
  include/ctdb.h                 ctdb.h-20061117234101-o3qt14umlg9en8z0-11
  include/ctdb_private.h         ctdb_private.h-20061117234101-o3qt14umlg9en8z0-13
  tests/recover.sh               recover.sh-20070502031230-tpuiet6m6tjdotta-1
  tools/ctdb_control.c           ctdb_control.c-20070426122705-9ehj1l5lu2gn9kuj-1
    ------------------------------------------------------------
    revno: 197.1.82
    merged: sahlberg at ronnie-20070506215417-d96e79c116240aad
    parent: sahlberg at ronnie-20070506214716-d88b489c9db3b15c
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 07:54:17 +1000
    message:
      hang the timeout event off state, so we don't need to explicitly
      free it, and we won't accidentally return from the function without
      killing the event first
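
The ownership idea in revno 197.1.82 can be sketched without talloc. This is only an illustration under stated assumptions: `control_state`, `timed_event`, and the three helper functions are hypothetical names, and plain `calloc`/`free` stand in for the talloc parent/child relationship the real code relies on. The point it shows is that the timeout flag and the event both live in the state, so freeing the state removes the event and nothing can later write into a dead stack frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the real ctdb code uses talloc and its event library. */
struct timed_event {
    bool armed;       /* true while the timer may still fire */
    bool *timed_out;  /* flag the handler sets when it fires */
};

struct control_state {
    bool timed_out;          /* lives in the state, not on the caller's stack */
    struct timed_event *te;  /* owned by the state, freed together with it */
};

/* Allocate the state and hang a timeout event off it. */
struct control_state *control_state_new(void)
{
    struct control_state *state = calloc(1, sizeof(*state));
    if (state == NULL) {
        return NULL;
    }
    state->te = calloc(1, sizeof(*state->te));
    if (state->te == NULL) {
        free(state);
        return NULL;
    }
    state->te->armed = true;
    state->te->timed_out = &state->timed_out;
    return state;
}

/* Simulate the timer firing: the write lands in memory the state still owns. */
void timed_event_fire(struct timed_event *te)
{
    assert(te != NULL);
    if (te->armed) {
        *te->timed_out = true;
        te->armed = false;
    }
}

/* Freeing the state also frees the event, so every return path that
 * frees the state has killed the event as well. */
void control_state_free(struct control_state *state)
{
    if (state == NULL) {
        return;
    }
    free(state->te);
    free(state);
}
```

With talloc the same effect would come from allocating the event with the state as its parent context, which is presumably what "hang the timeout event off state" refers to.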
    ------------------------------------------------------------
    revno: 197.1.81
    merged: sahlberg at ronnie-20070506214716-d88b489c9db3b15c
    parent: sahlberg at ronnie-20070506205158-a1110272a8a362ad
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 07:47:16 +1000
    message:
      it now works to talloc_free() the timed event if we no longer want it to 
      trigger
      
      the earlier failure must have been a side effect of a different bug in
      the recoverd.c code that has now been fixed
    ------------------------------------------------------------
    revno: 197.1.80
    merged: sahlberg at ronnie-20070506205158-a1110272a8a362ad
    parent: sahlberg at ronnie-20070506190248-95317ce2ee808aa3
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 06:51:58 +1000
    message:
      recovery daemon with recovery master election
      
      election is primitive: it elects the lowest vnn as the recovery master
      
      two new controls, to get/set the recovery master for a node
      
      to use the recovery daemon, start one instance for each ctdb daemon:
      ./bin/recoverd --socket=ctdb.socket*
      
      it has been briefly tested by deleting and adding nodes to a 4-node
      cluster but needs more testing
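
The lowest-vnn election described in revno 197.1.80 could look roughly like the sketch below. It is not the code from recoverd.c: `node_info`, `elect_recmaster`, and the `NO_RECMASTER` sentinel are hypothetical names, and skipping disconnected nodes is an assumption, since the message only says that the lowest vnn wins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel returned when no node is eligible. */
#define NO_RECMASTER UINT32_MAX

/* Hypothetical per-node view; the real daemon keeps richer state. */
struct node_info {
    uint32_t vnn;    /* virtual node number */
    bool connected;  /* assumption: only connected nodes may be elected */
};

/* Return the lowest vnn among connected nodes, or NO_RECMASTER if none. */
uint32_t elect_recmaster(const struct node_info *nodes, size_t n)
{
    uint32_t best = NO_RECMASTER;

    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].connected && nodes[i].vnn < best) {
            best = nodes[i].vnn;
        }
    }
    return best;
}
```

Every node that runs the same rule over the same node list picks the same master, which is what lets such a primitive election work without any extra voting round.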
    ------------------------------------------------------------
    revno: 197.1.79
    merged: sahlberg at ronnie-20070506190248-95317ce2ee808aa3
    parent: sahlberg at ronnie-20070506184112-5da5d1b472919956
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 05:02:48 +1000
    message:
      add new controls to get and set the recovery master node of a daemon
      i.e. which node is "elected" to check for and drive recovery
    ------------------------------------------------------------
    revno: 197.1.78
    merged: sahlberg at ronnie-20070506184112-5da5d1b472919956
    parent: sahlberg at ronnie-20070506024656-54065db42fc80030
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Mon 2007-05-07 04:41:12 +1000
    message:
      in the function that checks whether the cluster needs recovery, add a
      test that all active nodes are in normal mode.
      If we discover that some node is still in recovery mode, it may indicate
      that a previous recovery ended prematurely, and thus we should start a
      new recovery
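
The extra check from revno 197.1.78 can be pictured as follows. This is a minimal sketch, not the recoverd.c implementation: `node_status`, `recovery_mode`, and `cluster_needs_recovery` are hypothetical names, and the real function also performs other checks that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode values, for illustration only. */
enum recovery_mode { MODE_NORMAL = 0, MODE_RECOVERY = 1 };

/* Hypothetical snapshot of what the recovery master knows about a node. */
struct node_status {
    uint32_t vnn;
    bool active;              /* inactive nodes are ignored by the check */
    enum recovery_mode mode;  /* mode last reported by the node */
};

/* Return true if any active node is still in recovery mode, which may
 * mean that a previous recovery ended prematurely. */
bool cluster_needs_recovery(const struct node_status *nodes, size_t n)
{
    assert(nodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].active && nodes[i].mode != MODE_NORMAL) {
            return true;
        }
    }
    return false;
}
```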
    ------------------------------------------------------------
    revno: 197.1.77
    merged: sahlberg at ronnie-20070506024656-54065db42fc80030
    parent: sahlberg at ronnie-20070506005125-24f5190be388d944
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 12:46:56 +1000
    message:
      update a comment to be more descriptive
    ------------------------------------------------------------
    revno: 197.1.76
    merged: sahlberg at ronnie-20070506005125-24f5190be388d944
    parent: sahlberg at ronnie-20070506004218-e5dc99fda5598e6e
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:51:25 +1000
    message:
      change a lot of printf into debug statements
    ------------------------------------------------------------
    revno: 197.1.75
    merged: sahlberg at ronnie-20070506004218-e5dc99fda5598e6e
    parent: sahlberg at ronnie-20070506003844-cfa2e54767c22f74
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:42:18 +1000
    message:
      break out the code to update all nodes to the new vnnmap into a helper 
      function
    ------------------------------------------------------------
    revno: 197.1.74
    merged: sahlberg at ronnie-20070506003844-cfa2e54767c22f74
    parent: sahlberg at ronnie-20070506003018-5b7bebe70235b0fb
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:38:44 +1000
    message:
      create a helper function for recovery to push all local databases out 
      onto the remote nodes
    ------------------------------------------------------------
    revno: 197.1.73
    merged: sahlberg at ronnie-20070506003018-5b7bebe70235b0fb
    parent: sahlberg at ronnie-20070506002213-9ad049359f13100b
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:30:18 +1000
    message:
      add an extra blank line
    ------------------------------------------------------------
    revno: 197.1.72
    merged: sahlberg at ronnie-20070506002213-9ad049359f13100b
    parent: sahlberg at ronnie-20070506001648-69c865e04dda39ce
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:22:13 +1000
    message:
      break the code that repoints dmaster for all local and remote records 
      into a separate helper function
    ------------------------------------------------------------
    revno: 197.1.71
    merged: sahlberg at ronnie-20070506001648-69c865e04dda39ce
    parent: sahlberg at ronnie-20070506001242-2ad8fec87f6dea8b
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:16:48 +1000
    message:
      create a helper function for recovery that pulls and merges all remote 
      databases onto the local node
    ------------------------------------------------------------
    revno: 197.1.70
    merged: sahlberg at ronnie-20070506001242-2ad8fec87f6dea8b
    parent: sahlberg at ronnie-20070506000437-f1ba6eb009903e0e
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:12:42 +1000
    message:
      create a helper function to make sure the local node that does recovery 
      has all the databases that exist on any other remote node
    ------------------------------------------------------------
    revno: 197.1.69
    merged: sahlberg at ronnie-20070506000437-f1ba6eb009903e0e
    parent: sahlberg at ronnie-20070505235312-47eb2e7327632b8f
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 10:04:37 +1000
    message:
      add a helper function to create all missing remote databases detected 
      during recovery
    ------------------------------------------------------------
    revno: 197.1.68
    merged: sahlberg at ronnie-20070505235312-47eb2e7327632b8f
    parent: sahlberg at ronnie-20070505220522-a85200c3738602cb
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 09:53:12 +1000
    message:
      break out the setting/clearing of recovery mode into a dedicated helper 
      function
    ------------------------------------------------------------
    revno: 197.1.67
    merged: sahlberg at ronnie-20070505220522-a85200c3738602cb
    parent: sahlberg at ronnie-20070505215220-9645a5e6c1f75971
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 08:05:22 +1000
    message:
      don't allocate arrays where we can just return a single integer
    ------------------------------------------------------------
    revno: 197.1.66
    merged: sahlberg at ronnie-20070505215220-9645a5e6c1f75971
    parent: sahlberg at ronnie-20070505213216-a6ffc7923cad6b0c
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 07:52:20 +1000
    message:
      don't use arrays where a uint32_t works just as well
    ------------------------------------------------------------
    revno: 197.1.65
    merged: sahlberg at ronnie-20070505213216-a6ffc7923cad6b0c
    parent: sahlberg at ronnie-20070505210747-ae90637e16bf5cb7
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 07:32:16 +1000
    message:
      add an ifdef'ed-out block to the call.
      
      we really should kill the event in case the call completed before the
      timeout, so that we can also make timed_out non-static
    ------------------------------------------------------------
    revno: 197.1.64
    merged: sahlberg at ronnie-20070505210747-ae90637e16bf5cb7
    parent: sahlberg at ronnie-20070505205801-e674f5fc04467b37
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 07:07:47 +1000
    message:
      the timed_out variable needs to be static and cannot be on the stack,
      since if the command times out and we return from ctdb_control, we may
      have events that can trigger later and overwrite data that is no
      longer in our stack frame
    ------------------------------------------------------------
    revno: 197.1.63
    merged: sahlberg at ronnie-20070505205801-e674f5fc04467b37
    parent: sahlberg at ronnie-20070505200639-0d19e58cc0e77781
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 06:58:01 +1000
    message:
      update to the recovery daemon
      when ctdb_ctrl_ calls are timed out due to nodes arriving or leaving the
      cluster, it crashes the recovery daemon afterwards with a SEGV but no
      useful stack backtrace
    ------------------------------------------------------------
    revno: 197.1.62
    merged: sahlberg at ronnie-20070505200639-0d19e58cc0e77781
    parent: sahlberg at ronnie-20070505195315-2f67bc95926668e5
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 06:06:39 +1000
    message:
      in the recover test, start the daemons with explicit socket names and
      explicit IP address/port
      
      remove all --socket= from all ctdb_control calls since they are not
      needed anymore
    ------------------------------------------------------------
    revno: 197.1.61
    merged: sahlberg at ronnie-20070505195315-2f67bc95926668e5
    parent: sahlberg at ronnie-20070505183841-3ee39c2e0abe51ca
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 05:53:15 +1000
    message:
      add support in catdb to dump the content of a specific node's tdb instead
      of traversing the full cluster.
      this makes it easier to debug recovery
      
      update the test script for recovery to reflect the newish signatures to
      ctdb_control
      
      the catdb control does still segfault, however, when there are missing
      nodes in the cluster, as there are toward the end of the recovery test
    ------------------------------------------------------------
    revno: 197.1.60
    merged: sahlberg at ronnie-20070505183841-3ee39c2e0abe51ca
    parent: sahlberg at ronnie-20070505183122-09dfca19dab5ec41
    parent: tridge at samba.org-20070505074654-1cs2rodrtlo3dkjk
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 04:38:41 +1000
    message:
      merge from tridge
    ------------------------------------------------------------
    revno: 197.1.59
    merged: sahlberg at ronnie-20070505183122-09dfca19dab5ec41
    parent: sahlberg at ronnie-20070505065134-e92a7a2ea17d523b
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sun 2007-05-06 04:31:22 +1000
    message:
      add a control to get the pid of a daemon.
      this makes it possible to kill a specific daemon in the recover test 
      script
    ------------------------------------------------------------
    revno: 197.1.58
    merged: sahlberg at ronnie-20070505065134-e92a7a2ea17d523b
    parent: sahlberg at ronnie-20070505031726-667ed50c2053a8aa
    parent: tridge at samba.org-20070505040946-iji1cxsyb8ail7bk
    committer: Ronnie Sahlberg <sahlberg at ronnie>
    branch nick: ctdb
    timestamp: Sat 2007-05-05 16:51:34 +1000
    message:
      merge from tridge

Diff too large for email (1498, the limit is 200).

