RAFT and CTDB
obnox at samba.org
Wed Dec 10 01:43:12 MST 2014
On 2014-12-10 at 19:19 +1100, Martin Schwenke wrote:
> On Mon, 8 Dec 2014 17:56:53 +0100, Michael Adam <obnox at samba.org> wrote:
> > On 2014-12-08 at 16:26 +1100, Martin Schwenke wrote:
> > > The summary is that you can't race through and simply confirm that the
> > > test prints the correct data_increment value when running with -rw.
> > >
> > > For the recovery lock to work you need to run the non -rw version and
> > > actually confirm that *the locking rate drops dramatically*. If it
> > > doesn't then it is *not* working!
> > This is not necessarily true!
> Then we need a better test and/or better documentation... and also more
> hours in each day to make that happen. ;-)
> > For instance, I remember that a few years ago, with GFS2 in its default
> > configuration, I observed a constant lock rate until I reached 5
> > nodes or so. This was because GFS's lock manager restricted locks
> > to 100/second by default. Only after removing that limit could you
> > see the dramatic drop.
> > Also, the drop will not be equally dramatic with every file system,
> > since file systems seem to apply different levels of optimization
> > when only one node is involved.
> > I also remember (I think also with GFS) that the initial lock rate
> > was pretty high for 1 node (with a custom config) and dropped
> > drastically when I added a node. But when I removed the
> > second-to-last node, the rate did not rise as drastically as it had
> > initially dropped, i.e. not back to the original high lock rate.
> > The explanation was that the lock manager stayed in the special
> > mode for a single locking node only until a second locking node
> > was added, but it did not revert to the special scheme after the
> > additional lockers had left (presumably based on a heuristic that
> > more lockers would probably come back later).
> > So I'd say that ping_pong without -rw is generally good for
> > seeing possible lock rates, but if you want to verify real
> > behaviour, then you should test with -rw (of course only if
> > the file system keeps data operations coherent under locks,
> > which hopefully all file systems that we can seriously
> > take into account do...). :-)
> Ah, but in the OCFS2 case the -rw test works, while the "without -rw"
> test does not work! ;-)
That is really, really strange.
But the major problem seems to be:
How can we reliably tell whether the "without -rw" test succeeds?
From what I wrote above, the lock rate alone does not always seem
to give enough information.
How did _you_ tell that the "without -rw" test failed?
> People definitely need to run both. It seems that just running with
> -rw is not good enough.
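For reference, here is a minimal single-process C sketch of the access pattern that the -rw check exercises. It is written from my understanding of how ping_pong behaves, not copied from ctdb's ping_pong.c, and the names lock_byte(), ping_pong_rw(), single_node_demo() and the MAX_LOCKS cap are hypothetical. Each locker holds the fcntl() lock on byte i while it waits for byte i+1. With the read/write check enabled, it reads byte i, compares it with the value it saw on its previous visit, and writes it back incremented. If N lockers share a coherent file, the observed data increment should therefore settle at N.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_LOCKS 64 /* hypothetical cap for this sketch */

/* Take (F_WRLCK) or drop (F_UNLCK) a POSIX lock on one byte, waiting if needed. */
static int lock_byte(int fd, off_t offset, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = offset;
    fl.l_len = 1;
    return fcntl(fd, F_SETLKW, &fl);
}

/*
 * Hypothetical ping_pong-style loop with the read/write check enabled:
 * while holding byte i, lock byte i+1, read byte i, record how far it
 * moved since this locker's last visit, write it back +1, release byte i.
 * Returns the last observed increment, or -1 on error.
 */
static int ping_pong_rw(int fd, int num_locks, int iterations)
{
    unsigned char seen[MAX_LOCKS] = {0};
    int incr = 0;
    int i = 0;

    assert(num_locks > 0 && num_locks <= MAX_LOCKS);
    if (lock_byte(fd, 0, F_WRLCK) != 0)
        return -1;
    for (int n = 0; n < iterations; n++) {
        int next = (i + 1) % num_locks;
        unsigned char c;

        if (lock_byte(fd, next, F_WRLCK) != 0)
            return -1;
        if (pread(fd, &c, 1, i) != 1)
            return -1;
        /* Writes by other lockers since our last visit show up here. */
        incr = (unsigned char)(c - seen[i]);
        seen[i] = c;
        c = (unsigned char)(c + 1);
        if (pwrite(fd, &c, 1, i) != 1)
            return -1;
        if (lock_byte(fd, i, F_UNLCK) != 0)
            return -1;
        i = next;
    }
    lock_byte(fd, i, F_UNLCK); /* drop the lock still held on the last byte */
    return incr;
}

/* Run the loop alone on a zero-filled scratch file; the increment should be 1. */
int single_node_demo(int num_locks, int iterations)
{
    char path[] = "/tmp/ping_pong_sketchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, num_locks) != 0) {
        close(fd);
        return -1;
    }
    int incr = ping_pong_rw(fd, num_locks, iterations);
    close(fd);
    return incr;
}
```

Run by a single process, as in single_node_demo(3, 30), the increment is 1. Run concurrently on every node against the same cluster file, a coherent file system should report the number of nodes instead. Note that the increment only checks data coherence. It says nothing about whether the lock rate behaves as expected, which is why both variants still need to be run.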