[RFC] Performance improvements

Fri Jul 13 10:56:56 UTC 2018

Hi Martin

back from hol's.....see'ing that you won't let go :-)
Thanks for that !

On Mon, 2018-07-02 at 16:49 +1000, Martin Schwenke wrote:
> Hi Swen,
> 
> On Fri, 22 Jun 2018 14:29:15 +0200, Swen Schillig <swen at vnet.ibm.com>
> wrote:
> 
> > On Fri, 2018-06-22 at 16:58 +1000, Martin Schwenke wrote:
> > > Hi Swen,
> > > 
> > > On Thu, 21 Jun 2018 08:39:39 +0200, Swen Schillig via samba-
> > > technical
> > > <samba-technical at lists.samba.org> wrote:
> > >   
> > > > On Wed, 2018-06-20 at 15:09 -0700, Jeremy Allison wrote:  
> > 
> >  [...]  
> >  [...]  
> >  [...]  
> > > > 
> > > > Thanks for all your support....and here we go.  
> > > 
> > > As far as I can tell, commit 1 (ctdb: calculate queue input
> > > buffer
> > > size
> > > correctly) reverts a non-trivial amount of the following commit:
> > > 
> > >   commit a61a4b1254e162f39ac22665868ec116a96815f3
> > >   Author: Amitay Isaacs <amitay at gmail.com>
> > >   Date:   Wed Aug 21 14:42:06 2013 +1000
> > > 
> > >     common/io: Limit the queue buffer size for fair scheduling
> > > via
> > >     tevent
> > > 
> > > I'd like the commit message for commit 1 refer to the above
> > > commit
> > > message, making sure that the issue addressed there is still
> > > addressed.
> > > 
> > > Similarly, commit 8 (ctdb: Increase queue's default buffer size
> > > to
> > > 8k) basically reverts the following commit (though with a
> > > different
> > > number):
> > > 
> > >   commit 5dc18882fdbaed627262156dd453e6a8a2c0140f
> > >   Author: Amitay Isaacs <amitay at gmail.com>
> > >   Date:   Tue Jul 26 14:50:10 2016 +1000
> > > 
> > >     ctdb-daemon: Reduce QueueBufferSize from 16k to 1k
> > > 
> > > Again, I'd like to see reference to the above commit to ensure
> > > the
> > > issue mentioned there is being considered.
> > > 
> > > So, while these 2 commits are likely to provide a performance
> > > improvement, they might also introduce harmful side-effects that
> > > were
> > > addressed by the 2 commits above from Amitay.  
> > Ok, what about the side effect introduced with leaving available
> > data
> > on the socket ?  Where was that considered in the above 2 patches ?
> 
> This is a similar extreme that Amitay went to 5.5 years ago:
> 
>   commit a2abdc13531bdd874df3c6d410370d52e24307fd
>   Author: Amitay Isaacs <amitay at gmail.com>
>   Date:   Fri Jan 18 10:42:14 2013 +1100
> 
>     common/io: Rewrite socket handling code to read all available data
> 
> Yes, performance was better.
> 
> However, one side-effect was a lot more messages saying "Handling event
> took X seconds", ctdbd stalling and other nodes noticing dead nodes, so
> causing recoveries.
But didn't my tests show the exact opposite ?
You saw from the values that the variation (and the max-values) improved.
Your fear should have been visable by frequent peaks in the graph, right ?
Besides, I do not see a relation between reading available data of a socket and
the "Handling event ..." messages you refer to.
Wasn't the tevent-model established to decouple the reception from the processing ?

> 
> > How was the buffer size of 1k gathered ?
> 
> Amitay did a bunch of testing and discovered that, even under high
> load, the queue length was usually less than 1K.
I guess you mean the message length here.
But don't you see the disadvantage of the current concept ?
What if the socket has always more than 1 message (which is almost
always the case) of let's say 600byte.
Then we always have 1 full message(good), followed by a memmove(bad),
realizing that we have to re-read(worse) and if we're really unlucky 
we have to face a message which is bigger than 1k and then we have to
realloc memory and another re-read.
So even if all messages are smaller than 1k, having a big buffer
preventing memory re-allocations, memory moves and syscall reads is
almost always beneficial.

> Now, if you're avoiding memmove() and moving along the buffer then
> perhaps you need a bigger buffer.  However, it would be nice to see
> that quantified.  What are the result, along with some of the other
> patches, for a 1KB, 2KB, 4KB, 8KB buffer?  Is it worth adding debug
> message so we know when the buffer actually fills?
Sure, good idea.
I'll prepare that for next week.

> 
> > Besides, I thought the argument for NOT having patch-10 (tevent) is
> > because of fair-scheduling via tevent. 
> > Are you telling me now this cannot achieved via tevent ?
> > So if tevent is just processing all immediate events without looking
> > left or right, then what is the reason again for not having patch-10 ?
> > 
> > Ok, excuse my slight touch of sarcasm here but I really believe that
> > the chosen strategy is a good compromise between the 5 year old 16k and
> > the current 1k value. 
> 
> The tevent programming model is to have an event handler do a small
> amount of work and go back to the event loop.  When you use immediate
> events you definitely give those events an advantage.  However, that
> doesn't mean that you should throw out the programming model.
Ok, I guess I do understand what you call the tevent-programming-model, 
but in that code area we only have immediate events making it quiet
questionable to me to use it all (therefore patch-10).
In addition I do believe that the tevent overhead in this area is
noticeable (bold statement w/o any prove).

> 
> > (Besides.... I planned to make it "tunable" which you dislike as well.)
> 
> It is already tunable.  You're proposing another tunable for the size
> of the memory pool.
> 
> > > Why isn't the destructor in commit 3 (ctdb: remove queue destructor
> > > as
> > > it isn't needed anymore) needed anymore?  Is it due to one of the
> > > previous commits?  Or hasn't it been needed for a while?  If the
> > > latter, is there a particular commit that stops it from being needed?
> > > My guess is this one:  
> > 
> > Well, if you look at the content of the destructor then it seems
> > obvious that it was never really useful.
> > 1. It is the "queue" destructor, so there's no need to zero queue-attributes
> > 2. The memory referenced by queue->buffer.data is of the queue's mem context and 
> >    therefore free'd automatically when the queue is destroyed. So no need to do that manually.
> > 3. There's absolutely no point to set queue->destroyed = true right before I destroy/free the queue.
> > 
> > Besides we did exactly the same for sock_io not too long ago.
> > 
> > commit eae2d35fec071b020f420ba74ac6551c84140a4d
> > Author: Swen Schillig <swen at vnet.ibm.com>
> > Date:   Mon Jan 8 14:13:46 2018 +0100
> > 
> >     ctdb-common: Remove sock_queue_destructor
> >     
> >     The sock_queue_destructor is not needed.
> >     The performed tasks will be performed automatically.
> >     
> >     Signed-off-by: Swen Schillig <swen at vnet.ibm.com>
> >     Reviewed-by: Amitay Isaacs <amitay at gmail.com>
> >     Reviewed-by: Martin Schwenke <martin at meltin.net>
> 
> The destructor does not set queue->destroyed = true at all.  It does
> something slightly different, please re-read. 

Boaah Martin, please don't tell me you want me to find the distinction
between queue->destroyed and *queue->destroyed.
At the end of the day a value is set which is not read anywhere in the
code and it's parent struct (and memory representation) is
free'd/invalidated in that very same job.

>  The destructor was once useful. 
>  All I'm asking you to do is mention in the commit message
> when it stopped being useful. 

Hmm, I thought it is enough to see that it is useless now.
Anyhow, not really sure about its usefulness in the past but I believe
it definitely lost its "purpose" with David's patch to make
queue_io_read reentrant.

commit e097b7f8ff1a9992de1d11474dac4969e30cd679
Author: David Disseldorp <ddiss at suse.de>
Date:   Sun Jul 31 03:14:54 2011 +0200

    io: Make queue_io_read() safe for reentry

>  I'm not arguing against removal of the
> destructor.  I just don't want to have to go on archaeological dig
> every time I look at this code.  If your commit message points to the
> right point in the past then I know what I need to know and you've
> saved me some work.
So am I getting you right in saying that if I'm adding a comment with
the above reference would convince you to accept his one ?

> 
> > > I doubt that commit 6 (ctdb: Consolidate all ALIGNMENT
> > > instructions) provides any performance benefits.  Is it needed as
> > > part
> > > of this patch set?  If not, I'd still like to take time to consider
> > > it
> > > separately when time permits.  
> > It is needed by the succeeding patches.
> > ...and as far as I remember the two of us talked about it extensively
> > and decided to go for it if it's not too hard / intrusive ?
> > Did you change your mind ? 
> > 
> > ...or do you think it's just the wrong time and place ?
> 
> I'm hoping that the performance improvements don't rely on this.  I
> don't have the head space to be able to understand why subsequent
> patches rely on this simplification.  Can you please explain it to me?
> 
> By the way, we have the "alignment" code neatly encapsulated in
> ctdb_allocate_pkt(), so all we really need to do is decide where to
> call it (provided there isn't some reason why we can't all it in "old"
> code).
> 
> If this really needs to happen inside the current patch set then can we
> please drop the extra layers in 1 commit and show any other
> simplifications separately?
No it doesn't need to be inside this patchset.
If you prefer I can move it out and start another RFC-discussion/-thread for this.
I just hope others won't get annoyed be me flooding the mailing-list.

> 
> > > OK, that leaves 3 (other sets of) changes that improve performance.
> > > 
> > > Commit 4 (ctdb: Introduce buffer.offset to avoid memmove) adds an
> > > offset into the buffer, similar to the begin offset in
> > > sock_io.c.  That
> > > seems like a good idea.
> > > 
> > > I know I'm strange, but I'd rather be bored by seeing relevant fields
> > > explicitly zeroed instead of:
> > > 
> > > 	memset(&queue->buffer, 0, sizeof(queue->buffer)); 
> > > 
> > > which I have to decode to understand why it is being done.  
> > Well, I think this is now a decision of preference versus making sure
> > it's all done as expected. What if an attribute gets added to the
> > struct ? what about speed , memory related things tend to be
> > performance critical. ....and 5 or more assignments to do one thing
> > just look dirty..I think.
> 
> Sure, whereas I like things to be spelled out so that I can understand
> them by reading the code.  I often hope the compiler might do things
> like optimise a bunch of adjacent initialisations into 1. However, I
> don't know of it does that...  ;-)

So would that be a killer for your +1 if left as it is ?

> 
> > > I'm also not sure why you dropped the "failed:" goto label and
> > > inlined
> > > that instead.  It just makes the patch bigger.  :-(  
> > 
> > I think it is a good idea to add one extra line and return right away
> > instead of jumping 50 lines.
> > Besides, that jump-tag was used ..... once !
> 
> Please read the patch again.  I'm talking about the 2nd instance.
Fair enough, it is used 3-times but I still believe that it's not worth
jumping 50+ lines just to execute a single command....but again matter
of preference. As you can see I like the early-out-strategy.
Again, would that be a killer for you ?

> 
> > > Commits 5, 7 & 9 (ctdb: Adding memory pool for queue callback, ctdb:
> > > Introduce tunable parameter for queue mempool size, ctdb: Enabling
> > > ctdb_queue_send to use memory from data_pool) add a memory pool to
> > > reduce the number of malloc()s.  Seems like a reasonable plan.
> > > 
> > > In commit 5 please don't use CTDB_NO_MEMORY_NULL().  We would like to
> > > get rid of that and similar macros.  Used lazily, they cause memory
> > > leaks.
> > > 
> > > I'm hoping we can do without commit 7.  We need less tunables instead
> > > of more.  :-)  The plan is to get rid of tunables completely,
> > > replacing
> > > them with configuration file settings.  Hopefully there is a number
> > > that
> > > generally works. I suspect that without some very good advice in the
> > > documentation, most users won't know what to do with it anyway and
> > > will
> > > never use it.
> > > 
> > > Like others, I don't think commit 10 is a good idea.    
> > Ok, I'm not arguing about that one anymore.
> 
> Woo hoo!  :-)
See....you got your victory here...now be a good diplomat and give one
to me as well :-)

Cheers Swen.