Patch: Tune "dir" a bit

Scott Lovenberg scott.lovenberg at gmail.com
Mon Mar 25 13:35:34 MDT 2013


On Mon, Mar 25, 2013 at 12:16 PM, Scott Lovenberg
<scott.lovenberg at gmail.com> wrote:
> On Mon, Mar 25, 2013 at 12:10 PM, Jeremy Allison <jra at samba.org> wrote:
>> On Sat, Mar 23, 2013 at 12:07:40PM -0400, Scott Lovenberg wrote:
>>> On Sat, Mar 23, 2013 at 11:55 AM, Scott Lovenberg
>>> <scott.lovenberg at gmail.com> wrote:
>>> > On Fri, Mar 22, 2013 at 1:20 PM, Jeremy Allison <jra at samba.org> wrote:
>>> >> On Fri, Mar 22, 2013 at 02:37:31PM +0100, Volker Lendecke wrote:
>>> >>> Hi!
>>> >>>
>>> >>> Attached find a patch that makes trans2_findfirst take 10%
>>> >>> less user-space CPU. If someone has the time, can you try to
>>> >>> verify this improvement and push?
>>> >>
>>> >> Indeed this is a significant improvement - I'll push to
>>> >> autobuild and get a bug logged for 4.0.x and 3.6.x to
>>> >> get this change into production releases asap.
>>> >>
>>> >> Jeremy.
>>> >
>>> > FWIW, I confirmed this on a VM because I was really curious.  The
>>> > difference is dramatic when IO is more costly (i.e., on a VM running on
>>> > an old P4).  This is the same test that Volker ran:
>>> > ---------------------------------------------------------------------------------------
>>> > [root at primary sambaTest]# time /usr/sbin/smbd -d0 -i
>>> > smbd version 3.5.10-125.el6 started.
>>> >
>>> > real    13m11.306s
>>> > user    0m45.899s
>>> > sys     4m0.341s
>>> >
>>> >
>>> > [root at primary sambaTest]# time /opt/sambaTest/sbin/smbd -d0 -i
>>> > smbd version 4.1.0pre1-GIT-9624ca4 started.
>>> >
>>> > real    13m51.811s
>>> > user    1m12.406s
>>> > sys     4m25.588s
>>> > ---------------------------------------------------------------------------------------
>>> >
>>> > Volker, well done!
>>> > --
>>> > Peace and Blessings,
>>> > -Scott.
>>>
>>>
>>> I feel really stupid.  I interpreted my results backwards.  Why in the
>>> world is 3.5 doing so much better than 4.1?
>>> Thanks to Ira for pointing this out to me. :P
>>
>> So this is an ideal test case for cachegrind :-). Nice isolated
>> test, one process serving... I'd love to see comparative cachegrind
>> results (hint, hint :-).
>
> Ooh! I've never used cachegrind.  Sounds like a good excuse to play
> with something new.  :)
>
> I'll give it a test run and post my results.
>


Here's the output for 3.5 (3.5.10-125.el6):
[root at primary sambaTest]# valgrind --tool=cachegrind /usr/sbin/smbd -d0 -i
==30918== Command: /usr/sbin/smbd -d0 -i
--30918-- warning: Pentium 4 with 12 KB micro-op instruction trace cache
--30918--          Simulating a 16 KB I-cache with 32 B lines
smbd version 3.5.10-125.el6 started.
==30918== I   refs:      11,113,878,921
==30918== I1  misses:       731,306,747
==30918== LLi misses:             8,862
==30918== I1  miss rate:           6.58%
==30918== LLi miss rate:           0.00%
==30918==
==30918== D   refs:       6,437,440,508  (3,778,067,435 rd   + 2,659,373,073 wr)
==30918== D1  misses:        38,180,098  (   27,687,130 rd   +    10,492,968 wr)
==30918== LLd misses:            21,671  (       17,979 rd   +         3,692 wr)
==30918== D1  miss rate:            0.5% (          0.7%     +           0.3%  )
==30918== LLd miss rate:            0.0% (          0.0%     +           0.0%  )
==30918==
==30918== LL refs:          769,486,845  (  758,993,877 rd   +    10,492,968 wr)
==30918== LL misses:             30,533  (       26,841 rd   +         3,692 wr)
==30918== LL miss rate:             0.0% (          0.0%     +           0.0%  )
Terminated


And the output for 4.1 (4.1.0pre1-GIT-9624ca4):
[root at primary sambaTest]# valgrind --tool=cachegrind /opt/sambaTest/sbin/smbd -d0 -i
==30957== Command: /opt/sambaTest/sbin/smbd -d0 -i
--30957-- warning: Pentium 4 with 12 KB micro-op instruction trace cache
--30957--          Simulating a 16 KB I-cache with 32 B lines
smbd version 4.1.0pre1-GIT-9624ca4 started.
==30957== I   refs:      15,454,804,105
==30957== I1  misses:     1,017,540,161
==30957== LLi misses:            16,117
==30957== I1  miss rate:           6.58%
==30957== LLi miss rate:           0.00%
==30957==
==30957== D   refs:       9,890,763,550  (6,437,672,253 rd   + 3,453,091,297 wr)
==30957== D1  misses:       105,782,929  (   93,060,183 rd   +    12,722,746 wr)
==30957== LLd misses:            38,284  (       33,555 rd   +         4,729 wr)
==30957== D1  miss rate:            1.0% (          1.4%     +           0.3%  )
==30957== LLd miss rate:            0.0% (          0.0%     +           0.0%  )
==30957==
==30957== LL refs:        1,123,323,090  (1,110,600,344 rd   +    12,722,746 wr)
==30957== LL misses:             54,401  (       49,672 rd   +         4,729 wr)
==30957== LL miss rate:             0.0% (          0.0%     +           0.0%  )


The 4.1 code executes about 39% more instructions than 3.5 (15.5
billion vs. 11.1 billion), makes about 54% more data references (9.89
billion vs. 6.44 billion), and has roughly double the D1 miss rate on
those references (0.5% vs. 1.0%).  Still, 1% seems relatively low, even
given the cost of a miss on the NetBurst architecture.  Unfortunately,
I couldn't get the branch-prediction stats, because valgrind refused to
start smbd with them enabled.
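
The ratios between the two runs can be recomputed directly from the
summary counters.  A minimal Python sketch, with the totals copied by
hand from the two cachegrind summaries above (the dictionary layout and
helper names are illustrative only, not anything cachegrind provides):

```python
# Totals copied by hand from the two cachegrind summaries above.
OLD, NEW = "3.5.10-125.el6", "4.1.0pre1-GIT-9624ca4"
SUMMARY = {
    OLD: {
        "I refs": 11_113_878_921,
        "D refs": 6_437_440_508,
        "D1 misses": 38_180_098,
        "LL misses": 30_533,
    },
    NEW: {
        "I refs": 15_454_804_105,
        "D refs": 9_890_763_550,
        "D1 misses": 105_782_929,
        "LL misses": 54_401,
    },
}


def ratio(event):
    """How many times as many `event`s the 4.1 run incurred as the 3.5 run."""
    return SUMMARY[NEW][event] / SUMMARY[OLD][event]


def d1_miss_rate(version):
    """D1 misses as a percentage of all data references (reads + writes)."""
    s = SUMMARY[version]
    return 100.0 * s["D1 misses"] / s["D refs"]


if __name__ == "__main__":
    for event in ("I refs", "D refs", "D1 misses", "LL misses"):
        print(f"{event:<10} 4.1/3.5 = {ratio(event):.2f}")
    for version in (OLD, NEW):
        print(f"D1 miss rate {version}: {d1_miss_rate(version):.2f}%")
```

Cachegrind prints its miss rates with only one decimal place, so
computing them from the raw counts gives slightly more precise figures
than the summary lines do.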

The biggest offender for D1 read misses (D1mr) in the 3.5 code (it
isn't named, presumably because the binaries weren't built with
debugging symbols) does about 1 billion data reads and misses 9.7
million times, a read miss rate of roughly 1%.  In the 4.1 code, the
biggest offender does 676 million data reads and misses 46.7 million
times, roughly 7%.
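
Per-function figures like these can be read out of the attached
cachegrind.out files.  valgrind's own cg_annotate produces per-function
reports; as a rough sketch of the same aggregation, here is a short
Python script.  It assumes the plain-text cachegrind.out layout described
in its docstring, and the script and helper names are hypothetical.
Without debugging symbols, the function names it prints will probably be
just as uninformative as in the results above.

```python
import sys


def per_function_totals(lines):
    """Sum cachegrind per-line costs into per-(file, function) totals.

    Assumes the plain-text cachegrind.out layout: an "events:" header that
    names the columns, "fl=" / "fn=" lines that select the current file and
    function, and "<line> <count> <count> ..." cost lines.  Counts omitted
    from the end of a cost line are treated as zero.
    """
    events, fl, fn = [], None, None
    totals = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("events:"):
            events = line.split(":", 1)[1].split()
        elif line.startswith("fl="):
            fl = line[3:]
        elif line.startswith("fn="):
            fn = line[3:]
        elif line[:1].isdigit() and fn is not None:
            # First field is the source line number; the rest are counts.
            counts = [int(c) for c in line.split()[1:]]
            acc = totals.setdefault((fl, fn), [0] * len(events))
            for i, c in enumerate(counts[: len(events)]):
                acc[i] += c
    return events, totals


def top_by(events, totals, event="D1mr", n=5):
    """Return the n (file, function) entries with the largest `event` total."""
    col = events.index(event)
    return sorted(totals.items(), key=lambda kv: kv[1][col], reverse=True)[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        events, totals = per_function_totals(f)
    dr, d1mr = events.index("Dr"), events.index("D1mr")
    for (fl, fn), counts in top_by(events, totals):
        print(f"{counts[d1mr]:>12,} D1mr {counts[dr]:>15,} Dr  {fn} ({fl})")
```

For example, `python3 top_d1mr.py cachegrind.out.smbd-4.1.0pre1-GIT-9624ca4`
(the script name is hypothetical) would list the five functions with the
most D1 read misses, alongside their data-read counts.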

Any chance my results are compiler-related?  I could crank up the -O
setting on GCC and see if this brings things (more) in line.


The cachegrind output files are attached for analysis.
-- 
Peace and Blessings,
-Scott.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cachegrind.out.smbd-3.5.10-125.el6
Type: application/octet-stream
Size: 77277 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130325/7763c16d/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cachegrind.out.smbd-4.1.0pre1-GIT-9624ca4
Type: application/octet-stream
Size: 126688 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20130325/7763c16d/attachment-0003.obj>


More information about the samba-technical mailing list