Samba performance
ZINKEVICIUS,MATT (HP-Loveland,ex1)
matt.zinkevicius at hp.com
Thu Apr 3 01:09:11 GMT 2003
This matches our profiling results. Samba is a CPU eater, with stat calls
leading the charge. It also consumes file descriptors like crazy.
Matt Zinkevicius
Software Engineer
Network Storage Array Solutions
Hewlett-Packard
> -----Original Message-----
> From: Ravi Wijayaratne [mailto:ravi_wija at yahoo.com]
> Sent: Monday, March 31, 2003 3:35 PM
> To: samba-technical at lists.samba.org
> Subject: Samba performance
>
>
> Samba Performance testing
> ==========================
>
> 1.0 Architecture:
> -----------------
> Server:
> CPU: Intel(R) Pentium(R) III CPU family 1266MHz
> Memory: 1GB
> Kernel: Linux 2.4.18
> File System: xfs-1.1
> Samba version: 3.0-alpha19
> Network: 1 Gbit/s point-to-point
>
> Client:
> 512 MB memory and a 1.6 GHz Pentium
>
> 1.1 Introduction:
> -----------------
>
> We have been measuring Samba performance. The
> following are our observations.
>
> 1.2 Is it Samba?
> -----------------
> We wanted to find out for sure whether Samba was the
> bottleneck, so we ran the following experiments:
>
> 1. dbench (to measure disk throughput)
> 2. tbench (to measure TCP/IP throughput)
> 3. dbench+tbench:
>    In this experiment we wanted to find out whether the
>    system, not Samba, was the limitation. For each
>    client count, dbench and tbench were started
>    simultaneously (see the sketch after this list).
> 4. nbench with the clients_oplocks.txt trace (to
>    measure Samba throughput)
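>
> A minimal sketch of how the simultaneous run can be
> driven (assuming dbench and tbench are on the PATH and
> take the client count as an argument); an illustration,
> not our actual test harness:
>
>   /* run_both.c: start dbench and tbench at the same
>    * time and wait for both to finish. */
>   #include <stdio.h>
>   #include <sys/wait.h>
>   #include <unistd.h>
>
>   static pid_t spawn(const char *prog, const char *nclients)
>   {
>       pid_t pid = fork();
>       if (pid == 0) {
>           execlp(prog, prog, nclients, (char *)NULL);
>           perror(prog);       /* reached only if exec fails */
>           _exit(1);
>       }
>       return pid;
>   }
>
>   int main(int argc, char **argv)
>   {
>       const char *n = (argc > 1) ? argv[1] : "16";
>       pid_t d = spawn("dbench", n);   /* disk load */
>       pid_t t = spawn("tbench", n);   /* TCP/IP load */
>       waitpid(d, NULL, 0);
>       waitpid(t, NULL, 0);
>       return 0;
>   }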
>
> The results are as follows (throughput in MB/s):
>
> Num      dbench   tbench   dbench (1)   tbench (2)   min(1,2)  nbench
> clients  alone    alone    (simul       (simul
>                            tbench)      dbench)
> -------  -------  -------  ----------   ----------   --------  -------
>  1        77.152  20.915      77.1373      19.7312    19.7312  11.5006
>  4       106.174  40.6007     71.2576      33.9155    33.9155  19.3349
>  8        93.378  56.4977     63.2581      43.745     43.745   19.8468
> 12        81.908  60.8616     59.0883      43.675     43.675   19.2888
> 16        56.834  63.6999     52.1449      41.5259    41.5259  19.3474
> 20        63.398  64.9676     50.9493      41.776     41.776   19.1162
> 24        61.818  66.6186     50.223       41.8949    41.8949  18.9119
> 28        55.442  67.3411     49.1058      41.5549    41.5549  19.0702
> 32        54.318  69.2981     47.8511      41.9139    41.9139  18.8018
> 36        54.986  70.1524     45.6686      41.3715    41.3715  18.3617
> 40        46.994  70.8444     45.2621      41.459     41.459   18.2381
> 44        41.702  69.8389     42.6287      41.0206    41.0206  18.1785
> 48        45.988  69.8389     40.4743      40.3336    40.3336  18.1683
>
> The nbench experiment measures Samba performance with
> the same workload trace used for the other experiments.
> As can be seen, the nbench throughput is much smaller
> than the minimum of (1) and (2), which implies that
> Samba is the performance bottleneck. (The disk
> configuration for the above experiment was an 11-drive
> RAID 5 with LVM.)
>
> 1.3 Where in Samba, and what is the limitation?
> ------------------------------------------------
>
> We observe that our system is severely CPU limited.
> Here is a summary of a top -d 1 trace of CPU usage over
> the period when 16 nbench clients were active (2-drive
> RAID 0 + LVM):
>
>         User         System       Total
> Mean    34.60447761  64.14477612  98.74925373
> Median  35.2         63.7         99.9
> Stdev   0.070189292  0.076303659  0.06342686
>
> So it seems that more CPU time is spent in the kernel
> (system time) than in user mode. Is this consistent
> with what was seen in earlier Samba versions?
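>
> For reference, the user/system split that top reports
> can be derived from the first line of /proc/stat; a
> minimal sketch of that measurement:
>
>   /* cpusplit.c: sample /proc/stat twice and print the
>    * user/system CPU split over the interval. */
>   #include <stdio.h>
>   #include <unistd.h>
>
>   /* fields on the "cpu" line: user nice system idle */
>   static int sample(unsigned long long v[4])
>   {
>       FILE *f = fopen("/proc/stat", "r");
>       int n;
>       if (f == NULL) return -1;
>       n = fscanf(f, "cpu %llu %llu %llu %llu",
>                  &v[0], &v[1], &v[2], &v[3]);
>       fclose(f);
>       return (n == 4) ? 0 : -1;
>   }
>
>   int main(void)
>   {
>       unsigned long long a[4], b[4];
>       double d[4], total = 0;
>       int i;
>
>       if (sample(a) != 0) return 1;
>       sleep(1);
>       if (sample(b) != 0) return 1;
>       for (i = 0; i < 4; i++) {
>           d[i] = (double)(b[i] - a[i]);
>           total += d[i];
>       }
>       printf("user %.1f%%  system %.1f%%\n",
>              100.0 * (d[0] + d[1]) / total,
>              100.0 * d[2] / total);
>       return 0;
>   }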
>
> Then we used Samba's built-in profiling facility to get
> some information about the performance-intensive code
> paths. We discovered that the time spent in stat calls
> was excessive: more than the time spent in read or
> write calls!
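>
> Conceptually, the numbers below come from timing
> wrappers of roughly this shape (a sketch of ours; the
> real data comes from Samba's profiling facility, not
> this code):
>
>   /* prof_sketch.c: accumulate count, total time and
>    * min/max microseconds for a wrapped stat call. */
>   #include <stdio.h>
>   #include <sys/stat.h>
>   #include <sys/time.h>
>
>   struct prof {
>       unsigned long count;
>       unsigned long long us, min_us, max_us;
>   };
>
>   static unsigned long long now_us(void)
>   {
>       struct timeval tv;
>       gettimeofday(&tv, NULL);
>       return (unsigned long long)tv.tv_sec * 1000000ULL
>              + tv.tv_usec;
>   }
>
>   static int profiled_stat(struct prof *p, const char *path,
>                            struct stat *st)
>   {
>       unsigned long long t0 = now_us();
>       int ret = stat(path, st);
>       unsigned long long dt = now_us() - t0;
>
>       p->count++;
>       p->us += dt;
>       if (dt > p->max_us) p->max_us = dt;
>       if (p->count == 1 || dt < p->min_us) p->min_us = dt;
>       return ret;
>   }
>
>   int main(void)
>   {
>       struct prof p = {0};
>       struct stat st;
>       profiled_stat(&p, "/etc/passwd", &st);
>       printf("syscall_stat %lu calls, %llu us total\n",
>              p.count, p.us);
>       return 0;
>   }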
>
> Here are the time-consuming system calls:
>
> Name             num calls   time(us)  Min(us)  Max(us)
> ----             ---------   --------  -------  -------
> syscall_opendir     189841   36913656        0   396806
> syscall_readdir    2329741   40225042        0   312880
> syscall_open        194256  150164226        0  1245872
> syscall_close       133504   41983747        0   475361
> syscall_read        320496   88093084        0   350440
> syscall_write       149776   90665926        0   382059
> syscall_stat       1335959  145079345        0   336839
> syscall_unlink       33520  101113573        0  1132776
>
> Here are the time-consuming Trans2 calls:
>
> Trans2_findfirst     57184  201725472        0   430785
> Trans2_qpathinfo    147536  255836025        0   412576
>
> and the time-consuming SMB calls:
>
> SMBntcreateX        175984   95263531        0   346844
> SMBdskattr           27344   63275572        0   351798
> SMBreadX            320496   90593419        0   350444
> SMBwriteX           149776   92584721        0   382067
> SMBunlink            33520  101522665        0  1132787
> SMBclose            133696   66140491        0   475414
>
> and cache statistics are
>
>
> ************************ Statcache *******************************
> lookups: 398768
> misses: 41
> hits: 398727
> ************************ Writecache ******************************
> read_hits: 0
> abutted_writes: 0
> total_writes: 149776
> non_oplock_writes: 149776
> direct_writes: 149776
> init_writes: 0
> flushed_writes[SEEK]: 0
> flushed_writes[READ]: 0
> flushed_writes[WRITE]: 0
> flushed_writes[READRAW]: 0
> flushed_writes[OPLOCK_RELEASE]: 0
> flushed_writes[CLOSE]: 0
> flushed_writes[SYNC]: 0
> flushed_writes[SIZECHANGE]: 0
> num_perfect_writes: 0
> num_write_caches: 0
> allocated_write_caches: 0
>
> For the above experiment (16 nbench clients, 2-drive
> RAID 0 + LVM) I am getting about 21 MB/s.
>
> Then we removed the FIND_FIRST and
> QUERY_PATH_INFORMATION calls from the
> clients_oplocks.txt trace. Performance improves by
> about 6-8 MB/s for 16 clients.
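>
> The filtering itself is trivial. A sketch, assuming a
> one-operation-per-line load-file format (our assumption
> about the trace layout):
>
>   /* tracefilter.c: drop FIND_FIRST and
>    * QUERY_PATH_INFORMATION lines from a load trace.
>    * Usage: tracefilter < clients_oplocks.txt > out.txt */
>   #include <stdio.h>
>   #include <string.h>
>
>   int main(void)
>   {
>       char line[1024];
>       while (fgets(line, sizeof(line), stdin) != NULL) {
>           if (strstr(line, "FIND_FIRST") ||
>               strstr(line, "QUERY_PATH_INFORMATION"))
>               continue;        /* skip these operations */
>           fputs(line, stdout);
>       }
>       return 0;
>   }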
>
> Name             num calls   time(us)  Min(us)  Max(us)
> ----             ---------   --------  -------  -------
> syscall_opendir      83009   18155570        0   306736
> syscall_readdir     938078   15806346        0   314394
> syscall_open        194256  163721233        0  1682098
> syscall_close       133504   50548558        0   905587
> syscall_read        320496   91373880        0   319341
> syscall_write       149776   94024793        0   345850
> syscall_stat        597492   69316075        0   312443
> syscall_unlink       33520  101812395        0  1369880
>
>
> As can be seen, there is a substantial reduction in the
> stat, readdir, and opendir system call times. However,
> the CPU user/system time distribution is identical to
> the previous case.
>
> To dissect the impact of stat, we measured the kernel
> dcache hit/miss statistics. We see a very high hit rate
> in the dcache, and shrink_dcache_memory was not called,
> indicating that the kernel memory manager did not run
> short of pages.
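>
> For reference, the stock kernel exports aggregate
> dentry counts in /proc/sys/fs/dentry-state (hit/miss
> rates require extra instrumentation); a sketch of
> reading those standard counters:
>
>   /* dcache_peek.c: read the kernel's exported dentry
>    * counters from /proc/sys/fs/dentry-state. */
>   #include <stdio.h>
>
>   int main(void)
>   {
>       FILE *f = fopen("/proc/sys/fs/dentry-state", "r");
>       long nr_dentry, nr_unused;
>
>       if (f == NULL) { perror("dentry-state"); return 1; }
>       if (fscanf(f, "%ld %ld", &nr_dentry, &nr_unused) == 2)
>           printf("dentries: %ld total, %ld unused\n",
>                  nr_dentry, nr_unused);
>       fclose(f);
>       return 0;
>   }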
>
> To analyze the FIND_FIRST operation we put further
> traces in the call_trans2findfirst() call path. More
> than 60% of the time is spent in the
> get_lanman2_dir_entry() call, and within
> get_lanman2_dir_entry() the majority of the time (~46%)
> is spent in the vfs_stat call, with ~28% spent in the
> mask_match and exact_match calls.
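>
> The shape of that code path explains the cost: a
> FIND_FIRST-style scan does one pattern match and one
> stat per directory entry. A generic sketch of the
> pattern (not Samba's actual code):
>
>   /* dirscan_sketch.c: one fnmatch() plus one stat()
>    * per directory entry, which is why stat dominates
>    * directory enumeration. */
>   #include <dirent.h>
>   #include <fnmatch.h>
>   #include <stdio.h>
>   #include <sys/stat.h>
>
>   int main(int argc, char **argv)
>   {
>       const char *dir  = (argc > 1) ? argv[1] : ".";
>       const char *mask = (argc > 2) ? argv[2] : "*";
>       DIR *d = opendir(dir);
>       struct dirent *de;
>       unsigned long stats = 0;
>
>       if (d == NULL) { perror(dir); return 1; }
>       while ((de = readdir(d)) != NULL) {
>           char path[4096];
>           struct stat st;
>           if (fnmatch(mask, de->d_name, 0) != 0)
>               continue;                /* cf. mask_match */
>           snprintf(path, sizeof(path), "%s/%s",
>                    dir, de->d_name);
>           if (stat(path, &st) == 0)    /* cf. vfs_stat */
>               stats++;
>       }
>       closedir(d);
>       printf("%lu stat() calls for one scan\n", stats);
>       return 0;
>   }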
>
> We did kernel profiling of a 60-client netbench run and
> found that link_path_walk, d_lookup, and
> kmem_cache_alloc are the functions most often executing
> when the timer interrupt fires; all are in the sys_stat
> call path.
>
>
> Conclusion:
> -----------
> We think Samba needs to optimize caching of stat
> results. Individual stat calls (average = 49us) are not
> the concern; the sheer number of stat calls is. Also,
> significant bandwidth can be gained by optimizing the
> opendir and readdir calls (directory stream).
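>
> A minimal sketch of the kind of stat caching we have in
> mind; names and structure are illustrative only, and a
> real cache would need invalidation on every
> metadata-changing operation plus an expiry policy:
>
>   /* statcache_sketch.c: cache struct stat results
>    * keyed by path to avoid repeated syscalls. */
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <sys/stat.h>
>
>   #define CACHE_SLOTS 4096
>
>   struct entry {
>       char *path;              /* NULL means empty slot */
>       struct stat st;
>       int valid;
>   };
>
>   static struct entry cache[CACHE_SLOTS];
>
>   static unsigned hash(const char *s)
>   {
>       unsigned h = 5381;
>       while (*s) h = h * 33 + (unsigned char)*s++;
>       return h % CACHE_SLOTS;
>   }
>
>   /* stat with a one-entry-per-bucket cache */
>   static int cached_stat(const char *path, struct stat *st)
>   {
>       struct entry *e = &cache[hash(path)];
>       if (e->valid && e->path && strcmp(e->path, path) == 0) {
>           *st = e->st;         /* hit: no syscall */
>           return 0;
>       }
>       if (stat(path, st) != 0) /* miss: real syscall */
>           return -1;
>       free(e->path);
>       e->path = strdup(path);
>       e->st = *st;
>       e->valid = 1;
>       return 0;
>   }
>
>   /* call on any operation that may change metadata */
>   static void invalidate(const char *path)
>   {
>       struct entry *e = &cache[hash(path)];
>       if (e->path && strcmp(e->path, path) == 0)
>           e->valid = 0;
>   }
>
>   int main(void)
>   {
>       struct stat st;
>       cached_stat("/etc/passwd", &st);  /* miss */
>       cached_stat("/etc/passwd", &st);  /* hit */
>       invalidate("/etc/passwd");
>       printf("size = %lld\n", (long long)st.st_size);
>       return 0;
>   }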
>
> Has anybody done this sort of profiling before?
> Are these results consistent with what others have
> seen?
> Are there any ongoing attempts to cache stat
> information?
>
> Any insights in this regard are much appreciated.
>
> I am hoping to track down why the open call is so
> expensive in a future exercise.
>
>
> Thank you
> Ravi
>
>
> =====
> ------------------------------
> Ravi Wijayaratne
>
>