Samba performance
ZINKEVICIUS,MATT (HP-Loveland,ex1)
matt.zinkevicius at hp.com
Thu Apr 3 01:09:11 GMT 2003
This matches our profiling results. Samba is a CPU eater, with stat calls
leading the charge. It also consumes file descriptors like crazy.
Matt Zinkevicius
Software Engineer
Network Storage Array Solutions
Hewlett-Packard
> -----Original Message-----
> From: Ravi Wijayaratne [mailto:ravi_wija at yahoo.com]
> Sent: Monday, March 31, 2003 3:35 PM
> To: samba-technical at lists.samba.org
> Subject: Samba performance
>
>
> Samba Performance testing
> ==========================
>
> 1.0 Architecture:
> -----------------
> Server:
> CPU: Intel(R) Pentium(R) III CPU family 1266MHz
> Memory: 1GB
> Kernel: Linux 2.4.18
> File System: xfs-1.1
> Samba version: 3.0-alpha19
> Network: 1 Gbit/s point-to-point
>
> Client:
> 512 MB memory and a 1.6 GHz Pentium
>
> 1.1 Introduction:
> -----------------
>
> We have been measuring Samba performance. The
> following are our observations.
>
> 1.2 Is it Samba?
> -----------------
> We wanted to find out for sure whether Samba was the
> bottleneck, so we ran the following experiments:
>
> 1. dbench (to measure disk throughput)
> 2. tbench (to measure TCP/IP throughput)
> 3. dbench+tbench:
>    In this experiment we wanted to find out whether the
>    system, not Samba, was the limitation. For each
>    client count, dbench and tbench were started
>    simultaneously (see the sketch after this list).
> 4. nbench with the clients_oplocks.txt trace (to
>    measure Samba throughput)
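>
> A minimal sketch of how the simultaneous run can be
> driven (assuming dbench and tbench are on the PATH and
> take the client count as an argument); an illustration,
> not our actual test harness:
>
>   /* run_both.c: start dbench and tbench at the same
>    * time and wait for both to finish. */
>   #include <stdio.h>
>   #include <sys/wait.h>
>   #include <unistd.h>
>
>   static pid_t spawn(const char *prog, const char *nclients)
>   {
>       pid_t pid = fork();
>       if (pid == 0) {
>           execlp(prog, prog, nclients, (char *)NULL);
>           perror(prog);       /* reached only if exec fails */
>           _exit(1);
>       }
>       return pid;
>   }
>
>   int main(int argc, char **argv)
>   {
>       const char *n = (argc > 1) ? argv[1] : "16";
>       pid_t d = spawn("dbench", n);   /* disk load */
>       pid_t t = spawn("tbench", n);   /* TCP/IP load */
>       waitpid(d, NULL, 0);
>       waitpid(t, NULL, 0);
>       return 0;
>   }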
>
> The results are as follows (throughput in MB/s):
>
> Num      dbench   tbench   dbench (1)   tbench (2)   min(1,2)  nbench
> clients  alone    alone    (simul       (simul
>                            tbench)      dbench)
> -------  -------  -------  ----------   ----------   --------  -------
>  1        77.152  20.915      77.1373      19.7312    19.7312  11.5006
>  4       106.174  40.6007     71.2576      33.9155    33.9155  19.3349
>  8        93.378  56.4977     63.2581      43.745     43.745   19.8468
> 12        81.908  60.8616     59.0883      43.675     43.675   19.2888
> 16        56.834  63.6999     52.1449      41.5259    41.5259  19.3474
> 20        63.398  64.9676     50.9493      41.776     41.776   19.1162
> 24        61.818  66.6186     50.223       41.8949    41.8949  18.9119
> 28        55.442  67.3411     49.1058      41.5549    41.5549  19.0702
> 32        54.318  69.2981     47.8511      41.9139    41.9139  18.8018
> 36        54.986  70.1524     45.6686      41.3715    41.3715  18.3617
> 40        46.994  70.8444     45.2621      41.459     41.459   18.2381
> 44        41.702  69.8389     42.6287      41.0206    41.0206  18.1785
> 48        45.988  69.8389     40.4743      40.3336    40.3336  18.1683
>
> The nbench experiment measures Samba performance with
> the same workload trace used for the other experiments.
> As can be seen, the nbench throughput is much smaller
> than the minimum of (1) and (2), which implies that
> Samba is the performance bottleneck. (The disk
> configuration for the above experiment was an 11-drive
> RAID 5 with LVM.)
>
> 1.3 Where in Samba, and what is the limitation?
> ------------------------------------------------
>
> We observe that our system is severely CPU limited.
> Here is a summary of a top -d 1 trace of CPU usage over
> the period when 16 nbench clients were active (2-drive
> RAID 0 + LVM):
>
>         User         System       Total
> Mean    34.60447761  64.14477612  98.74925373
> Median  35.2         63.7         99.9
> Stdev   0.070189292  0.076303659  0.06342686
>
> So it seems that more CPU time is spent in the kernel
> (system time) than in user mode. Is this consistent
> with what was seen in earlier Samba versions?
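>
> For reference, the user/system split that top reports
> can be derived from the first line of /proc/stat; a
> minimal sketch of that measurement:
>
>   /* cpusplit.c: sample /proc/stat twice and print the
>    * user/system CPU split over the interval. */
>   #include <stdio.h>
>   #include <unistd.h>
>
>   /* fields on the "cpu" line: user nice system idle */
>   static int sample(unsigned long long v[4])
>   {
>       FILE *f = fopen("/proc/stat", "r");
>       int n;
>       if (f == NULL) return -1;
>       n = fscanf(f, "cpu %llu %llu %llu %llu",
>                  &v[0], &v[1], &v[2], &v[3]);
>       fclose(f);
>       return (n == 4) ? 0 : -1;
>   }
>
>   int main(void)
>   {
>       unsigned long long a[4], b[4];
>       double d[4], total = 0;
>       int i;
>
>       if (sample(a) != 0) return 1;
>       sleep(1);
>       if (sample(b) != 0) return 1;
>       for (i = 0; i < 4; i++) {
>           d[i] = (double)(b[i] - a[i]);
>           total += d[i];
>       }
>       printf("user %.1f%%  system %.1f%%\n",
>              100.0 * (d[0] + d[1]) / total,
>              100.0 * d[2] / total);
>       return 0;
>   }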
>
> Then we used Samba's built-in profiling facility to get
> some information about the performance-intensive code
> paths. We discovered that the time spent in stat calls
> was excessive: more than the time spent in read or
> write calls!
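>
> Conceptually, the numbers below come from timing
> wrappers of roughly this shape (a sketch of ours; the
> real data comes from Samba's profiling facility, not
> this code):
>
>   /* prof_sketch.c: accumulate count, total time and
>    * min/max microseconds for a wrapped stat call. */
>   #include <stdio.h>
>   #include <sys/stat.h>
>   #include <sys/time.h>
>
>   struct prof {
>       unsigned long count;
>       unsigned long long us, min_us, max_us;
>   };
>
>   static unsigned long long now_us(void)
>   {
>       struct timeval tv;
>       gettimeofday(&tv, NULL);
>       return (unsigned long long)tv.tv_sec * 1000000ULL
>              + tv.tv_usec;
>   }
>
>   static int profiled_stat(struct prof *p, const char *path,
>                            struct stat *st)
>   {
>       unsigned long long t0 = now_us();
>       int ret = stat(path, st);
>       unsigned long long dt = now_us() - t0;
>
>       p->count++;
>       p->us += dt;
>       if (dt > p->max_us) p->max_us = dt;
>       if (p->count == 1 || dt < p->min_us) p->min_us = dt;
>       return ret;
>   }
>
>   int main(void)
>   {
>       struct prof p = {0};
>       struct stat st;
>       profiled_stat(&p, "/etc/passwd", &st);
>       printf("syscall_stat %lu calls, %llu us total\n",
>              p.count, p.us);
>       return 0;
>   }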
>
> Here are the time-consuming system calls:
>
> Name             num calls   time(us)  Min(us)  Max(us)
> ----             ---------   --------  -------  -------
> syscall_opendir     189841   36913656        0   396806
> syscall_readdir    2329741   40225042        0   312880
> syscall_open        194256  150164226        0  1245872
> syscall_close       133504   41983747        0   475361
> syscall_read        320496   88093084        0   350440
> syscall_write       149776   90665926        0   382059
> syscall_stat       1335959  145079345        0   336839
> syscall_unlink       33520  101113573        0  1132776
>
> Here are the time-consuming Trans2 calls:
>
> Trans2_findfirst     57184  201725472        0   430785
> Trans2_qpathinfo    147536  255836025        0   412576
>
> and the time-consuming SMB calls:
>
> SMBntcreateX        175984   95263531        0   346844
> SMBdskattr           27344   63275572        0   351798
> SMBreadX            320496   90593419        0   350444
> SMBwriteX           149776   92584721        0   382067
> SMBunlink            33520  101522665        0  1132787
> SMBclose            133696   66140491        0   475414
>
> and cache statistics are
>
>
> ************************ Statcache *******************************
> lookups: 398768
> misses: 41
> hits: 398727
> ************************ Writecache ******************************
> read_hits: 0
> abutted_writes: 0
> total_writes: 149776
> non_oplock_writes: 149776
> direct_writes: 149776
> init_writes: 0
> flushed_writes[SEEK]: 0
> flushed_writes[READ]: 0
> flushed_writes[WRITE]: 0
> flushed_writes[READRAW]: 0
> flushed_writes[OPLOCK_RELEASE]: 0
> flushed_writes[CLOSE]: 0
> flushed_writes[SYNC]: 0
> flushed_writes[SIZECHANGE]: 0
> num_perfect_writes: 0
> num_write_caches: 0
> allocated_write_caches: 0
>
> For the above experiment (16 nbench clients, 2-drive
> RAID 0 + LVM) I am getting about 21 MB/s.
>
> Then we removed the FIND_FIRST and
> QUERY_PATH_INFORMATION calls from the
> clients_oplocks.txt trace. Performance improves by
> about 6-8 MB/s for 16 clients.
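>
> The filtering itself is trivial. A sketch, assuming a
> one-operation-per-line load-file format (our assumption
> about the trace layout):
>
>   /* tracefilter.c: drop FIND_FIRST and
>    * QUERY_PATH_INFORMATION lines from a load trace.
>    * Usage: tracefilter < clients_oplocks.txt > out.txt */
>   #include <stdio.h>
>   #include <string.h>
>
>   int main(void)
>   {
>       char line[1024];
>       while (fgets(line, sizeof(line), stdin) != NULL) {
>           if (strstr(line, "FIND_FIRST") ||
>               strstr(line, "QUERY_PATH_INFORMATION"))
>               continue;        /* skip these operations */
>           fputs(line, stdout);
>       }
>       return 0;
>   }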
>
> Name             num calls   time(us)  Min(us)  Max(us)
> ----             ---------   --------  -------  -------
> syscall_opendir      83009   18155570        0   306736
> syscall_readdir     938078   15806346        0   314394
> syscall_open        194256  163721233        0  1682098
> syscall_close       133504   50548558        0   905587
> syscall_read        320496   91373880        0   319341
> syscall_write       149776   94024793        0   345850
> syscall_stat        597492   69316075        0   312443
> syscall_unlink       33520  101812395        0  1369880
>
>
> As can be seen, there is a substantial reduction in the
> stat, readdir, and opendir system call times. However,
> the CPU user/system time distribution is identical to
> the previous case.
>
> To dissect the impact of stat, we measured the kernel
> dcache hit/miss statistics. We see a very high hit rate
> in the dcache, and shrink_dcache_memory was not called,
> indicating that the kernel memory manager did not run
> short of pages.
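>
> For reference, the stock kernel exports aggregate
> dentry counts in /proc/sys/fs/dentry-state (hit/miss
> rates require extra instrumentation); a sketch of
> reading those standard counters:
>
>   /* dcache_peek.c: read the kernel's exported dentry
>    * counters from /proc/sys/fs/dentry-state. */
>   #include <stdio.h>
>
>   int main(void)
>   {
>       FILE *f = fopen("/proc/sys/fs/dentry-state", "r");
>       long nr_dentry, nr_unused;
>
>       if (f == NULL) { perror("dentry-state"); return 1; }
>       if (fscanf(f, "%ld %ld", &nr_dentry, &nr_unused) == 2)
>           printf("dentries: %ld total, %ld unused\n",
>                  nr_dentry, nr_unused);
>       fclose(f);
>       return 0;
>   }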
>
> To analyze the FIND_FIRST operation we put further
> traces in the call_trans2findfirst() call path. More
> than 60% of the time is spent in the
> get_lanman2_dir_entry() call, and within
> get_lanman2_dir_entry() the majority of the time (~46%)
> is spent in the vfs_stat call, with ~28% spent in the
> mask_match and exact_match calls.
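>
> The shape of that code path explains the cost: a
> FIND_FIRST-style scan does one pattern match and one
> stat per directory entry. A generic sketch of the
> pattern (not Samba's actual code):
>
>   /* dirscan_sketch.c: one fnmatch() plus one stat()
>    * per directory entry, which is why stat dominates
>    * directory enumeration. */
>   #include <dirent.h>
>   #include <fnmatch.h>
>   #include <stdio.h>
>   #include <sys/stat.h>
>
>   int main(int argc, char **argv)
>   {
>       const char *dir  = (argc > 1) ? argv[1] : ".";
>       const char *mask = (argc > 2) ? argv[2] : "*";
>       DIR *d = opendir(dir);
>       struct dirent *de;
>       unsigned long stats = 0;
>
>       if (d == NULL) { perror(dir); return 1; }
>       while ((de = readdir(d)) != NULL) {
>           char path[4096];
>           struct stat st;
>           if (fnmatch(mask, de->d_name, 0) != 0)
>               continue;                /* cf. mask_match */
>           snprintf(path, sizeof(path), "%s/%s",
>                    dir, de->d_name);
>           if (stat(path, &st) == 0)    /* cf. vfs_stat */
>               stats++;
>       }
>       closedir(d);
>       printf("%lu stat() calls for one scan\n", stats);
>       return 0;
>   }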
>
> We did kernel profiling of a 60-client netbench run and
> found that link_path_walk, d_lookup, and
> kmem_cache_alloc are the functions most often executing
> when the timer interrupt fires; all are in the sys_stat
> call path.
>
>
> Conclusion:
> -----------
> We think Samba needs to optimize caching of stat
> results. Individual stat calls (average = 49us) are not
> the concern; the sheer number of stat calls is. Also,
> significant bandwidth can be gained by optimizing the
> opendir and readdir calls (directory stream).
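>
> A minimal sketch of the kind of stat caching we have in
> mind; names and structure are illustrative only, and a
> real cache would need invalidation on every
> metadata-changing operation plus an expiry policy:
>
>   /* statcache_sketch.c: cache struct stat results
>    * keyed by path to avoid repeated syscalls. */
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <string.h>
>   #include <sys/stat.h>
>
>   #define CACHE_SLOTS 4096
>
>   struct entry {
>       char *path;              /* NULL means empty slot */
>       struct stat st;
>       int valid;
>   };
>
>   static struct entry cache[CACHE_SLOTS];
>
>   static unsigned hash(const char *s)
>   {
>       unsigned h = 5381;
>       while (*s) h = h * 33 + (unsigned char)*s++;
>       return h % CACHE_SLOTS;
>   }
>
>   /* stat with a one-entry-per-bucket cache */
>   static int cached_stat(const char *path, struct stat *st)
>   {
>       struct entry *e = &cache[hash(path)];
>       if (e->valid && e->path && strcmp(e->path, path) == 0) {
>           *st = e->st;         /* hit: no syscall */
>           return 0;
>       }
>       if (stat(path, st) != 0) /* miss: real syscall */
>           return -1;
>       free(e->path);
>       e->path = strdup(path);
>       e->st = *st;
>       e->valid = 1;
>       return 0;
>   }
>
>   /* call on any operation that may change metadata */
>   static void invalidate(const char *path)
>   {
>       struct entry *e = &cache[hash(path)];
>       if (e->path && strcmp(e->path, path) == 0)
>           e->valid = 0;
>   }
>
>   int main(void)
>   {
>       struct stat st;
>       cached_stat("/etc/passwd", &st);  /* miss */
>       cached_stat("/etc/passwd", &st);  /* hit */
>       invalidate("/etc/passwd");
>       printf("size = %lld\n", (long long)st.st_size);
>       return 0;
>   }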
>
> Has anybody done this sort of profiling before?
> Are these results consistent with what others have
> seen?
> Are there any ongoing attempts to cache stat
> information?
>
> Any insights in this regard are much appreciated.
>
> I am hoping to track down why the open call is so
> expensive in a future exercise.
>
>
> Thank you
> Ravi
>
>
> =====
> ------------------------------
> Ravi Wijayaratne
>
>