Samba performance
Ravi Wijayaratne
ravi_wija at yahoo.com
Mon Mar 31 23:18:13 GMT 2003
Jeremy,
I apologise for the format hassle. Hope this works.
Cheers
Ravi
> Please resend with a mailer that doesn't wrap at 80 columns :-).
>
> Jeremy.
Samba Performance testing
==========================
1.0 Architecture:
-----------------
Server:
CPU: Intel(R) Pentium(R) III CPU family 1266MHz
Memory: 1GB
Kernel: Linux 2.4.18
File System: xfs-1.1
Samba version: 3.0-alpha19
Network: 1 Gb/s point-to-point
Client:
1/2 GB memory and 1.6 GHz Pentium
1.1 Introduction:
-----------------
We have been measuring Samba performance. The
following are our observations.
1.2 Is it Samba?
-----------------
We wanted to find out for sure whether Samba was the
bottleneck, so we ran the following experiments:
1. dbench (to measure disk throughput)
2. tbench (to measure TCP/IP throughput)
3. dbench+tbench: here we wanted to find out whether the
   system, not Samba, was the limitation. For each client
   count, dbench and tbench were started simultaneously.
4. nbench with the clients_oplocks.txt trace (to measure
   Samba throughput)
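The comparison logic is simple but worth pinning down; here is a
minimal sketch (Python, purely for illustration) of how the min(1,2)
bound in the results table is derived from the simultaneous-run
figures:

```python
# Illustration only: the upper bound on what Samba could achieve is
# the per-client-count minimum of the simultaneous dbench (1) and
# tbench (2) throughputs; nbench is then compared against this bound.
def combined_bound(simul_dbench, simul_tbench):
    return {n: min(simul_dbench[n], simul_tbench[n])
            for n in simul_dbench}

# Two rows taken from the results table:
bound = combined_bound({1: 77.1373, 4: 71.2576},
                       {1: 19.7312, 4: 33.9155})
print(bound)  # {1: 19.7312, 4: 33.9155}
```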
The results are as follows:

Num    dbench    tbench    dbench     tbench     min(1,2)   nbench
clis   alone     alone     (simul     (simul
                           tbench)    dbench)
                           (1)        (2)
 1     77.152    20.915    77.1373    19.7312    19.7312    11.5006
 4    106.174    40.6007   71.2576    33.9155    33.9155    19.3349
 8     93.378    56.4977   63.2581    43.745     43.745     19.8468
12     81.908    60.8616   59.0883    43.675     43.675     19.2888
16     56.834    63.6999   52.1449    41.525     41.525     19.3474
20     63.398    64.967    50.9493    41.776     41.776     19.1162
24     61.818    66.6186   50.223     41.8949    41.8949    18.9119
28     55.442    67.3411   49.1058    41.5549    41.5549    19.0702
32     54.318    69.2981   47.8511    41.9139    41.9139    18.8018
36     54.986    70.1524   45.6686    41.3715    41.3715    18.3617
40     46.994    70.8444   45.2621    41.459     41.459     18.2381
44     41.702    69.8389   42.6287    41.0206    41.0206    18.1785
48     45.988    69.8389   40.4743    40.3336    40.3336    18.1683
The nbench experiment measures Samba performance with
the same workload trace used for the other experiments.
As can be seen, the nbench throughput is much smaller
than the minimum of (1) and (2), which implies that
Samba is the performance bottleneck. (The disk
configuration for the above experiment was an 11-drive
RAID 5 with LVM.)
1.3 Where in Samba and what is the limitation ?:
------------------------------------------------
We observe that our system is severely CPU limited.
Here is a summary of a top -d 1 trace of CPU usage
during the period when 16 nbench clients were active
(2-drive RAID 0 + LVM):

        User           System         Total
Mean    34.60447761    64.14477612    98.74925373
Median  35.2           63.7           99.9
Stdev    0.070189292    0.076303659    0.06342686
So it seems that most of the CPU time is spent in the
system (kernel). Is this consistent with what was seen
in earlier Samba versions?
Then we used Samba's built-in profiling facility to get
some information about performance-intensive code
paths. We discovered that the time spent on stat calls
was excessive: more than the time spent on read or
write calls!
Here are the time-consuming system calls:

Name              num calls   time(us)    Min(us)  Max(us)
----              ---------   --------    -------  -------
syscall_opendir      189841   36913656    0        396806
syscall_readdir     2329741   40225042    0        312880
syscall_open         194256   150164226   0        1245872
syscall_close        133504   41983747    0        475361
syscall_read         320496   88093084    0        350440
syscall_write        149776   90665926    0        382059
syscall_stat        1335959   145079345   0        336839
syscall_unlink        33520   101113573   0        1132776

Here are the time-consuming Trans2 calls:

Trans2_findfirst      57184   201725472   0        430785
Trans2_qpathinfo     147536   255836025   0        412576

and the time-consuming SMB calls:

SMBntcreateX         175984   95263531    0        346844
SMBdskattr            27344   63275572    0        351798
SMBreadX             320496   90593419    0        350444
SMBwriteX            149776   92584721    0        382067
SMBunlink             33520   101522665   0        1132787
SMBclose             133696   66140491    0        475414
and the cache statistics are:
************************ Statcache *******************************
lookups: 398768
misses: 41
hits: 398727
************************ Writecache ******************************
read_hits: 0
abutted_writes: 0
total_writes: 149776
non_oplock_writes: 149776
direct_writes: 149776
init_writes: 0
flushed_writes[SEEK]: 0
flushed_writes[READ]: 0
flushed_writes[WRITE]: 0
flushed_writes[READRAW]: 0
flushed_writes[OPLOCK_RELEASE]: 0
flushed_writes[CLOSE]: 0
flushed_writes[SYNC]: 0
flushed_writes[SIZECHANGE]: 0
num_perfect_writes: 0
num_write_caches: 0
allocated_write_caches: 0
For the above experiment (16 nbench clients, 2-drive
RAID 0 + LVM) I am getting about 21 MBytes/s.
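A quick derived statistic from the first profile table above: dividing
total time by call count gives the mean cost per call, which shows
that stat is actually cheap per call and dominates only through sheer
volume (a sketch; the figures are copied from the table):

```python
# (num_calls, total_time_us) pairs copied from the profile table above.
profile = {
    "syscall_open":   (194256,  150164226),
    "syscall_stat":   (1335959, 145079345),
    "syscall_read":   (320496,  88093084),
    "syscall_unlink": (33520,   101113573),
}

for name, (calls, total_us) in profile.items():
    # Mean cost per call: stat is ~109us, while open is ~773us and
    # unlink ~3017us -- yet stat's total is near the top by volume.
    print(f"{name:14s} {total_us / calls:9.1f} us/call")
```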
Then we removed the FIND_FIRST and QUERY_PATH_INFORMATION
calls from the clients_oplocks.txt file. Performance
improves by about 6-8 MBytes/s for 16 clients.
Name              num calls   time(us)    Min(us)  Max(us)
----              ---------   --------    -------  -------
syscall_opendir       83009   18155570    0        306736
syscall_readdir      938078   15806346    0        314394
syscall_open         194256   163721233   0        1682098
syscall_close        133504   50548558    0        905587
syscall_read         320496   91373880    0        319341
syscall_write        149776   94024793    0        345850
syscall_stat         597492   69316075    0        312443
syscall_unlink        33520   101812395   0        1369880
As can be seen, there is a substantial reduction in the
stat, readdir and opendir system call times. However,
the CPU user/system time distribution is almost
identical to the previous case.
To dissect the impact of stat we measured the kernel
dcache hit/miss statistics. We see a very high hit rate
in the dcache. shrink_dcache_memory was not called,
indicating that the kernel mm did not run short of
pages.
To analyze the FIND_FIRST operation we put further
traces in the call_trans2findfirst call path. We found
that more than 60% of the time is spent in the
get_lanman2_dir_entry() call, and that inside
get_lanman2_dir_entry the majority of the time is spent
in the vfs_stat call (~46%), with ~28% of the time in
the mask_match and exact_match calls.
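Schematically, that hot loop looks like the following (a Python
sketch for illustration, not Samba's actual C code; the point is the
one-stat-plus-one-mask-match cost per directory entry):

```python
import fnmatch
import os
import tempfile

def findfirst_sketch(directory, mask):
    """Schematic FIND_FIRST: one mask match plus one stat per
    directory entry, so the cost scales with directory size.
    os.stat plays the role of vfs_stat here; fnmatch plays the
    role of mask_match."""
    results = []
    for name in sorted(os.listdir(directory)):
        if fnmatch.fnmatch(name, mask):
            st = os.stat(os.path.join(directory, name))
            results.append((name, st.st_size))
    return results

# Usage: match *.txt in a scratch directory.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w") as f:
        f.write("hi")
    open(os.path.join(d, "b.log"), "w").close()
    print(findfirst_sketch(d, "*.txt"))  # [('a.txt', 2)]
```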
We did kernel profiling of a 60-client netbench run and
found that link_path_walk, d_lookup and kmem_cache_alloc
are visited most often when the timer interrupt occurs.
All are in the sys_stat call path.
Conclusion:
-----------
We think Samba needs to optimize caching of stat
calls. Individual stat calls (average ~49us) are not
the concern; the sheer number of stat calls is.
Significant bandwidth can also be gained by optimizing
the opendir and readdir calls (directory stream).
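To make the suggestion concrete, here is a toy sketch (hypothetical
Python, not Samba code) of the kind of user-space stat cache we mean:
cache stat results by pathname and invalidate on any operation that
changes metadata:

```python
import os

class StatCache:
    """Toy pathname -> stat-result cache, for illustration only.
    Correctness depends on invalidating entries on every metadata-
    changing operation (write, unlink, rename, ...)."""
    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def stat(self, path):
        if path in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            self._cache[path] = os.stat(path)
        return self._cache[path]

    def invalidate(self, path):
        # Call on write/unlink/rename so stale data is never served.
        self._cache.pop(path, None)

# Usage: repeated stats of the same path are served from the cache.
cache = StatCache()
for _ in range(3):
    cache.stat(".")
print(cache.hits, cache.misses)  # 2 1
```

The trade-off is the invalidation traffic: every metadata-changing
path must hit the cache, which is why such a cache pays off only when
stats vastly outnumber writes, as in the traces above.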
Has anybody done this sort of profiling before?
Do these results make sense, and are they consistent
with earlier measurements?
Are there any ongoing attempts to cache stat
information in user or kernel space?
Any insights in this regard are much appreciated.
I am hoping to track down why the open call is so
expensive in a future exercise.
Thank you
Ravi
=====
------------------------------
Ravi Wijayaratne