[Samba] fast parallel crawling of file systems

Di Pe dipeit at gmail.com
Fri Nov 16 17:27:40 MST 2012

Hi, I use a disk space inventory tool called TreeSizePro to scan
file systems on Windows and Linux boxes. On Linux systems I export
the file systems as Samba shares to scan them. TreeSizePro is
multi-threaded (32 crawlers) and I run it on Windows 7. I am scanning
file systems that are local to the Linux servers and also NFS mounts
that are re-exported via Samba.

If I scan a Windows 2008 server I can scan many million files in about
1 hour. If I do the same thing with Samba it takes more than 1 day. It
takes longer to scan the re-exported NFS share than the local share, but
not by a whole lot, so I must assume the bottleneck lies within Samba.
(I can also crawl the NFS mount really fast.) How can I make Samba fly?
How can I improve this metadata performance? I don't care about
stability, I just want to maximize performance. We don't have a slow
or badly configured network.
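
To put numbers on the comparison above, a small multi-threaded walker can be timed against the same tree reached by different routes (local path, NFS mount, Samba share). This is only a minimal sketch and not how TreeSizePro works internally; the function names scan_dir and crawl are made up for illustration, and the default of 32 workers is taken from the crawler count mentioned above.

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def scan_dir(path):
    """List one directory; return (file_count, total_bytes, subdirs)."""
    files = 0
    size = 0
    subdirs = []
    try:
        with os.scandir(path) as entries:
            for entry in entries:
                try:
                    if entry.is_dir(follow_symlinks=False):
                        subdirs.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        files += 1
                        size += entry.stat(follow_symlinks=False).st_size
                except OSError:
                    pass  # entry vanished or is unreadable; skip it
    except OSError:
        pass  # directory unreadable; count nothing for it
    return files, size, subdirs


def crawl(root, workers=32):
    """Walk root with a thread pool; return (files, bytes, seconds)."""
    total_files = 0
    total_bytes = 0
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(scan_dir, root)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                files, size, subdirs = future.result()
                total_files += files
                total_bytes += size
                # Each discovered subdirectory becomes its own task.
                for subdir in subdirs:
                    pending.add(pool.submit(scan_dir, subdir))
    return total_files, total_bytes, time.monotonic() - start


if __name__ == "__main__":
    for root in sys.argv[1:]:
        n, b, t = crawl(root)
        rate = n / t if t > 0 else 0.0
        print(f"{root}: {n} files, {b} bytes, {t:.1f}s, {rate:.0f} files/s")
```

Run it on the Linux server against the local path and the NFS mount, and on the Windows 7 client against the share's UNC path (the share name is a placeholder here). If the files/s figure only drops sharply over the UNC path, that supports the assumption that the time is lost in the SMB layer and not in the underlying file system or NFS.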

I compiled Samba 3.6.9 on a plain Ubuntu 12.04 box and searched the
web for performance tuning suggestions. I am not sure whether the
config below makes any sense.

./configure --with-aio-support --enable-pthreadpool \
    --prefix=/opt/samba --with-ads

/root # cat /opt/samba/lib/smb.conf
workgroup = FH
netbios name = copycat
min protocol = SMB2
max protocol = SMB2
#log level = 1

# performance enhancements
strict locking = no
max xmit = 65535
deadtime = 15

path = /tmp
read only = no

path = /shared
read only = yes
follow symlinks = no
wide links = no

