[Samba] Any LARGE production Sambas?

John H Terpstra jht at samba.org
Tue Jun 3 19:23:52 GMT 2003


On Tue, 3 Jun 2003, Roylance, Stephen D. wrote:

> > -----Original Message-----
> > From: John H Terpstra [mailto:jht at samba.org]
> > Sent: Tuesday, June 03, 2003 1:53 PM
> > To: Michael MacIsaac
> > Cc: samba at lists.samba.org
> > Subject: Re: [Samba] Any LARGE production Sambas?
> >
> > The biggest single impact on performance is the number of files in a
> > directory. Large numbers of files in one directory will slow
> > samba down to
> > a crawl. I am unaware of ANY impact of file system size on
> > performance,
> > other than number of directory entries.
>
> Is that because the underlying OS is slow operating in large directories, or
> because of Samba overhead?
> How many is 'Large numbers of'?  100's, 1000's, millions?

The key factor is that the DOS (FAT, VFAT, FAT32) and NTFS file
systems treat file names as case insensitive. All directory lookups are case
insensitive within the OS. Unix file systems are case sensitive. That
means that Samba has to do its own processing to effect case-insensitive
file name handling. Additionally, Samba does all the long file name to
short (8.3) name mangling. This adds a huge overhead to directory lookups
that NT is NOT burdened by. NT does its file name handling in kernel
space; Samba has to do all of this in user space, again with higher
overhead.
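
To illustrate why this gets expensive, here is a rough Python sketch of a
case-insensitive lookup on a case-sensitive file system. It is NOT Samba's
actual code; the function name find_case_insensitive is made up for this
example. The point is only that when the exact-case name is absent, every
entry in the directory has to be read and compared.

```python
import os
import tempfile


def find_case_insensitive(directory, wanted):
    """Return the on-disk name matching `wanted` regardless of case, or None.

    Hypothetical helper for illustration only, not part of Samba.
    """
    # Fast path: the client happened to send the exact on-disk case,
    # so a single lookup answers the question.
    if os.path.lexists(os.path.join(directory, wanted)):
        return wanted

    # Slow path: read every directory entry and compare case-folded names.
    # The cost grows linearly with the number of entries in the directory.
    target = wanted.casefold()
    for entry in os.listdir(directory):
        if entry.casefold() == target:
            return entry
    return None
```

A request for a name that does not exist at all is the worst case, since the
whole directory is scanned before the lookup can report a miss.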

The effect of this is significant when you have many thousands of files in
one directory. The effect can be overcome by efficient use of directories
and by choosing a file system type that has greater affinity for the type
of directory operations that Samba performs.
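
One way to make "efficient use of directories" concrete is to spread files
over hashed subdirectories so that no single directory grows large. The
sketch below is just an assumed layout for an application that controls where
it writes files; the function name sharded_path and the parameters levels and
width are invented for the example, and nothing here is done by Samba itself.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    """Build a path for `filename` under hashed subdirectories of `root`.

    Hypothetical layout for illustration. With levels=2 and width=2 there
    are 256 * 256 = 65536 possible buckets.
    """
    # Hash the case-folded name so that names differing only in case
    # land in the same bucket.
    digest = hashlib.sha1(filename.casefold().encode("utf-8")).hexdigest()
    # Take `levels` slices of `width` hex characters as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

With the default settings, a million files would average roughly 15 entries
per leaf directory, so each lookup touches only a short directory listing.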

Having said this, the underlying file storage overhead also significantly
affects performance. Published benchmarks have clearly shown that ext2fs is
much faster than reiserfs, and reiserfs is much faster than ext3fs, when
running the NetBench test suite against a Linux Samba server. Do not read
too much into this, as it is critical to properly define test conditions and
methodologies and to make certain that these truly reflect the type of
service that the real-world application must meet.

The bottom line is that directory lookups slow to a crawl as the number
of directory entries increases.

- John T.
-- 
John H Terpstra
Email: jht at samba.org


