why is samba so slow with many files in one directory? [LARGE MESSAGE]

David Collier-Brown davecb at canada.sun.com
Wed Mar 8 20:47:22 GMT 2000


Hubert Grünheidt wrote:
> Maybe it'll help to be more precise:
> We currently have 14 million files separated into 140 directories, each
> containing 100,000 files. The naming scheme is simple: <id>.<extension>; so
> directory 00000001 contains files 0.<someext> to 99999.<someext>, directory
> 00000002 contains files 100000.<someext> to 199999.<someext>, etc.
> The extensions are different, but all files are unique in their
> number; the extensions are only used to indicate the type of the file.

	Cool: you can already split these by number-pattern.
	I'd try to make each directory
		- small enough for good scan-performance
		- large enough that clients tend to sit
		  in the same directory for reasonable periods
	(see the sketch below).  The latter assumes that there
	is some kind of locality of reference in the use of
	these files.
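
	Something like this little sketch is what I have in
	mind for the split; an illustration only, in Python,
	and the bucket size of 10,000 ids per directory is a
	guess to be tuned, not a recommendation:

import os

# Sketch: map a numeric file id to a two-level path so that no single
# directory holds more than BUCKET ids.  BUCKET is an assumed value;
# pick it below the point where your filesystem's scans slow down.
BUCKET = 10000

def bucket_path(root, file_id, ext):
    """Return e.g. root/00000012/123456.dat for file_id 123456."""
    bucket = file_id // BUCKET          # which sub-directory this id lives in
    return os.path.join(root, "%08d" % bucket, "%d.%s" % (file_id, ext))

print(bucket_path("/export/data", 123456, "dat"))
# -> /export/data/00000012/123456.dat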

> Since the files have an average size of 11 kB, we wanted to try ReiserFS
> and Samba to deliver the files to Windows NT clients. (An NTFS check-disk
> currently takes 8 hours on our RAID system, and a journaling filesystem
> like ReiserFS, which is especially fast with small files, seemed very
> attractive to us.)

	Ok, sounds like a good plan.

> I tried it at home last week (had a little time, while fighting
> influenza) with ext2-filesystem and some thousand files but the
> results were discouraging.
> 
> When I start *top* I can see that Linux uses nearly 100% CPU and 9x%
> are from SMBD, so the filesystem seems not to be the problem but the
> SMB daemon.
	
	Yes, creation is going to be a pretty cpu-bound operation:
	I'll bet you see high numbers for time spent in wait-io and 
	system state, the rest in user.
> 
> Samba is configured to be case-sensitive, security: per share,
> preferred master and local master, no wins support and allow any
> hosts.
> The share is configured to be read-write, guest OK, case-sensitive,
> default case=lower, mangle case = no and browsable.
> The user from Windows NT is known to Samba and the smbpasswd file.


	I was going to watch samba under truss, but I just broke
	2.0.7 alpha 1... back to 2.0.6!

	Ok, I just ran truss on smbd, while issuing an ls command to
	smbclient (on Solaris). It said:
 3.7258 open64("./", O_RDONLY|O_NDELAY)                 = 9
 3.7261 fcntl(9, F_SETFD, 0x00000001)                   = 0
 3.7263 fstat64(9, 0xFFBEE698)                          = 0
 3.7266 getdents64(9, 0x00153B50, 1048)                 = 1040
 3.7269 getdents64(9, 0x00153B50, 1048)                 = 1040
 3.7272 getdents64(9, 0x00153B50, 1048)                 = 1024
 3.7275 getdents64(9, 0x00153B50, 1048)                 = 1048
 3.7278 getdents64(9, 0x00153B50, 1048)                 = 1032
 3.7281 getdents64(9, 0x00153B50, 1048)                 = 912
 3.7284 getdents64(9, 0x00153B50, 1048)                 = 0
 3.7286 close(9)                                        = 0

	a normal scan, which took .0028 seconds,
	followed by some log writing (which took
	about .0042 seconds per line!)

	This was then followed by the stats for the ls (which
	is something of an ls -l), which took .2799 seconds,
	roughly 100 times the getdents time, and by a statvfs
	to get the free disk space:

4.0985 statvfs64(".", 0xFFBEEFD0)     

	There were 184 entries in the directory I used,
	6096 bytes in all, which gives a readdir speed of
	about 1.4 MB/s, or 44 K-entries/s, for a small
	directory.  Big ones get slower as a function of
	the indirect blocks used, so there will be a step
	function in the speed, and you'll want to stay
	below it.
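
	If you want to find where that step is on your own
	filesystem, a rough measurement like the following
	Python sketch will do; the directory sizes and dummy
	file names are arbitrary test parameters, nothing
	Samba-specific:

import os, time, tempfile

# Sketch: time a raw directory scan at increasing directory sizes, to
# see where the entries-per-second rate starts to fall off.
def scan_rate(path):
    """Entries scanned per second for one full pass over path."""
    t0 = time.perf_counter()
    n = len(os.listdir(path))
    return n / (time.perf_counter() - t0)

root = tempfile.mkdtemp()
created = 0
for target in (20000, 40000, 60000, 80000, 100000):
    while created < target:                      # top the directory up
        open(os.path.join(root, "%d.dat" % created), "w").close()
        created += 1
    print("%6d entries: %8.0f entries/s" % (target, scan_rate(root)))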


The logs said
[2000/03/08 11:04:55, 3] smbd/process.c:process_smb(615)
  Transaction 32 of length 87
[2000/03/08 11:04:55, 3] smbd/process.c:switch_message(448)
  switch message SMBtrans2 (pid 8826)
[2000/03/08 11:04:55, 3] smbd/trans2.c:call_trans2findfirst(669)
  call_trans2findfirst: dirtype = 22, maxentries = 512, close_after_first=0,
  close_if_end = 1 requires_resume_key = 1 level = 260, max_data_bytes = 65535
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
  unix_clean_name [/*]
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
  unix_clean_name [*]
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
  unix_clean_name [./]
[2000/03/08 11:04:55, 3] smbd/dir.c:dptr_create(491)
  creating new dirptr 256 for path ./, expect_close = 1

	...which is the directory scan seen in truss, above.
	This is followed by

[2000/03/08 11:04:55, 3] smbd/process.c:process_smb(615)
  Transaction 33 of length 39
[2000/03/08 11:04:55, 3] smbd/process.c:switch_message(448)
  switch message SMBdskattr (pid 8826)
[2000/03/08 11:04:55, 3] smbd/reply.c:reply_dskattr(1199)
  dskattr dfree=343

	...which is the disk-space-free request made
	for the ls.

	
	To me, this says the simple directory scan is 
	fairly "light" at the system level, and most of
	the cycles get used by the app.

	Sar says:

SunOS elsbeth 5.8 Generic sun4u    03/08/00

12:56:32    %usr    %sys    %wio   %idle
12:56:33       2       9       9      80
12:56:34       5       3       1      91
12:56:35       0       0       0     100
12:56:36       0       0       0     100

	(Yes, I was testing on Solaris 8 at work (;-))

	The open and first readdir caused wait-io; the
	rest grabbed data from a buffer, and the CPU
	processing jumped up.

	Let's try this with a 100,000-file directory,
	created on my local disk; that should be slow!
	(The creation is taking ages, in fact: I think
	we'll stop at 85,784 files.)
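
	(For anyone repeating this at home: the "creation" is
	nothing more than a loop like the sketch below.  The
	11 kB payload just echoes Hubert's average file size,
	and the count and names are placeholders.)

import os

# Sketch: populate one flat directory with N small files, roughly the
# shape of Hubert's data (~11 kB each).  N, the directory name and the
# payload are placeholder values.
N = 100000
PAYLOAD = b"x" * (11 * 1024)

os.makedirs("testdir", exist_ok=True)
for i in range(N):
    with open(os.path.join("testdir", "%d.dat" % i), "wb") as f:
        f.write(PAYLOAD)
    if i % 10000 == 0:
        print("created", i, "files")    # progress; this takes a while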


# sar -o foo.raw 1 120
SunOS elsbeth 5.8 Generic sun4u    03/08/00

15:23:33    %usr    %sys    %wio   %idle
15:23:34      23      77       0       0
15:23:35      28      72       0       0
15:23:36      29      71       0       0
15:23:37      34      66       0       0
15:23:38      28      72       0       0

	Yes, the user time jumps up, and the system time
	does too as the data is transferred to the client.

	Looking at it in detail, the CPU was 20% usr
	for the first 30 seconds, then jumped to
	80% as the client, running on the same machine,
	started formatting and printing. The system
	time started at 80%, and dropped to 30% after
	the transfer completed.

	This is attached as a gif file: dir.cpu.gif

	The only other interesting graph was logical and
	physical reads: this is attached as dir.read.gif,
	and the physical reads were remarkably low, as
	the disk and cache seem to make them "instantaneous".
	That tends to imply that the OS is mostly walking
	buffer pages and transferring data to the app.

	[I'll send Herr Grünheidt a more detailed set of plots]


	So we need to do both: minimize Samba processing, and
	organize the filesystem for fast directory traversal.
	The latter is a multiplier on both slow directory
	scans in Unix and Samba's processing, so reorganizing
	will give the biggest single payoff.
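
	If it helps, the reorganization itself can be as dull
	as the sketch below: move each <id>.<ext> into a
	smaller per-bucket subdirectory.  The paths and the
	10,000-id bucket size are made-up example values;
	try it on a copy of the data first.

import os

# Sketch: move an existing flat directory of <id>.<ext> files into
# smaller per-bucket subdirectories.  SRC, DST and BUCKET are example
# values; os.renames creates the target directories as needed.
SRC = "/export/data/00000001"
DST = "/export/data-bucketed"
BUCKET = 10000

for name in os.listdir(SRC):
    stem, _, ext = name.partition(".")
    if not stem.isdigit():
        continue                        # skip anything that isn't <id>.<ext>
    bucket = int(stem) // BUCKET
    os.renames(os.path.join(SRC, name),
               os.path.join(DST, "%08d" % bucket, name))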

--dave
-- 
David Collier-Brown,  | Always do right. This will gratify some people
185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
Willowdale, Ontario   | //www.oreilly.com/catalog/samba/author.html
Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb at canada.sun.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dir.cpu.gif
Type: image/gif
Size: 4088 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba/attachments/20000308/fd0140e3/dir.cpu.gif
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dir.read.gif
Type: image/gif
Size: 2922 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba/attachments/20000308/fd0140e3/dir.read.gif

