why is samba so slow with many files in one directory? [LARGE MESSAGE]
David Collier-Brown
davecb at canada.sun.com
Wed Mar 8 20:47:22 GMT 2000
Hubert Grünheidt wrote:
> Maybe it'll help to be more precise:
> We have currently 14 Mio files separated into 140 Directories, each
> containing 100000 files. The naming-scheme is simple: <id>.<extension>; so
> directory 00000001 contains files 0.<someext> to 99999.<someext>, directory
> 00000002 contains files 100000.<someext> to 199999.<someext> etc.
> The extensions are different, but all files are unique in their
> number, the extensions are only used to indicate the type of the file.
Cool: you can already split these by number-pattern.
I'd try to make the directory
 - small enough for good scan-performance
 - large enough that clients tend to sit
   in the same directory for reasonable periods
The latter assumes that there is some kind of locality
of reference in the use of these files.
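A minimal sketch of splitting by number-pattern, keeping the existing
convention that directory 00000001 holds the lowest ids. The function name
shard_path and the default of 1000 files per directory are assumptions for
illustration only; the right size has to be found by measurement.

```python
def shard_path(file_id, ext, files_per_dir=1000):
    """Map a numeric file id to '<8-digit directory>/<id>.<ext>'.

    files_per_dir is a tunable guess, not a recommended value.
    Directory numbering starts at 00000001, as in the current layout.
    """
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    bucket = file_id // files_per_dir + 1
    return "%08d/%d.%s" % (bucket, file_id, ext)


# With files_per_dir=100000 this reproduces the current layout.
print(shard_path(199999, "tif", files_per_dir=100000))
# A smaller directory size spreads the same ids over more directories.
print(shard_path(199999, "tif"))
```

Because the directory is computed from the id alone, clients that walk ids
in order stay in one directory for files_per_dir files at a time.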
> Since the files have an average size of 11kB we wanted to try ReiserFS
> and Samba to deliver the files to Windows NT Clients (An
> NTFS-checkdisk currently lasts 8 hours on our RAID-System with NTFS
> and a Journaling Filesystem like ReiserFS, which is especially fast at
> small files, seemed very attractive to us).
Ok, sounds like a good plan.
> I tried it at home last week (had a little time, while fighting
> influenza) with ext2-filesystem and some thousand files but the
> results were discouraging.
>
> When I start *top* I can see that Linux uses nearly 100% CPU and 9x%
> are from SMBD, so the filesystem seems not to be the problem but the
> SMB daemon.
Yes, creation is going to be a pretty cpu-bound operation:
I'll bet you see high numbers for time spent in wait-io and
system state, the rest in user.
>
> Samba is configured to be case-sensitive, security: per share,
> preferred master and local master, no wins support and allow any
> hosts.
> The share is configured to be read-write, guest OK, case-sensitive,
> default case=lower, mangle case = no and browsable.
> The user from Windows NT is known to Samba and the smbpasswd file.
I was going to watch samba under truss, but I just broke
2.0.7 alpha 1... back to 2.0.6!
Ok, I just ran truss on smbd, while issuing an ls command to
smbclient (on Solaris). It said:
3.7258 open64("./", O_RDONLY|O_NDELAY) = 9
3.7261 fcntl(9, F_SETFD, 0x00000001) = 0
3.7263 fstat64(9, 0xFFBEE698) = 0
3.7266 getdents64(9, 0x00153B50, 1048) = 1040
3.7269 getdents64(9, 0x00153B50, 1048) = 1040
3.7272 getdents64(9, 0x00153B50, 1048) = 1024
3.7275 getdents64(9, 0x00153B50, 1048) = 1048
3.7278 getdents64(9, 0x00153B50, 1048) = 1032
3.7281 getdents64(9, 0x00153B50, 1048) = 912
3.7284 getdents64(9, 0x00153B50, 1048) = 0
3.7286 close(9) = 0
a normal scan, which took .0028 seconds,
followed by some log writing (which took
about .0042 seconds per line!)
This was then followed by stats for the ls (which
is something of an ls -l) which took .2799
seconds, ~96 times the getdents time, and a statvfs
to get the disk space free
4.0985 statvfs64(".", 0xFFBEEFD0)
There were 184 entries
in the directory I used, 6096 bytes, and that gives
a readdir speed of about 1.4 MB/S or 44 K-entries/S
for a small directory. Big ones get slower as
a function of indirect blocks used, so there
will be a step-function in the speed you'll want to
stay below.
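The arithmetic can be rechecked from the trace with a short sketch. The
helper names summarize and rates are made up for illustration. The rates
depend on which interval is taken as the elapsed time: the open-to-close
span in the trace is .0028 seconds, while the quoted 1.4 MB/S and
44 K-entries/S appear to correspond to an interval nearer .0042 seconds.

```python
import re

# Timestamped truss lines copied from the trace above.
TRACE = """\
3.7258 open64("./", O_RDONLY|O_NDELAY) = 9
3.7261 fcntl(9, F_SETFD, 0x00000001) = 0
3.7263 fstat64(9, 0xFFBEE698) = 0
3.7266 getdents64(9, 0x00153B50, 1048) = 1040
3.7269 getdents64(9, 0x00153B50, 1048) = 1040
3.7272 getdents64(9, 0x00153B50, 1048) = 1024
3.7275 getdents64(9, 0x00153B50, 1048) = 1048
3.7278 getdents64(9, 0x00153B50, 1048) = 1032
3.7281 getdents64(9, 0x00153B50, 1048) = 912
3.7284 getdents64(9, 0x00153B50, 1048) = 0
3.7286 close(9) = 0
"""

GETDENTS = re.compile(r"getdents64\(.*\)\s*=\s*(\d+)\s*$")


def summarize(trace):
    """Return (bytes returned by getdents64, seconds from first to last line)."""
    stamps = []
    total = 0
    for line in trace.strip().splitlines():
        stamp, call = line.split(None, 1)
        stamps.append(float(stamp))
        match = GETDENTS.match(call)
        if match:
            total += int(match.group(1))
    return total, stamps[-1] - stamps[0]


def rates(total_bytes, entries, seconds):
    """Return (bytes per second, entries per second) for a chosen interval."""
    return total_bytes / seconds, entries / seconds


total_bytes, elapsed = summarize(TRACE)
print(total_bytes, "bytes in", round(elapsed, 4), "seconds")
print(rates(total_bytes, 184, elapsed))
```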
The logs said
[2000/03/08 11:04:55, 3] smbd/process.c:process_smb(615)
Transaction 32 of length 87
[2000/03/08 11:04:55, 3] smbd/process.c:switch_message(448)
switch message SMBtrans2 (pid 8826)
[2000/03/08 11:04:55, 3] smbd/trans2.c:call_trans2findfirst(669)
call_trans2findfirst: dirtype = 22, maxentries = 512, close_after_first=0, close_if_end = 1 requires_resume_key = 1 level = 260, max_data_bytes = 65535
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
unix_clean_name [/*]
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
unix_clean_name [*]
[2000/03/08 11:04:55, 3] lib/util.c:unix_clean_name(608)
unix_clean_name [./]
[2000/03/08 11:04:55, 3] smbd/dir.c:dptr_create(491)
creating new dirptr 256 for path ./, expect_close = 1
...which corresponds to the open of ./ seen in truss, above.
This is followed by
[2000/03/08 11:04:55, 3] smbd/process.c:process_smb(615)
Transaction 33 of length 39
[2000/03/08 11:04:55, 3] smbd/process.c:switch_message(448)
switch message SMBdskattr (pid 8826)
[2000/03/08 11:04:55, 3] smbd/reply.c:reply_dskattr(1199)
dskattr dfree=343
Which is the disk-space-free request
for the ls.
To me, this says the simple directory scan is
fairly "light" at the system level, and most of
the cycles get used by the app.
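To repeat the scan-versus-stat comparison on another machine without
truss, a rough sketch like the following separates the two phases. It uses
only the Python standard library; the function name time_scan_and_stat is
made up here, and the timings include interpreter overhead, so they are
only comparable with each other, not with the truss numbers.

```python
import os
import time


def time_scan_and_stat(path="."):
    """Return (entry count, seconds to list names, seconds to lstat them all)."""
    t0 = time.perf_counter()
    names = os.listdir(path)            # the getdents-style scan
    t1 = time.perf_counter()
    for name in names:
        os.lstat(os.path.join(path, name))   # one stat per entry, as ls -l does
    t2 = time.perf_counter()
    return len(names), t1 - t0, t2 - t1


if __name__ == "__main__":
    count, scan_s, stat_s = time_scan_and_stat(".")
    print(count, "entries: scan", scan_s, "s, stat", stat_s, "s")
```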
Sar says:
SunOS elsbeth 5.8 Generic sun4u 03/08/00
12:56:32 %usr %sys %wio %idle
12:56:33 2 9 9 80
12:56:34 5 3 1 91
12:56:35 0 0 0 100
12:56:36 0 0 0 100
(Yes, I was testing on Solaris 8 at work (;-))
The open and first readdir caused wait-io,
the rest grabbed data from a buffer and the
cpu processing jumped up.
Let's try this with a 100,000-file directory,
created on my local disk, that should be slow!
(the creation is taking ages, in fact! I think
we'll stop at 85,784 files)
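For anyone who wants to build a similar test directory, here is a small
sketch. It is not the method used for the run below; the function name
make_test_dir, the .dat extension and the empty file contents are all
assumptions for illustration.

```python
import os


def make_test_dir(path, count, ext="dat"):
    """Create `count` empty files named 0.<ext> .. <count-1>.<ext> in `path`.

    Returns the number of entries in the directory afterwards.
    """
    os.makedirs(path, exist_ok=True)
    for i in range(count):
        # Opening for write and closing immediately leaves an empty file.
        with open(os.path.join(path, "%d.%s" % (i, ext)), "w"):
            pass
    return len(os.listdir(path))
```

Calling make_test_dir("bigdir", 100000) would populate a hypothetical
directory named bigdir; expect it to slow down as the directory grows.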
# sar -o foo.raw 1 120
SunOS elsbeth 5.8 Generic sun4u 03/08/00
15:23:33 %usr %sys %wio %idle
15:23:34 23 77 0 0
15:23:35 28 72 0 0
15:23:36 29 71 0 0
15:23:37 34 66 0 0
15:23:38 28 72 0 0
yes, the user time jumps up, and the
system time too as the data is transferred
to the client.
Looking at it in detail, the cpu was 20% usr
for the first 30 seconds, then jumped to
80 % as the client, running on the same machine,
started formatting and printing. The system
time started at 80%, and dropped to 30% after
the transfer completed.
This is attached as a gif file: dir.cpu.gif
The only other interesting graph was logical and
physical reads: this is attached as dir.read.gif,
and the physical reads were remarkably low, as
the disk and cache seem to make them "instantaneous".
That tends to imply that the OS is mostly walking
buffer pages and transferring data to the app.
[I'll send Herr Grünheidt a more detailed set of plots]
So we need to do both: minimize samba processing, and
organize the filesystem for fast directory traversal.
The latter is a multiplier on both slow directory
scans in Unix and samba's processing, so reorganizing
will give the biggest single payoff.
--dave
--
David Collier-Brown, | Always do right. This will gratify some people
185 Ellerslie Ave., | and astonish the rest. -- Mark Twain
Willowdale, Ontario | //www.oreilly.com/catalog/samba/author.html
Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb at canada.sun.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dir.cpu.gif
Type: image/gif
Size: 4088 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba/attachments/20000308/fd0140e3/dir.cpu.gif
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dir.read.gif
Type: image/gif
Size: 2922 bytes
Desc: not available
Url : http://lists.samba.org/archive/samba/attachments/20000308/fd0140e3/dir.read.gif