Slow performance on large dirs.

Anders C. Thorsen anders at aae.wisc.edu
Fri Oct 29 16:33:15 GMT 1999


-----Original Message-----
From: Michel Stoop <michel at mcs.xs4all.nl>
To: Multiple recipients of list SAMBA-TECHNICAL <samba-technical at samba.org>
Date: Friday, October 29, 1999 10:27 AM
Subject: RE: Slow performance on large dirs.


>Hello David,
>
>Thanks for your reply.
>
>> Michel Stoop wrote:
>> > I have run into a problem with samba performing very
>> > slow on directories with many files (>5000 files).
>> > Opening a file in notepad takes about 9 seconds!
>
>First some more info.
>The directory contains files named 901txxxx.gd, where xxxx
>is a number. They are about 2-6 KB in size.
>The program creates a file list, then opens a file, reads
>the contents and closes the file, then moves on to the next
>until the last file.
>
>> I usually recommend transforming names so
>> that 19990312.636328 turns into 19/99/03/12/63/63/28
>> or the like... If a program is accessing the files,
>> this is trivial.  It's not suitable for humans, though,
>> who get tired of clicking...
>
>What is the benefit of changing these names?

Because Samba will not need to list 5000 files in the same directory.
With, say, 100 files per directory, the search goes faster because
for each file you open, Samba only has to list 100 files instead of 5000.
Assuming the search time grows linearly with the number of entries,
each file search will go approx. 50 times faster...
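
A minimal sketch of the kind of name transformation David describes,
in case the program accessing the files can be changed. The function
name shard_name and its behaviour (drop non-alphanumeric characters,
two characters per path component) are my assumptions for illustration,
not anything that exists in Samba:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Samba): map a flat file name to a
 * nested relative path by dropping non-alphanumeric characters and
 * inserting '/' after every two kept characters.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int shard_name(const char *name, char *out, size_t outlen)
{
    size_t n = 0;     /* bytes written to out so far */
    size_t kept = 0;  /* alphanumeric characters copied so far */
    size_t len;

    assert(name != NULL && out != NULL);
    if (outlen == 0)
        return -1;

    len = strlen(name);
    for (size_t i = 0; i < len; i++) {
        if (!isalnum((unsigned char)name[i]))
            continue;  /* skip '.', '-', and similar separators */

        /* Start a new path component after every two kept characters. */
        if (kept > 0 && kept % 2 == 0) {
            if (n + 1 >= outlen)
                return -1;
            out[n++] = '/';
        }
        if (n + 1 >= outlen)  /* always leave room for the final NUL */
            return -1;
        out[n++] = name[i];
        kept++;
    }
    out[n] = '\0';
    return 0;
}
```

With a 32-byte buffer, shard_name("19990312.636328", buf, sizeof buf)
fills buf with "19/99/03/12/63/63/28". With two digits per level no
directory holds more than 100 numeric entries, which is where the
100-versus-5000 comparison above comes from.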

>
>> > I have put some code in trans2.c after line 773 and 1030
>> > to set finished to true if the found filename is an exact
>> > match to the given mask. This speeds things up but it is
>> > not a good fix. Although I try to understand why NT first
>> > tries to determine if the file exists before opening it,
>
>The code is:
>      finished = !get_lanman2_dir_entry(conn,mask,dirtype,info_level,
>                   requires_resume_key,dont_descend,
>                   &p,pdata,space_remaining, &out_of_space,
>                   &last_name_off);
>    }
>
>    if ((!strcmp(mask, wcard)) && (!strchr(mask, '*'))) /* added this one! */
>      finished = True;
>
>    if (finished && out_of_space)
>      finished = False;
>
>This extra code aborts the directory search if mask contains an exact
>filename, because I think there is no point in continuing the search
>if we were looking for a single filename.
>
>Now the delay is almost gone when I open the first file in
>the directory, but it gets slower when the file which I open
>is further down the directory list. It has to parse more filenames...
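
A toy model of that early exit, to show why the cost still depends on
where the file sits in the listing. This is only a sketch over an
in-memory array of names, not the trans2.c code; mask_is_literal and
find_exact are hypothetical names. I also treat '?' as a wildcard here,
which the check quoted above does not, and I compare case-sensitively
with strcmp as the quoted code does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the mask contains no '*' or '?'. */
bool mask_is_literal(const char *mask)
{
    return strpbrk(mask, "*?") == NULL;
}

/*
 * Linear scan that stops at the first exact match.
 * Returns the index of the match, or -1 if there is none or if the
 * mask contains a wildcard (wildcard matching is not modelled here).
 * *examined receives the number of entries that were compared.
 */
long find_exact(const char *const *names, size_t count,
                const char *mask, size_t *examined)
{
    *examined = 0;
    if (!mask_is_literal(mask))
        return -1;

    for (size_t i = 0; i < count; i++) {
        (*examined)++;
        if (strcmp(names[i], mask) == 0)
            return (long)i;  /* early exit: stop scanning here */
    }
    return -1;
}
```

Looking up the first name in the array examines one entry, while
looking up the last one still examines all of them, so the early exit
helps on average but does not remove the linear scan.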
>
>> If you're seeing lots of repeated accesses to the same
>> file, it's usually helpful to cache the last filename
>> seen.  A single-element cache, in such a case, will
>> make a slow algorithm appear twice as fast (ie, the
>> first access will remain slow, but the second will fly).
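
A minimal sketch of such a single-element cache, again over an
in-memory array of names rather than a real directory. All names here
(last_name_cache, slow_lookup, cached_lookup, slow_lookups) are
hypothetical, and a real version would have to invalidate the cache
whenever the directory changes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Remembers only the most recently found name and its position. */
struct last_name_cache {
    bool valid;
    char name[256];
    long position;
};

/* Counts how often the slow path runs, for demonstration only. */
unsigned long slow_lookups = 0;

/* Stand-in for the expensive directory search: a plain linear scan. */
long slow_lookup(const char *const *names, size_t count, const char *name)
{
    slow_lookups++;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0)
            return (long)i;
    }
    return -1;
}

/* Answer from the cache if the same name was found last time. */
long cached_lookup(struct last_name_cache *cache,
                   const char *const *names, size_t count,
                   const char *name)
{
    long pos;

    assert(cache != NULL && name != NULL);
    if (cache->valid && strcmp(cache->name, name) == 0)
        return cache->position;  /* hit: no scan needed */

    pos = slow_lookup(names, count, name);
    if (pos >= 0 && strlen(name) < sizeof cache->name) {
        strcpy(cache->name, name);  /* length checked just above */
        cache->position = pos;
        cache->valid = true;
    }
    return pos;
}
```

Two consecutive lookups of the same name run the slow scan only once.
A lookup of a different name misses and replaces the cached entry.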
>
>No, it is more the things that happen before the file is opened.
>
>This is what I see:
>First the NT client calls findfirst to check if the filename
>exists. Then findfirst is called again to seek to the dirpointer
>and then the file is opened. What I think is the problem is
>that the many string comparison routines in the code are causing
>excessive delays: I see from the log that most time is lost
>in the get_lanman2_dir_entry call.
>
>> This works well when the temporal adjacency is high, which
>> appears to be true in your case.
>>
>> > I have to come up with a solution to this problem this
>> > week or we have to abandon samba and come up with another
>> > solution to access files on our D380 server.
>>
>> Eek!
>> What else can we do to help?
>
>I really hope to find a solution because I do not want to
>abandon samba! We have it running on about 35 HP-UX systems.
>I'm also working on this problem at home, and I have been given
>more time (a week) to come up with a solution.
>
>> David Collier-Brown,  | Always do right. This will gratify some people
>> 185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
>> Willowdale, Ontario   | http://java.science.yorku.ca/~davecb
>> Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb at canada.sun.com
>
>---
>Michel Stoop,
>
>Senior Network and Systems Administrator
>postmaster for ncg.nl and vuykgron.nl
>Numeriek Centrum Groningen B.V.
>Vuyk Engineering Centre Groningen B.V.
>Postbus 204
>9700AE Groningen
>+31 (0)50 541 26 32  fax: +31 (0)50 542 37 17
>http://www.ncg.nl
>mailto:stoop at ncg.nl
>---
>
>
>

--- Anders


