Slow performance on large dirs.
Michel Stoop
michel at mcs.xs4all.nl
Fri Oct 29 15:12:50 GMT 1999
Hello David,
Thanks for your reply.
> Michel Stoop wrote:
> > I have run into a problem with samba performing very
> > slowly on directories with many files (>5000 files).
> > Opening a file in notepad takes about 9 seconds!
First some more info.
The directory contains files named 901txxxx.gd, where xxxx
is a number. They are about 2-6 KB in size.
The program builds a file list, then opens each file in turn,
reads its contents and closes it, until it reaches the last
file.
> I usually recommend transforming names so
> that 19990312.636328 turns into 19/99/03/12/63/63/28
> or the like... If a program is accessing the files,
> this is trivial. It's not suitable for humans, though,
> who get tired of clicking...
What is the benefit of changing these names?
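For reference, here is a minimal sketch of the transformation described in the quote above, assuming every two digits of the name become one directory level and any other character (such as the dot) is dropped. The helper name name_to_path is hypothetical and not part of Samba. With two digits per level, no directory holds more than 100 entries, so any single directory scan stays short.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of Samba: copy the digits of `name`
 * into `out` in groups of two separated by '/', skipping every
 * non-digit character. Returns 0 on success, -1 if `out` is too small. */
int name_to_path(const char *name, char *out, size_t outlen)
{
    size_t o = 0;
    int ndigits = 0;

    if (outlen == 0)
        return -1;

    for (; *name != '\0'; name++) {
        if (!isdigit((unsigned char)*name))
            continue;                   /* drop '.' and other punctuation */
        if (ndigits > 0 && ndigits % 2 == 0) {
            if (o + 1 >= outlen)        /* keep room for the terminator */
                return -1;
            out[o++] = '/';
        }
        if (o + 1 >= outlen)
            return -1;
        out[o++] = *name;
        ndigits++;
    }
    out[o] = '\0';
    return 0;
}
```

Called with "19990312.636328" and a large enough buffer, this yields "19/99/03/12/63/63/28".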
> > I have put some code in trans2.c after line 773 and 1030
> > to set finished to true if the found filename is an exact
> > match to the given mask. This speeds things up but it is
> > not a good fix. Although I try to understand why NT first
> > tries to determine if the file exists before opening it,
The code is:
    finished = !get_lanman2_dir_entry(conn,mask,dirtype,info_level,
                                      requires_resume_key,dont_descend,
                                      &p,pdata,space_remaining, &out_of_space,
                                      &last_name_off);
  }

  /* added this one! */
  if ((!strcmp(mask, wcard)) && (!strchr(mask, '*')))
    finished = True;

  if (finished && out_of_space)
    finished = False;
This extra code aborts the directory scan if mask contains an exact
filename, because I see no point in continuing the search when we
are looking for a single file.
Now the delay is almost gone when I open the first file in
the directory, but it gets slower the further down the directory
list the file is: more filenames have to be parsed first.
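One way to avoid parsing the preceding filenames at all would be to look the name up directly when the mask has no wildcards. The following is only a sketch under that assumption: mask_is_exact_name and exact_name_exists are hypothetical helpers, not Samba functions, and the sketch ignores case-insensitive matching and name mangling, which a real server has to handle. Unlike the patch above, it also treats '?' as a wildcard.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: nonzero if `mask` is a plain filename,
 * i.e. non-empty and free of the '*' and '?' wildcards. */
int mask_is_exact_name(const char *mask)
{
    return mask[0] != '\0' && strpbrk(mask, "*?") == NULL;
}

/* Hypothetical helper: test for one name with a single stat() call
 * instead of walking the directory.
 * Returns 1 if dir/name exists, 0 if it does not, -1 if the
 * combined path does not fit in the local buffer. */
int exact_name_exists(const char *dir, const char *name)
{
    char path[1024];
    struct stat st;
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return stat(path, &st) == 0;
}
```

With this, the cost of checking a single name no longer depends on the file's position in the directory listing as seen by the caller; how fast the lookup is then depends on the underlying filesystem.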
> If you're seeing lots of repeated accesses to the same
> file, it's usually helpful to cache the last filename
> seen. A single-element cache, in such a case, will
> make a slow algorithm appear twice as fast (ie, the
> first access will remain slow, but the second will fly).
No, it is more what happens before the file is opened.
This is what I see:
First the NT client calls findfirst to check if the filename
exists. Then findfirst is called again to seek to the directory
position, and then the file is opened. I think the problem is
that the many string comparison routines in the code cause
excessive delays: the log shows that most of the time is lost
in the get_lanman2_dir_entry call.
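Since findfirst is called twice in a row for the same name, the single-element cache suggested in the quote above might still help the second call. A minimal sketch, assuming the directory position can be stored as a long offset; struct last_lookup, cache_store and cache_lookup are hypothetical names, not existing Samba code, and the cache would have to be invalidated whenever the directory changes.

```c
#include <string.h>

#define CACHE_NAME_MAX 256

/* Hypothetical one-entry cache: remembers the last name looked up
 * and the directory offset at which it was found. */
struct last_lookup {
    int  valid;                 /* 0 until something has been stored */
    char name[CACHE_NAME_MAX];
    long offset;
};

/* Remember `name` and its offset, replacing any previous entry. */
void cache_store(struct last_lookup *c, const char *name, long offset)
{
    if (strlen(name) >= sizeof c->name) {
        c->valid = 0;           /* too long to cache, drop the entry */
        return;
    }
    strcpy(c->name, name);
    c->offset = offset;
    c->valid = 1;
}

/* Returns 1 and sets *offset on a hit, 0 on a miss. */
int cache_lookup(const struct last_lookup *c, const char *name, long *offset)
{
    if (!c->valid || strcmp(c->name, name) != 0)
        return 0;
    *offset = c->offset;
    return 1;
}
```

The first findfirst would still pay for the full scan and then call cache_store; the second could call cache_lookup and seek straight to the stored offset.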
> This works well when the temporal adjacency is high, which
> appears to be true in your case.
>
> > I have to come up with a solution to this problem this
> > week or we have to abandon samba and come up with another
> > solution to access files on our D380 server.
>
> Eek!
> What else can we do to help?
I really hope to find a solution because I do not want to
abandon samba! We have it running on about 35 HP-UX systems.
I'm also working on this problem at home, and I have been given
more time (a week) to come up with a solution.
> David Collier-Brown, | Always do right. This will gratify some people
> 185 Ellerslie Ave., | and astonish the rest. -- Mark Twain
> Willowdale, Ontario | http://java.science.yorku.ca/~davecb
> Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb at canada.sun.com
---
Michel Stoop,
Senior Network and Systems Administrator
postmaster for ncg.nl and vuykgron.nl
Numeriek Centrum Groningen B.V.
Vuyk Engineering Centre Groningen B.V.
Postbus 204
9700AE Groningen
+31 (0)50 541 26 32 fax: +31 (0)50 542 37 17
http://www.ncg.nl
mailto:stoop at ncg.nl