Slow performance on large dirs.

Michel Stoop michel at mcs.xs4all.nl
Fri Oct 29 15:12:50 GMT 1999


Hello David,

Thanks for your reply.

> Michel Stoop wrote:
> > I have run into a problem with samba performing very
> > slowly on directories with many files (>5000 files).
> > Opening a file in notepad takes about 9 seconds!

First, some more info.
The directory contains files named 901txxxx.gd, where xxxx
is a number. They are about 2-6 KB in size.
The program creates a file list, then opens a file, reads
its contents and closes it, then moves on to the next file
until it has processed the last one.

> 	I usually recommend transforming names so
> 	that 19990312.636328 turns into 19/99/03/12/63/63/28
> 	or the like... If a program is accessing the files,
> 	this is trivial.  It's not suitable for humans, though,
> 	who get tired of clicking...
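
For illustration, the transformation David describes could be sketched
roughly as below. The helper name split_name and its buffer handling are
hypothetical and not part of samba; the idea is only that each directory
level then holds far fewer entries, so a linear directory scan at any one
level stays short.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not samba code): copy the letters and digits of
 * `name` into `out` two at a time, separated by '/'. Other characters,
 * such as '.', are dropped. Returns 0 on success, -1 if `out` is too small.
 */
int split_name(const char *name, char *out, size_t outlen)
{
    size_t o = 0;
    int in_pair = 0;              /* characters in the current component */

    assert(name != NULL && out != NULL);
    for (; *name != '\0'; name++) {
        if (!isalnum((unsigned char)*name))
            continue;             /* skip separators like '.' */
        if (in_pair == 2) {       /* component is full: start a new one */
            if (o + 1 >= outlen)
                return -1;
            out[o++] = '/';
            in_pair = 0;
        }
        if (o + 1 >= outlen)      /* keep room for the terminating NUL */
            return -1;
        out[o++] = *name;
        in_pair++;
    }
    if (o >= outlen)
        return -1;
    out[o] = '\0';
    return 0;
}
```

Called with "19990312.636328" and a large enough buffer, this yields
"19/99/03/12/63/63/28", the form given in the quoted text.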

What is the benefit of changing these names?

> > I have put some code in trans2.c after line 773 and 1030
> > to set finished to true if the found filename is an exact
> > match to the given mask. This speeds things up but it is
> > not a good fix. Although I try to understand why NT first
> > tries to determine if the file exists before opening it,

The code is:
      finished = !get_lanman2_dir_entry(conn,mask,dirtype,info_level,
                   requires_resume_key,dont_descend,
                   &p,pdata,space_remaining, &out_of_space,
                   &last_name_off);
    }

    /* added: the mask is an exact filename (no '*'), so stop searching */
    if ((!strcmp(mask, wcard)) && (!strchr(mask, '*')))
      finished = True;

    if (finished && out_of_space)
      finished = False;

This extra code aborts the directory search if the mask is an
exact filename. I think there is no point in continuing the
search when we are looking for a single filename.
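
The same idea, stripped of the samba internals, can be sketched as a
standalone example. Everything here is hypothetical: mask_is_exact and
find_exact are illustrative names, the directory is just an array of
strings, and the comparison is a plain strcmp (a real server would have to
follow its own case-sensitivity rules). One deliberate difference from the
patch above: this sketch treats '?' as a wildcard as well as '*'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `mask` names exactly one file, i.e. contains no wildcard.
 * Assumption: both '*' and '?' count as wildcards here.
 */
int mask_is_exact(const char *mask)
{
    assert(mask != NULL);
    return strpbrk(mask, "*?") == NULL;
}

/*
 * Hypothetical stand-in for a directory scan: walk names[0..n-1] and stop
 * at the first entry equal to `mask`. The number of entries examined is
 * stored in *examined. Returns the index of the match, or -1 if there is
 * none or if `mask` contains wildcards (which would need a full scan).
 */
int find_exact(const char *const *names, size_t n,
               const char *mask, size_t *examined)
{
    size_t i;

    assert(names != NULL && examined != NULL);
    *examined = 0;
    if (!mask_is_exact(mask))
        return -1;
    for (i = 0; i < n; i++) {
        (*examined)++;
        if (strcmp(names[i], mask) == 0)
            return (int)i;        /* only one entry can match: stop here */
    }
    return -1;
}
```

The examined counter shows what the early exit saves and what it does not:
a hit on the first entry costs one comparison, but a hit on entry k still
costs k comparisons, and a miss costs the whole list.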

Now the delay is almost gone when I open the first file in
the directory, but it gets slower the further down the
directory list the file is, because more filenames have to
be parsed.

> 	If you're seeing lots of repeated accesses to the same
> 	file, it's usually helpful to cache the last filename
> 	seen.  A single-element cache, in such a case, will
> 	make a slow algorithm appear twice as fast (ie, the
> 	first access will remain slow, but the second will fly).
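
A single-element cache of the kind described above might look roughly like
this. It is only a sketch: the struct, the function names and the
placeholder slow_lookup are all hypothetical, and nothing here reflects how
samba actually stores directory state.

```c
#include <assert.h>
#include <string.h>

#define CACHE_NAME_LEN 256        /* assumed upper bound on a cached name */

struct last_name_cache {
    int  valid;                   /* 0 until a result has been stored */
    char name[CACHE_NAME_LEN];    /* the last filename looked up */
    long value;                   /* the result of that lookup */
};

static int slow_calls;            /* counts slow lookups, for demonstration */

/* Hypothetical stand-in for the expensive directory search. */
static long slow_lookup(const char *name)
{
    slow_calls++;
    return (long)strlen(name);    /* placeholder result */
}

/* Return the cached result if `name` repeats the previous lookup. */
long cached_lookup(struct last_name_cache *c, const char *name)
{
    assert(c != NULL && name != NULL);
    if (c->valid && strcmp(c->name, name) == 0)
        return c->value;          /* hit: skip the slow path */

    c->value = slow_lookup(name);
    if (strlen(name) < sizeof c->name) {
        strcpy(c->name, name);    /* fits, including the NUL */
        c->valid = 1;
    } else {
        c->valid = 0;             /* too long to cache safely */
    }
    return c->value;
}
```

With a zero-initialised cache, two consecutive lookups of the same name
reach slow_lookup only once; a lookup of a different name in between
evicts the entry, which is why this only pays off when repeated accesses
are adjacent in time.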

No, the problem is more in what happens before the file is opened.

This is what I see:
First the NT client calls findfirst to check whether the filename
exists. Then findfirst is called again to seek to the directory
pointer, and then the file is opened. I think the problem is
that the many string comparison routines in the code are causing
excessive delays: the log shows that most of the time is lost
in the get_lanman2_dir_entry call.

> 	This works well when the temporal adjacency is high, which
> 	appears to be true in your case.
> 
> > I have to come up with a solution to this problem this
> > week or we have to abandon samba and come up with another
> > solution to access files on our D380 server.
> 
> 	Eek! 
> 	What else can we do to help?

I really hope to find a solution because I do not want to
abandon samba! We have it running on about 35 HP-UX systems.
I'm also working on this problem at home, and I have been given
more time (a week) to come up with a solution.
 
> David Collier-Brown,  | Always do right. This will gratify some people
> 185 Ellerslie Ave.,   | and astonish the rest.        -- Mark Twain
> Willowdale, Ontario   | http://java.science.yorku.ca/~davecb
> Work: (905) 415-2849 Home: (416) 223-8968 Email: davecb at canada.sun.com

---
Michel Stoop,

Senior Network and Systems Administrator
postmaster for ncg.nl and vuykgron.nl
Numeriek Centrum Groningen B.V.
Vuyk Engineering Centre Groningen B.V.
Postbus 204
9700AE Groningen
+31 (0)50 541 26 32  fax: +31 (0)50 542 37 17
http://www.ncg.nl
mailto:stoop at ncg.nl
---



