[jcifs] SMB Crawler Guidelines

Michael B. Allen Michael_B_Allen at ml.com
Wed Apr 9 09:08:35 EST 2003


On Tue, 08 Apr 2003 17:05:27 -0400
"Dan Dumont" <Dan at canofsleep.com> wrote:

> > This is probably 2 or 3 but may be as high
> > as 5 possibly. Now add threads until your throughput no longer
> > increases.
> 
> Is there a way to test this in the program, or is this an observation you
> intended us to make while we are deciding how many threads to use?
> 
> 
> In this case, I was thinking that a large number of threads would be
> preferable, as the database of IPs to scan can contain a great many inactive
> computers, and the network I/O is very slow.
> 
> However, you mentioned that a thread is spawned for each SmbTransport, but
> you also said that thread creation is expensive.  Did you somehow work
> around this, or is it a cost that we must live with?

The answer to all of the above is largely the same: there is a midpoint
that will give you optimal performance. If you have a lot of unresponsive
hosts you might increase the number of threads, but with too many threads
things will get slower. There's nothing you can do about the 'Connection
timeout' exceptions. That is what happens when there is no response to
the initial SYN packet, and it takes about 1 minute 15 seconds to time out.

> Also, I think I understand what you meant: you said about 5 threads
> per host, but how many hosts do you think we should crawl at a time?  There
> is a higher probability of waiting on a host, since the host may be
> inactive.  So, do 20-30 parent threads, each with 3-5 per-host
> indexing threads, sound reasonable?

Again, you would have to do some basic analysis. Just make the thread
counts parameters and adjust the numbers with each trial. I don't know
what the optimal parameters would be, nor could I know.
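
As a rough illustration of the "make them parameters" idea, here is a
minimal sketch of a trial harness. It is not jcifs code: the CrawlTrial
class, the runTrial method and the fake crawl task are hypothetical names
made up for this example, and it assumes a Java version that has
java.util.concurrent and lambdas. The real per-host work (listing shares
with jcifs, for instance) would replace the sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical harness: runs the same crawl with different thread counts
// so the throughput of each setting can be compared.
class CrawlTrial {

    // Crawls every host using a fixed pool of 'threads' workers and
    // returns the elapsed wall-clock time in milliseconds.
    static long runTrial(List<String> hosts, int threads, Consumer<String> crawlHost)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (String host : hosts) {
            pool.submit(() -> {
                try {
                    crawlHost.accept(host);
                } catch (RuntimeException e) {
                    // A failing or unreachable host should not abort the trial.
                }
            });
        }
        pool.shutdown();
        // Generous upper bound; a real crawler would pick its own limit.
        pool.awaitTermination(1, TimeUnit.HOURS);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> hosts = new ArrayList<>();
        for (int i = 1; i <= 40; i++) {
            hosts.add("192.168.0." + i); // placeholder addresses
        }
        // Stand-in for the real crawl: sleep to simulate slow network I/O.
        Consumer<String> fakeCrawl = host -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Thread count is the parameter being adjusted from trial to trial.
        for (int threads : new int[] {1, 2, 5, 10, 20}) {
            long ms = runTrial(hosts, threads, fakeCrawl);
            double hostsPerSec = hosts.size() * 1000.0 / Math.max(ms, 1);
            System.out.printf("threads=%d elapsed=%d ms hosts/sec=%.1f%n",
                    threads, ms, hostsPerSec);
        }
    }
}
```

With a real crawl task you would raise the thread count until hosts/sec
stops improving. The number of threads per host could be treated as a
second parameter and varied in the same way.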

Mike



