[jcifs] too much timeout when multithreaded

Mon Apr 7 04:43:03 EST 2003

On Sun, 6 Apr 2003 18:55:04 +0200
"Benoit Daviaud" <benda472 at student.liu.se> wrote:

> Hello,
> 
> I have a program that is harvesting on a local network all the files
<snip> 
> Then I've been experimenting on what happened for different maximum
> number of threads. It seems to work well for 10 or 20 threads. But if
> I try with 100 threads, I have a very high number of timeouts. In fact

Try the ThreadedSmbCrawler. Make sure you use the one from the latest
release that uses listFiles instead of just list. Use that with the
same parameters as your crawler and look at the difference. Make sure
your resolveOrder property does not have BCAST in it. I don't know
offhand what the problem could be. Is it possible that all 10-20 threads
are crawling over the same server at any one time? That's going to be
significantly slower than having one thread crawling over each host. Also,
make sure your algorithm does not build up a big list of stale SmbFiles;
keep the working-list small. The ThreadedSmbCrawler does this. It is
normal to receive many exceptions when doing this sort of thing. But if
you investigate any one host the issue should be reproducible, because
the host is down or in a bad state, or the IP returned by WINS is not
routable, etc. Finally, you have to take care when writing a crawler. It's
not trivial. A simple mistake can make it *very* slow. The algorithm is
critical to performance due to the latency of querying each node.
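
To make the two structural points above concrete (one thread per host,
and a small working list), here is a minimal sketch. It is not the
ThreadedSmbCrawler source and it does not call jCIFS directly: the
HostCrawlSketch class, the Lister interface, and the convention that
directory paths end in "/" are all assumptions made for illustration.
In a real crawler the Lister would presumably wrap something like
SmbFile.listFiles(). It assumes Java 8 or later.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class HostCrawlSketch {

    /** Hypothetical stand-in for a directory listing call (not a jCIFS type). */
    interface Lister {
        /** Returns child paths; in this sketch, directory paths end with "/". */
        List<String> list(String path) throws Exception;
    }

    /**
     * Crawls a single host depth-first. Only directories still to be
     * visited are kept in the pending stack; files are counted and dropped,
     * so no long list of stale entries builds up.
     */
    static int crawlHost(String root, Lister lister) {
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        int files = 0;
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            List<String> children;
            try {
                children = lister.list(dir);
            } catch (Exception e) {
                // Hosts that are down or unreachable are expected; skip them.
                continue;
            }
            for (String child : children) {
                if (child.endsWith("/")) {
                    pending.push(child);
                } else {
                    files++;
                }
            }
        }
        return files;
    }

    /**
     * Submits one task per host to a fixed-size pool, so at most maxThreads
     * hosts are crawled at once and no two threads work on the same host.
     */
    static Map<String, Integer> crawlAll(List<String> hostRoots, Lister lister, int maxThreads) {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (String root : hostRoots) {
            pool.execute(() -> counts.put(root, crawlHost(root, lister)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return counts;
    }
}
```

The unit of work here is a whole host rather than a single directory,
which is one simple way to keep threads from piling onto the same
server. Whether that is the right granularity depends on how uneven
the hosts are in size.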

Mike

-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived.