[jcifs] BFS vs DFS
Allen, Michael B (RSCH)
Michael_B_Allen at ml.com
Fri Jul 27 08:26:59 EST 2001
> -----Original Message-----
> From: Torgny Johansson [SMTP:torgny.johansson at kommun.ljungby.se]
>
> What is the best thing to do then?
> I've written a crawler that just lists all the computers in the workgroups
> from top to bottom (currently not threaded) and it takes a very long time to
> do a full "crawl": about 11 hours for 430 PCs (far from every PC has
> shares), and that seems far too long. My code is probably (read: most
> certainly...) not optimized, so briefly: which is the way to go to create an
> efficient crawler?
>
I think you should try the crawler example and modify it as you need. I
don't think it's possible for it to take 11 hours to crawl over 400 PCs
(that is roughly a minute and a half per host) unless it digs down into many
subdirectories. There could be some sort of timeout occurring. The
ThreadedSmbCrawler example does ok -- I have seen it
enumerate every share on 200+ hosts in about 10-15 minutes. This does
not include files or directories within those shares of course. Doing operations
in parallel (meaning using threads) will speed things up quite a bit. Similarly,
choosing a proper traversal algorithm could make a dramatic difference. You
should really look at traces of what's happening. Think of it this way -- if you
can spew 100+ packets per second and you can confirm that each is doing
something worthy then you're doing ok. SMB is a pretty verbose protocol.
There are quite a few optimisations that can be performed but you have to be
able to understand the network traces.
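
To make the point about doing operations in parallel a bit more concrete,
here is a rough sketch of listing many hosts from a fixed pool of threads.
It is not the ThreadedSmbCrawler code. The class ParallelShareLister and the
listShares function are made-up names for illustration; in a real crawler
listShares would presumably wrap whatever jCIFS call you use to enumerate a
host (SmbFile.list() on an smb://host/ URL, for instance), and that wiring is
an assumption left out here.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical helper: not part of jCIFS.
class ParallelShareLister {

    // Calls listShares once per host, using at most `threads` worker threads.
    // listShares is a stand-in for the real (blocking) network call.
    static Map<String, List<String>> listAll(List<String> hosts,
                                             Function<String, List<String>> listShares,
                                             int threads) {
        Map<String, List<String>> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String host : hosts) {
            pool.execute(() -> {
                try {
                    result.put(host, listShares.apply(host));
                } catch (RuntimeException e) {
                    // A dead or slow host should not stop the whole crawl.
                    result.put(host, Collections.emptyList());
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            // Returns whatever has been collected if the wait times out.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hosts = Arrays.asList("pc1", "pc2", "pc3");
        // Fake lister so the sketch runs without a network.
        Map<String, List<String>> shares =
                listAll(hosts, h -> Arrays.asList(h + "/C$", h + "/public"), 2);
        System.out.println(shares);
    }
}
```

With a blocking protocol, one host that sits in a timeout only ties up one
worker instead of stalling the whole crawl, which is where most of the gain
would come from.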
As for BFS vs. DFS, I don't know what the best algorithm would be. You want
to keep the active list of URLs to search small and hierarchically related so
that you're not opening dozens of connections to different hosts and not
maintaining a big list of Strings in memory. I think DFS would be better but I
don't really know for sure. I'll leave that to the mp3 seekers :~)
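
A small experiment along these lines, for what it's worth: the sketch below
walks a made-up in-memory tree (no jCIFS, no network) and records how large
the list of pending URLs gets under each strategy. FrontierDemo, grow and
peakFrontier are hypothetical names, and the uniform tree is an assumption;
real networks are lopsided, so treat this as an illustration of the memory
argument only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: compares the pending-URL list for DFS and BFS.
class FrontierDemo {

    // Builds a uniform fake tree: every node gets `branching` children,
    // down to `depth` levels below `url`.
    static void grow(Map<String, List<String>> tree, String url, int branching, int depth) {
        if (depth == 0) {
            return;
        }
        List<String> kids = new ArrayList<>();
        for (int i = 0; i < branching; i++) {
            String kid = url + "n" + i + "/";
            kids.add(kid);
            grow(tree, kid, branching, depth - 1);
        }
        tree.put(url, kids);
    }

    // Traverses the tree and returns the largest number of URLs that were
    // waiting in the frontier at any one time.
    static int peakFrontier(Map<String, List<String>> tree, String root, boolean depthFirst) {
        Deque<String> frontier = new ArrayDeque<>();
        frontier.addLast(root);
        int peak = 1;
        while (!frontier.isEmpty()) {
            // DFS takes the newest entry (stack), BFS the oldest (queue).
            String url = depthFirst ? frontier.pollLast() : frontier.pollFirst();
            List<String> kids = tree.get(url);
            for (String kid : (kids == null ? Collections.<String>emptyList() : kids)) {
                frontier.addLast(kid);
            }
            peak = Math.max(peak, frontier.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        grow(tree, "smb://", 3, 3);
        System.out.println("DFS peak: " + peakFrontier(tree, "smb://", true));
        System.out.println("BFS peak: " + peakFrontier(tree, "smb://", false));
    }
}
```

On this 3-way, 3-level tree the DFS frontier peaks at 7 entries and the BFS
frontier at 27 (the whole bottom level). DFS also finishes one subtree before
moving to the next, so the pending URLs stay close together in the hierarchy.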
Mike