[jcifs] BFS vs DFS

Fri Jul 27 08:26:59 EST 2001

> -----Original Message-----
> From:	Torgny Johansson [SMTP:torgny.johansson at kommun.ljungby.se]
> 
> What is the best thing to do then?
> I've written a crawler that just lists all the computers in the workgroups
> from top to bottom (currently not threaded) and it takes a very long time to
> do a full "crawl". About 11 hours for 430 pcs (far from every pc has
> shares) and that seems all too long. My code is probably (read: most
> certainly...) not optimized, so briefly: which is the way to go to create an
> efficient crawler?
> 
	I think you should try the crawler example and modify it as needed. I
	don't think it's possible for it to take 11 hours to crawl over 400 PCs unless
	it digs down into many subdirectories. There could be some sort of timeout
	occurring. The ThreadedSmbCrawler example does ok -- I have seen it
	enumerate every share on 200+ hosts in about 10-15 minutes. This does
	not include files or directories within those shares, of course. Doing operations
	in parallel (meaning using threads) will speed things up quite a bit. Similarly,
	choosing a proper traversal algorithm could make a dramatic difference. You
	should really look at traces of what's happening. Think of it this way -- if you
	can spew 100+ packets per second and you can confirm that each is doing
	something worthwhile then you're doing ok. SMB is a pretty verbose protocol.
	There are quite a few optimisations that can be performed but you have to be
	able to understand the network traces.
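
	To make the "use threads" point concrete, here is a rough sketch of listing
	many hosts in parallel with a bounded wait per host, so that one dead machine
	cannot stall the whole crawl. It is not the ThreadedSmbCrawler code.
	ParallelShareLister, listAll and the "lister" function are hypothetical names
	made up for illustration; in a real crawler the lister would presumably wrap
	whatever jCIFS call you use to enumerate a host's shares. It also assumes a
	JDK that has java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical helper, not part of jCIFS.
class ParallelShareLister {

    // Lists the shares of every host using a fixed pool of worker threads.
    // 'lister' stands in for the real per-host enumeration call (an assumption).
    // We wait at most 'timeoutMillis' for each host's result; a host that
    // fails or does not answer in time is recorded with an empty list.
    static Map<String, List<String>> listAll(List<String> hosts,
                                             Function<String, List<String>> lister,
                                             int threads,
                                             long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            // Submit one task per host; the pool runs up to 'threads' at once.
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String host : hosts) {
                futures.add(pool.submit(() -> lister.apply(host)));
            }
            // Collect results in the original host order.
            for (int i = 0; i < hosts.size(); i++) {
                Future<List<String>> f = futures.get(i);
                try {
                    result.put(hosts.get(i), f.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (ExecutionException | TimeoutException e) {
                    f.cancel(true); // give up on this host
                    result.put(hosts.get(i), Collections.<String>emptyList());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    result.put(hosts.get(i), Collections.<String>emptyList());
                }
            }
        } finally {
            pool.shutdownNow(); // interrupt any task still running
        }
        return result;
    }
}
```

	Calling ParallelShareLister.listAll(hosts, lister, 10, 5000) would work through
	the hosts ten at a time and wait no more than five seconds for each answer.
	The right pool size and timeout are guesses until you have looked at the
	traces.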

	As for BFS vs. DFS, I don't know what the best algorithm would be. You want
	to keep the active list of URLs to search small and hierarchically related so
	that you're not opening dozens of connections to different hosts and don't
	maintain a big list of Strings in memory. I think DFS would be better but I
	don't really know for sure. I'll leave that to the mp3 seekers :~)
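
	The "keep the active list small" argument can be checked with a toy
	traversal. This is only a sketch: FrontierDemo and peakFrontier are
	hypothetical names, and the "children" function is a stand-in for listing a
	real SMB URL. The same loop does DFS when it takes the newest pending URL and
	BFS when it takes the oldest, and it reports the largest number of URLs it
	ever held at once.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical demo class, not part of jCIFS.
class FrontierDemo {

    // Walks the tree under 'root' and returns the peak number of pending
    // URLs held in memory. 'children' is a stand-in for listing a URL.
    static int peakFrontier(String root,
                            Function<String, List<String>> children,
                            boolean depthFirst) {
        Deque<String> pending = new ArrayDeque<>();
        pending.addLast(root);
        int peak = pending.size();
        while (!pending.isEmpty()) {
            // DFS: take the most recently added URL (stack behaviour).
            // BFS: take the oldest URL (queue behaviour).
            String url = depthFirst ? pending.pollLast() : pending.pollFirst();
            for (String child : children.apply(url)) {
                pending.addLast(child);
            }
            peak = Math.max(peak, pending.size());
        }
        return peak;
    }
}
```

	On a toy tree of 3 hosts, each with 3 shares, each with 3 directories, this
	gives a peak of 7 pending URLs for DFS and 27 for BFS. DFS also finishes
	everything under one host before touching the next, which fits the goal of
	not holding connections open to many hosts at once. Real networks are not
	this regular, so treat the numbers as an illustration only.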

	Mike