SV: SV: [jcifs] BFS vs DFS

Torgny Johansson torgny.johansson at
Fri Jul 27 15:47:45 EST 2001

Thanks, I'll start by multithreading it.
Currently the crawler dives down into subdirectories recursively, listing
all the files, but that's not what's taking time.
The slow part is when a PC has no shares or when the PC can't be found at
all.

Also, the LAN I'm crawling spans many different subnets. I've set the wins
property, but I have to set baddr to the same address as the wins server to
be able to list all online PCs; otherwise my crawler can't list any
workgroups. I suppose this is what you discussed in an earlier thread and
it's being dealt with in the 0.6 version, right?
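For reference, the WINS/broadcast-address workaround described above maps to
jCIFS configuration properties. This is a minimal sketch: the address below is
a placeholder, and the property names (`jcifs.netbios.wins`,
`jcifs.netbios.baddr`) should be checked against the jCIFS version in use:

```
# Hypothetical addresses -- replace with your actual WINS server
jcifs.netbios.wins=10.0.0.1
jcifs.netbios.baddr=10.0.0.1
```

These can go in a properties file passed via -Djcifs.properties, or be set
programmatically before any jCIFS classes are touched.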

Oh, by the way, thanks for a great API!

Torgny Johansson

-----Original Message-----
From: Allen, Michael B (RSCH) [mailto:Michael_B_Allen at]
Sent: July 27, 2001 01:02
To: 'Christopher R. Hertel'; Torgny Johansson
Cc: jcifs at
Subject: RE: SV: [jcifs] BFS vs DFS

Please note the public jCIFS API is far from ideal for an SmbCrawler. There
are a few advanced things that could be done to improve performance.

1) Do not use the public API; instead issue send( ServerMessageBlock
    req, ServerMessageBlock resp ) commands directly.
2) Cache information retrieved from other operations (e.g. use the info
    data from findfirst/next operations instead of doing a query info).
3) Use threads, but only one per host.

These could improve performance by several orders of magnitude. It would
be considerably faster than the NT client.
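The "one thread per host" tip above can be sketched with standard Java
concurrency utilities. This is only an illustration of the pattern, not the
jCIFS API: listShares() here is a hypothetical stand-in for the real SMB
enumeration calls (e.g. SmbFile.listFiles()), returning dummy data so the
sketch runs without a network:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of tip 3: one worker thread per host, so a slow or dead
// host only blocks its own thread, not the whole crawl.
public class HostCrawler {
    // Hypothetical stand-in for real SMB share enumeration; a dead
    // host would block here until the SMB timeout expires.
    static List<String> listShares(String host) {
        return Arrays.asList(host + "/share1", host + "/share2");
    }

    public static void main(String[] args) throws Exception {
        String[] hosts = { "pc1", "pc2", "pc3" };
        ExecutorService pool = Executors.newFixedThreadPool(hosts.length);
        List<Future<List<String>>> results = new ArrayList<>();
        for (String host : hosts) {
            // Submit exactly one task per host (tip 3).
            results.add(pool.submit(() -> listShares(host)));
        }
        for (Future<List<String>> f : results) {
            // A bounded wait keeps one unreachable host from
            // stalling the crawler indefinitely.
            System.out.println(f.get(15, TimeUnit.SECONDS));
        }
        pool.shutdown();
    }
}
```

With real SMB calls in place of listShares(), the pool size would be capped
and the per-host timeout tuned to the network's behavior.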


> -----Original Message-----
> From:	Christopher R. Hertel [SMTP:crh at]
> Sent:	Thursday, July 26, 2001 12:09 PM
> To:	Torgny Johansson
> Cc:	jcifs at
> Subject:	Re: SV: [jcifs] BFS vs DFS
> Hmmm...
> I wonder what is taking so long.  It sounds as though some particular
> operation is just sitting there doing nothing while it waits for a
> timeout.  Multithreading the crawler might help.
> Chris -)-----
> On Thu, Jul 26, 2001 at 10:52:43AM +0200, Torgny Johansson wrote:
> > What is the best thing to do then?
> > I've written a crawler that just lists all the computers in the
> > from top to bottom (currently not threaded) and it takes a very long
> > time to do a full "crawl". About 11 hours for 430 PCs (far from every
> > PC has shares), and that seems far too long. My code is probably (read:
> > most certainly...) not optimized, so briefly: what is the way to go to
> > create an efficient crawler?
> >
> > Thanks
> > Torgny Johansson
> >
> > -----Original Message-----
> > From: jcifs-admin at
> > [mailto:jcifs-admin at] On Behalf Of Allen, Michael B (RSCH)
> > Sent: July 26, 2001 03:29
> > To: 'jcifs at'
> > Subject: [jcifs] BFS vs DFS
> >
> >
> > I wrote:
> >
> > > try to minimize the size of your active list of URLs to
> > > search and therefore the number of URLs that might suddenly become
> > > by using a Breadth First Search algorithm.
> >
> > This is not true. BFS would be awful for an SmbCrawler.
> >
> > Mike
> >
> >
> --
> Samba Team --     -)-----   Christopher R. Hertel
> jCIFS Team --   -)-----   ubiqx development,
> ubiqx Team --     -)-----   crh at
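The BFS-vs-DFS point debated in the quoted messages can be illustrated by
measuring the peak frontier size of each traversal on a synthetic directory
tree. This sketch uses a made-up tree (branching factor 10, depth 3) rather
than real SMB listings; the claim it demonstrates is that a BFS queue can
grow to hold an entire tree level, while a DFS stack stays near branching
factor times depth:

```java
import java.util.*;

// Compare the peak frontier (active path list) of BFS vs DFS
// on a synthetic tree: branching factor 10, depth 3.
public class FrontierSize {
    // Synthetic directory listing: 10 children per node above depth 3.
    static List<String> children(String path, int depth) {
        if (depth >= 3) return Collections.emptyList();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < 10; i++) out.add(path + "/" + i);
        return out;
    }

    // Depth = number of '/' separators in the path.
    static int depth(String path) {
        return path.length() - path.replace("/", "").length();
    }

    public static void main(String[] args) {
        // BFS: the queue peaks at a whole level of the tree.
        Deque<String> queue = new ArrayDeque<>(List.of("root"));
        int bfsPeak = 0;
        while (!queue.isEmpty()) {
            String p = queue.removeFirst();
            queue.addAll(children(p, depth(p)));
            bfsPeak = Math.max(bfsPeak, queue.size());
        }
        // DFS: the stack peaks at roughly branching * depth entries.
        Deque<String> stack = new ArrayDeque<>(List.of("root"));
        int dfsPeak = 0;
        while (!stack.isEmpty()) {
            String p = stack.removeLast();
            stack.addAll(children(p, depth(p)));
            dfsPeak = Math.max(dfsPeak, stack.size());
        }
        System.out.println("BFS peak frontier: " + bfsPeak);
        System.out.println("DFS peak frontier: " + dfsPeak);
    }
}
```

On this tree the BFS frontier peaks at 1000 paths (the whole bottom level)
versus 28 for DFS, which is why a crawler tracking per-URL state is better
off depth-first.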
