[jcifs] indexing a file system using jcifs

Allen, Michael B (RSCH) Michael_B_Allen at ml.com
Fri Aug 9 09:17:32 EST 2002


> -----Original Message-----
> From:	Tait Larson [SMTP:larsonte at yahoo.com]
> Sent:	Thursday, August 08, 2002 2:11 PM
> To:	jcifs at lists.samba.org
> Subject:	[jcifs] indexing a file system using jcifs
> 
> The company I work for has a very large, very
> unorganized Windows-based file server where it keeps
> most of its important documents.
> 
	What floor are *you* on? :)

>   I've been
> considering writing an indexer/classifier to provide
> easier access to the documents.  I was hoping to use
> jcifs to feed all the docs in the shared file system
> to the indexer.
> 
> First off, I don't want to reinvent the wheel. Does
> anyone know if this type of solution already exists?
> 
	Does the examples/SmbCrawler.java example do what you want?
	You might get a little more throughput from ThreadedSmbCrawler.
	The T2Crawler is the fastest, but it decrements the depth
	argument too aggressively and quits early (you can specify
	something like 1000 to get it to cover the entire filesystem,
	though). Of course, these programs are just examples that print
	pathnames. They don't actually create any kind of "index".
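
	To illustrate what the depth argument is meant to do, here is a
	minimal sketch of a depth-limited crawl. It is not the code from
	the jCIFS examples. It uses java.io.File as a stand-in so it runs
	without a server; SmbFile is modeled on java.io.File, so the same
	shape should carry over, but treat that as an assumption. The
	class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: java.io.File stands in for jcifs.smb.SmbFile.
class DepthLimitedCrawler {

    // Returns the paths of all entries under root, descending at most
    // maxDepth directory levels.
    static List<String> crawl(File root, int maxDepth) {
        List<String> paths = new ArrayList<>();
        walk(root, maxDepth, paths);
        return paths;
    }

    private static void walk(File dir, int depth, List<String> out) {
        if (depth <= 0) {
            return; // depth budget used up
        }
        File[] children = dir.listFiles(); // null if not a directory or unreadable
        if (children == null) {
            return;
        }
        for (File child : children) {
            out.add(child.getPath());
            if (child.isDirectory()) {
                // Decrement exactly once per directory level descended.
                walk(child, depth - 1, out);
            }
        }
    }

    public static void main(String[] args) {
        File root = new File(args.length > 0 ? args[0] : ".");
        int depth = args.length > 1 ? Integer.parseInt(args[1]) : 1000;
        for (String path : crawl(root, depth)) {
            System.out.println(path);
        }
    }
}
```

	Like the example crawlers, this only collects pathnames; feeding
	them to an indexer is a separate step.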

> I need to be very careful not to crash the file system
> or engage too many of the file system resources at one
> time (think accidental DOS).  As long as I don't
> create too many simultaneous connections to the file
> system this shouldn't be a problem, correct?  Does
> anyone have any other comments on my concerns?
> 
	If you run your indexer from one host there is no danger of
	"crashing" the server. jCIFS is faster than the C clients in some
	respects, but it uses a lot of CPU when you do stuff like this. You
	probably wouldn't even make the server sweat unless it were very
	underpowered. Also, it doesn't create multiple connections to the
	same server. It multiplexes IO on the same socket so the
	number of sockets is not an issue. If you ran 5 T2Crawlers on
	separate machines with 100GB connections you might create a
	problem (or index the server very quickly :o)
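
	If you want to be extra conservative anyway, one option is a
	single-threaded crawl that pauses between directory listings. The
	following is only a sketch under assumptions: java.io.File again
	stands in for SmbFile, the class name and the pauseMillis
	parameter are hypothetical, and the pause length is something you
	would have to tune yourself.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: java.io.File stands in for jcifs.smb.SmbFile.
class ThrottledCrawler {

    // Lists every entry under root from a single thread, sleeping
    // pauseMillis between directory listings to limit the request rate.
    static List<String> crawl(File root, long pauseMillis) throws InterruptedException {
        List<String> paths = new ArrayList<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            File dir = pending.pop();
            File[] children = dir.listFiles(); // null if not a directory or unreadable
            if (children != null) {
                for (File child : children) {
                    paths.add(child.getPath());
                    if (child.isDirectory()) {
                        pending.push(child); // visit later, one listing at a time
                    }
                }
            }
            // Pause only when more directories remain to be listed.
            if (pauseMillis > 0 && !pending.isEmpty()) {
                Thread.sleep(pauseMillis);
            }
        }
        return paths;
    }
}
```

	Calling ThrottledCrawler.crawl(root, 50), for example, would wait
	50 ms before each further directory listing.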

	Anyway, jCIFS is ideal for this sort of thing.

	Mike



