[jcifs] indexing a file system using jcifs
Allen, Michael B (RSCH)
Michael_B_Allen at ml.com
Fri Aug 9 09:17:32 EST 2002
> -----Original Message-----
> From: Tait Larson [SMTP:larsonte at yahoo.com]
> Sent: Thursday, August 08, 2002 2:11 PM
> To: jcifs at lists.samba.org
> Subject: [jcifs] indexing a file system using jcifs
>
> The company I work for has a very large, very
> unorganized Windows-based file server where it keeps
> most of its important documents.
>
What floor are *you* on? :)
> I've been
> considering writing an indexer/classifier to provide
> easier access to the documents. I was hoping to use
> jcifs to feed all the docs in the shared file system
> to the indexer.
>
> First off, I don't want to reinvent the wheel. Does
> anyone know if this type of solution already exists?
>
Does the examples/SmbCrawler.java example do what you want?
You might get a little more throughput from ThreadedSmbCrawler.
The T2Crawler is the fastest, but it decrements the depth
argument too aggressively and quits early (you can specify
something like 1000 and get it to go over the entire filesystem,
though). Of course, these programs are just examples that spit out
pathnames. They don't actually create any kind of "index".
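To make the depth argument concrete, here is a minimal sketch of a depth-limited crawl. It is not the code from examples/. Node and MemNode are hypothetical stand-ins so the sketch runs without a server; with jCIFS, jcifs.smb.SmbFile would play the Node role. The sink callback is where an indexer could be hooked in instead of just printing pathnames.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a remote file or directory (assumption, not a
// jCIFS type). With jCIFS this role would be played by jcifs.smb.SmbFile.
interface Node {
    String path();
    boolean isDirectory();
    List<Node> children();
}

// In-memory implementation used only to exercise the crawl locally.
final class MemNode implements Node {
    private final String path;
    private final boolean directory;
    private final List<Node> children;

    private MemNode(String path, boolean directory, List<Node> children) {
        this.path = path;
        this.directory = directory;
        this.children = children;
    }

    static MemNode file(String path) {
        return new MemNode(path, false, new ArrayList<Node>());
    }

    static MemNode dir(String path, Node... kids) {
        return new MemNode(path, true, Arrays.asList(kids));
    }

    public String path() { return path; }
    public boolean isDirectory() { return directory; }
    public List<Node> children() { return children; }
}

final class DepthLimitedCrawl {
    // Reports the node itself, then descends into directories while depth
    // remains. depth is decremented once per directory level.
    static void crawl(Node node, int depth, Consumer<String> sink) {
        sink.accept(node.path());
        if (depth <= 0 || !node.isDirectory()) {
            return;
        }
        for (Node child : node.children()) {
            crawl(child, depth - 1, sink);
        }
    }
}
```

Calling DepthLimitedCrawl.crawl(root, 1000, System.out::println) would print every path under root for any tree shallower than 1000 levels.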
> I need to be very careful not to crash the file system
> or engage too many of the file system resources at one
> time (think accidental DoS). As long as I don't
> create too many simultaneous connections to the file
> system this shouldn't be a problem, correct? Does
> anyone have any other comments on my concerns?
>
If you run your indexer from one host there is no danger of
"crashing" the server. jCIFS is faster than the C clients in some
respects but it uses a lot of CPU when you do stuff like this. You
probably wouldn't even make the server sweat unless it were very
underpowered. Also, it doesn't create multiple connections to the
same server. It multiplexes IO on the same socket so the
number of sockets is not an issue. If you ran 5 T2Crawlers on
separate machines with 100GB connections you might create a
problem (or index the server very quickly :o)
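If you still want a hard cap on how much work is in flight at once, one option is a fixed-size thread pool. The following is only a sketch under that assumption: BoundedCrawl and runBounded are hypothetical names, the tasks are plain Runnables standing in for per-directory listing work, and nothing here is part of jCIFS.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedCrawl {
    // Runs every task on at most maxWorkers threads and returns the peak
    // number of tasks observed running at the same time.
    static int runBounded(List<Runnable> tasks, int maxWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(maxWorkers);
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger peak = new AtomicInteger();

        for (final Runnable task : tasks) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the high-water mark
                try {
                    task.run();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        // Stop accepting work and wait for the queued tasks to drain.
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return peak.get();
    }
}
```

The pool size is the knob: the queue can hold any number of pending directories, but no more than maxWorkers of them are being processed at any moment.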
Anyway, jCIFS is ideal for this sort of thing.
Mike