philip.carmichael at natinst.com
Tue Aug 10 21:59:36 GMT 1999
I hacked smbclient for exactly this reason last semester. The network I was on
had roughly 1200 computers on it with a LOT of shared material. I used an SQL
database (MySQL) to store the file information - namely the workgroup, computer,
share, path, file, file size etc. The average number of files indexed was
around 300,000 or so, depending on how many people left their computers on at
4am. The crawler would usually take around an hour and a half to index the
The way I originally accomplished it was by hacking the smbclient code to accept
just a command-line workgroup and computer. It would then get a listing of
shares and fork off hacked smbclients for those shares, doing recursive
directory listings on those shares and submitting the files to the database.
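To make the database side concrete, here is a minimal sketch of how records with the fields mentioned above (workgroup, computer, share, path, file, file size) might be stored and searched. It is not the actual crawler code: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name, column names, and function names are all made up for illustration.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open the index database and create the (hypothetical) files table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS files (
               workgroup TEXT NOT NULL,
               computer  TEXT NOT NULL,
               share     TEXT NOT NULL,
               path      TEXT NOT NULL,
               file      TEXT NOT NULL,
               size      INTEGER NOT NULL
           )"""
    )
    # Index on the file name, since that is what a search would look up.
    conn.execute("CREATE INDEX IF NOT EXISTS files_by_name ON files (file)")
    return conn


def replace_share(conn, workgroup, computer, share, entries):
    """Replace the stored listing for one share with a fresh crawl result.

    entries is an iterable of (path, file, size) tuples.
    """
    rows = [(workgroup, computer, share, p, f, s) for p, f, s in entries]
    with conn:  # one transaction: drop the old listing, insert the new one
        conn.execute(
            "DELETE FROM files WHERE workgroup = ? AND computer = ? AND share = ?",
            (workgroup, computer, share),
        )
        conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)", rows)


def search(conn, pattern):
    """Return (computer, share, path, file, size) rows whose name contains pattern."""
    return conn.execute(
        "SELECT computer, share, path, file, size FROM files "
        "WHERE file LIKE ? ORDER BY computer, share, path, file",
        ("%" + pattern + "%",),
    ).fetchall()
```

Replacing a share's rows in one transaction means a machine that was crawled again does not leave stale entries behind; what to do about machines that are switched off during a crawl is a separate policy decision that this sketch does not address.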
That method really isn't favorable for two reasons. One, the codebase is quite
large and hard to make changes to if you need your crawler to do something else.
Two, any bugfixes in smbclient or related items aren't propagated to your
crawler. My recent upgrade of the crawler was to modify the smb2www Perl
scripts, since they already handle all of the parsing of smbclient's output. I've
integrated that with the DBI modules, which in the end has provided a MUCH more
manageable and adaptable solution. If you'd like, when I finish it all up (will
be finished by the time I go back to school - so September 2nd) I can give you a
link to the changes/source.
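As a rough illustration of the "parse smbclient's output instead of patching smbclient" approach, the sketch below turns a recursive directory listing into (path, file, size) tuples that could be handed to whatever database layer is in use. It is written in Python rather than Perl and is not the smb2www code. The listing layout in SAMPLE is an assumption about what a command along the lines of `smbclient //HOST/SHARE -N -c 'recurse; ls'` prints; the exact format varies between Samba versions, so the regular expression would need checking against real output.

```python
import re

# Assumed entry layout: two leading spaces, padded name, attribute letters,
# size in bytes, two spaces, then a ctime-style timestamp.
ENTRY = re.compile(
    r"^  (?P<name>.+?)\s+(?P<attrs>[A-Z]*)\s+(?P<size>\d+)"
    r"  \w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$"
)

# Made-up sample listing used only to exercise the parser.
SAMPLE = r"""
  .                                   D        0  Tue Aug 10 21:00:00 1999
  ..                                  D        0  Tue Aug 10 21:00:00 1999
  readme.txt                          A     1234  Mon Aug  9 10:15:02 1999
  music                               D        0  Sun Aug  8 09:00:00 1999

\music
  .                                   D        0  Sun Aug  8 09:00:00 1999
  ..                                  D        0  Sun Aug  8 09:00:00 1999
  my song.mp3                         A  4000000  Sat Aug  7 23:59:59 1999

                64000 blocks of size 32768. 12345 blocks available
"""


def parse_listing(text):
    """Yield (path, file, size) for every plain file in a recursive listing."""
    current_dir = "\\"
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("\\"):
            # A line starting with a backslash is taken to name the directory
            # whose entries follow; some versions append "\*" to it.
            current_dir = line
            if current_dir.endswith("\\*"):
                current_dir = current_dir[:-2] or "\\"
            continue
        match = ENTRY.match(line)
        if match is None:
            continue  # blank lines, the free-space summary, error messages
        name = match.group("name")
        if name in (".", "..") or "D" in match.group("attrs"):
            continue  # directories get their own header line; skip them here
        yield current_dir, name, int(match.group("size"))
```

Calling `list(parse_listing(SAMPLE))` gives one tuple per file, with the directory taken from the most recent header line. Keeping the parsing separate from the code that runs smbclient is what makes this easier to adapt than a patched smbclient: only the regular expression has to change if the output format does.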
One benefit (for the network that I was on) was that smbclient would use WINS,
while smblib wouldn't (am I right in saying this?). With all of the computers
on multiple subnets, this made it all possible. The hardest part (read:
impossible) was getting all 1200 people to set their computers up to use WINS.
pgc at tamu.edu
I am in the process of writing a program to index a SMB network and
provide a web based search engine for the database. I had originally
considered fork()ing and running smbclient to retrieve indexes, but
concluded that this was a bad idea, especially since the network on
which I am developing this software has *thousands* of machines, many of
which have quite a bit of stuff on them. The obvious upshot of this is
that forking off a zillion copies of smbclient would slow things down a
So I went looking for some kind of library interface. The first thing I
found (it serendipitously appeared on freshmeat one day) was libsmb.
Note that this is *not* the libsmb that is in the Samba distribution.
It is a separate implementation of the client side of the SMB protocol
in C++. Given that the program I'm writing is in C++, it was an
extra plus that this was a C++ interface.
However, I saw in the ChangeLog for Samba 2.0.5a that smbmount, etc. had
been moved over to using libsmb, and went Oh Dear, Houston, We Have A
Problem (or something like that). So I looked at the libsmb that's in
the Samba distribution, and it didn't look too bad, except that I
couldn't find any documentation on how to use it, and looking at a
nearly 3000 line program (client.c) to figure it out was not my idea of
a good use of time.
So my question comes down to this:
What should someone who wants to develop an application that uses the
SMB protocol do when fork()ing smbclient is not an option? I am a big
fan of code reuse, and also of not forcing people to install all sorts
of uncommon libraries and other software in order to use your software.
Since nearly anyone who wanted to use my software would probably already
have Samba, and other parts of it would use the Samba programs anyways
(it would contain an SMB2WWW-like interface), it would be nice to be
able to use the SMB code in Samba.
I look forward to your thoughts,
You can find this libsmb at http://www-eleves.iie.cnam.fr/~brodu/smblib/