[clug] Looking for string indexing library
Michael Cohen
michael.cohen at netspeed.com.au
Fri May 28 09:11:09 GMT 2004
Hi Jepri,
I recently implemented a trie in a forensic tool I have been writing
called pyflag: http://pyflag.sourceforge.net/
The indextools directory contains the library, along with a Python
interface. The idea is to first build a dictionary of words, which builds
the trie, and then to store offsets in a linked list at the node where
each word ends, so you can store as many offsets as you need for each
word. You can then save the index to a file and load it later for
searching.
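
To make the structure concrete, here is a rough sketch in C of the idea
described above: a byte-wise trie where the node at the end of each word
holds a linked list of offsets. This is not the pyflag indextools code or
its API. Every name here (offset_node, trie_node, trie_new, trie_add,
trie_find, trie_free) is made up for illustration, and saving and loading
the index is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry in the list of offsets recorded for a word (hypothetical type). */
struct offset_node {
    long offset;
    struct offset_node *next;
};

/* One trie node: a child slot per byte value, plus the offsets of the
 * word that ends at this node (NULL if no word ends here). */
struct trie_node {
    struct trie_node *child[256];
    struct offset_node *offsets;
};

/* Allocate an empty node. calloc zero-fills, which this sketch assumes
 * yields NULL pointers (true on mainstream platforms). */
struct trie_node *trie_new(void)
{
    return calloc(1, sizeof(struct trie_node));
}

/* Add word to the trie and record offset for it.
 * Returns 0 on success, -1 if memory runs out. */
int trie_add(struct trie_node *root, const char *word, long offset)
{
    assert(root != NULL && word != NULL);
    struct trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)word; *p; p++) {
        if (n->child[*p] == NULL) {
            n->child[*p] = trie_new();
            if (n->child[*p] == NULL)
                return -1;
        }
        n = n->child[*p];
    }
    /* Push the new offset onto the front of this word's list. */
    struct offset_node *o = malloc(sizeof *o);
    if (o == NULL)
        return -1;
    o->offset = offset;
    o->next = n->offsets;
    n->offsets = o;
    return 0;
}

/* Exact-match lookup: returns the offset list for word, or NULL if the
 * word was never added. The most recently added offset comes first. */
const struct offset_node *trie_find(const struct trie_node *root,
                                    const char *word)
{
    const struct trie_node *n = root;
    for (const unsigned char *p = (const unsigned char *)word; *p && n; p++)
        n = n->child[*p];
    return n ? n->offsets : NULL;
}

/* Free a node, its offset list, and everything below it. */
void trie_free(struct trie_node *n)
{
    if (n == NULL)
        return;
    for (int i = 0; i < 256; i++)
        trie_free(n->child[i]);
    while (n->offsets) {
        struct offset_node *next = n->offsets->next;
        free(n->offsets);
        n->offsets = next;
    }
    free(n);
}
```

In this sketch you would call trie_add(root, name, offset) once per name,
with offset being the position of the name in your mmaped file, and then
walk the list returned by trie_find(root, name). Two caveats: 256 child
pointers per node is simple but memory hungry for millions of names, and
a plain trie only gives exact or prefix matches. Substring lookups would
need something more, such as inserting every suffix of each name.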
May do what you want...
Michael.
On Mon, 24 May 2004 09:17 pm, Jepri wrote:
> I've got a list of city names (approx 3 million) and I need to write
> some C code to search through them all quickly.
>
> I'm accessing them by packing them all into a file (with null
> termination), and then mmaping the file.
>
> So the string indexing library would have to accept pointers to the
> strings, not copy them into some internal store, be able to do substring
> lookups, and ideally be able to return a key or index value for the
> string it finds. Even better would be if it could save its index and
> hot-start from that.
>
> Unicode would be nice.
>
> And while I'm wishing, I'd like a pony as well.
>
> Has anyone seen something like this?