[clug] Looking for string indexing library

Michael Cohen michael.cohen at netspeed.com.au
Fri May 28 09:11:09 GMT 2004


Hi Jepri,
  I recently implemented a trie in pyflag, a forensic tool I have been
writing: http://pyflag.sourceforge.net/
The indextools directory contains the library, along with a Python
interface. The idea is to first build a dictionary of words, which builds
the trie, and then store offset numbers at the end of each word's path
through the trie as a linked list, so you can store as many offsets as you
need for each word. You can then save the index to a file and load it
later for searching.
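
To make the idea concrete, here is a minimal sketch in Python. It is not
the indextools code or its API: the names TrieIndex, add, lookup, save and
load are made up for illustration, and the JSON file format is an
assumption chosen for brevity. It only does exact word lookups.

```python
import json


class _OffsetLink:
    """One link in a singly linked list of offsets (hypothetical helper)."""

    def __init__(self, offset, next_link=None):
        self.offset = offset
        self.next = next_link


class _TrieNode:
    def __init__(self):
        self.children = {}   # character -> _TrieNode
        self.offsets = None  # head of the offset list; None if no word ends here


class TrieIndex:
    """Illustrative trie index: maps each word to a linked list of offsets."""

    def __init__(self):
        self.root = _TrieNode()

    def add(self, word, offset):
        """Walk or create the path for `word`, then prepend `offset` to its list."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, _TrieNode())
        node.offsets = _OffsetLink(offset, node.offsets)

    def lookup(self, word):
        """Return the offsets stored for `word`, most recently added first."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        link = node.offsets
        while link is not None:
            found.append(link.offset)
            link = link.next
        return found

    def words(self):
        """Yield every indexed word using an iterative depth-first walk."""
        stack = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.offsets is not None:
                yield prefix
            for ch, child in node.children.items():
                stack.append((child, prefix + ch))

    def save(self, path):
        """Dump word -> offsets as JSON (an assumed format, chosen for brevity)."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump({w: self.lookup(w) for w in self.words()}, f)

    @classmethod
    def load(cls, path):
        """Rebuild the trie from a file written by save()."""
        index = cls()
        with open(path, "r", encoding="utf-8") as f:
            table = json.load(f)
        for word, offsets in table.items():
            # Re-add oldest first so the list order matches the saved index.
            for offset in reversed(offsets):
                index.add(word, offset)
        return index


# Offsets here are byte positions in a packed, NUL-terminated buffer such as
# b"Canberra\0Sydney\0Canberra\0".
index = TrieIndex()
index.add("Canberra", 0)
index.add("Sydney", 9)
index.add("Canberra", 16)
print(index.lookup("Canberra"))  # [16, 0]
```

Because offsets are prepended to the list, adding one is constant time once
the word's node is reached, at the cost of returning the newest offset
first.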

It may do what you want...

Michael.

On Mon, 24 May 2004 09:17 pm, Jepri wrote:
> I've got a list of city names (approx 3 million) and I need to write
> some C code to search through them all quickly.
>
> I'm accessing them by packing them all into a file (with null
> termination), and then mmaping the file.
>
> So the string indexing library would have to accept pointers to the
> strings, not copy them into some internal store, be able to do substring
> lookups, and ideally be able to return a key or index value for the
> string it finds.  Even better would be if it could save its index and
> hot-start from that.
>
> Unicode would be nice.
>
> And while I'm wishing, I'd like a pony as well.
>
> Has anyone seen something like this?


