Can you recommend a full-text search engine? (Preferably open source)
I have a database of many (though relatively short) HTML documents. I want users to be able to search this database by entering one or more search words in my C++ desktop application. Hence, I’m looking for a fast full-text search solution to integrate with my app. Ideally, it should:
- skip common words such as the, of, and, etc.
- match related word forms, so that a search for run also finds documents containing runner, running and ran.

To illustrate, assume the database has just two documents:
Document 1:
This is a test of text search.
Document 2:
Testing is fun.
The following words should be in the index: fun, search, test, testing, text. If the user types t in the search box, I want the application to be able to suggest test, testing and text. (Ideally, the application should be able to query the search engine for the 10 most common search words starting with t.) A search for testing should return both documents.
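To make the expected behaviour concrete, here is a minimal C++ sketch of it. It is not tied to any particular search engine, and everything in it is an assumption made for illustration: the names TinyIndex, kStopWords, normalize and crudeStem are hypothetical, the stop-word list is just large enough to reproduce the example index, and crudeStem only strips a trailing "ing" rather than doing real stemming. Suggestions come back in alphabetical order, not ranked by how common they are.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stop-word list; a real engine would ship a fuller one.
static const std::set<std::string> kStopWords = {"a", "and", "is", "of", "the", "this"};

// Lower-case a token and drop anything that is not a letter or digit.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Crude stand-in for a real stemmer: strips a trailing "ing" only.
std::string crudeStem(const std::string& word) {
    const std::string suffix = "ing";
    if (word.size() > suffix.size() + 2 &&
        word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return word.substr(0, word.size() - suffix.size());
    }
    return word;
}

struct TinyIndex {
    std::map<std::string, std::set<int>> words;  // surface forms, used for suggestions
    std::map<std::string, std::set<int>> stems;  // stemmed forms, used for search

    // Tokenize on whitespace, skip stop words, and record the document id.
    void add(int docId, const std::string& text) {
        std::istringstream in(text);
        std::string token;
        while (in >> token) {
            const std::string word = normalize(token);
            if (word.empty() || kStopWords.count(word)) continue;
            words[word].insert(docId);
            stems[crudeStem(word)].insert(docId);
        }
    }

    // Up to `limit` indexed words starting with `prefix`, in alphabetical order.
    std::vector<std::string> suggest(const std::string& prefix, std::size_t limit = 10) const {
        std::vector<std::string> out;
        for (auto it = words.lower_bound(prefix);
             it != words.end() && out.size() < limit &&
             it->first.compare(0, prefix.size(), prefix) == 0;
             ++it) {
            out.push_back(it->first);
        }
        return out;
    }

    // Ids of documents containing any word with the same stem as `query`.
    std::set<int> search(const std::string& query) const {
        const auto it = stems.find(crudeStem(normalize(query)));
        return it == stems.end() ? std::set<int>{} : it->second;
    }
};
```

After calling add(1, "This is a test of text search.") and add(2, "Testing is fun."), the words map holds fun, search, test, testing and text; suggest("t") yields test, testing and text; and search("testing") yields documents 1 and 2. Keeping the words in a sorted map is what makes the prefix lookup cheap, since all candidates for a prefix sit next to each other.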
Other points:
Can you suggest a C or C++ based solution? (I’ve briefly reviewed CLucene and Xapian, but I’m not sure if either will address my needs, especially querying the search word indexes for the suggest feature).
I have used the dtSearch module with great success.
It comes as a DLL that you can use from your application to index just about anything, and it does more than what you are asking for.
Note: it is not free.
I did not see in the question that you require a free one, so I am recommending my favorite. dtSearch inspired me to write my own indexer for my language, Greek (Ellinika), for my sites, because I could not find what I was looking for in my language.
There are also modules that do only stemming, in case all you need is to find suggestions for your words. I took my reference from here: http://tartarus.org/~martin/PorterStemmer/
For example, if you have a database such as MS SQL that already does some basic indexing, and someone searches for a word and you find nothing, you can stem the word yourself and search again...
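A rough sketch of that "search, then stem and retry" idea in C++ might look like the following. All names here are made up for illustration: WordIndex stands in for whatever exact-match index the database provides, and stripSuffix is a deliberately crude suffix stripper, not the Porter algorithm from the link above. In a real application you would plug in a proper stemmer and a real database query.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the database's own exact-match word index:
// maps a word to the ids of the documents that contain it.
using WordIndex = std::map<std::string, std::vector<int>>;

// Crude illustration only: strips one of a few common English suffixes.
// This is NOT the Porter algorithm.
std::string stripSuffix(const std::string& word) {
    static const char* const suffixes[] = {"ing", "ed", "s"};
    for (const char* s : suffixes) {
        const std::string suffix(s);
        if (word.size() > suffix.size() + 2 &&
            word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0) {
            return word.substr(0, word.size() - suffix.size());
        }
    }
    return word;
}

// Exact lookup; returns an empty list when the word is not indexed.
std::vector<int> lookup(const WordIndex& index, const std::string& word) {
    const auto it = index.find(word);
    return it == index.end() ? std::vector<int>{} : it->second;
}

// Try the word as typed first; only if that finds nothing, stem it and retry.
std::vector<int> searchWithStemFallback(const WordIndex& index, const std::string& word) {
    std::vector<int> hits = lookup(index, word);
    if (hits.empty()) {
        const std::string stem = stripSuffix(word);
        if (stem != word) hits = lookup(index, stem);
    }
    return hits;
}
```

With an index that contains only test (document 1) and fun (document 2), a search for testing finds nothing on the first pass, is reduced to test, and then returns document 1. Note that the retry only helps when the index actually contains the stemmed form.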