My application is a help (user assistance) system, much like the online MSDN, but the only way to navigate it is through search. Either the search is good or my system is dead.
I am looking for a third-party search engine that can connect to a database and provide full-text search out of the box. I have researched SQL Server 2008 iFTS, the Lucene.NET API, and SQLite FTS4, but none of them rank results as well as Google does.
I'm not expecting something like Google, but I need the search engine product with the best ranking.
Any suggestions or experience?
Maybe I should not go for a third-party search engine and should use Lucene.NET or SQL Server 2008 FTS instead, but then how can I establish good ranking for a user-provided search query, like
"how can i do upload excel file in XYZ interface", etc.?
My short answer is discouraging: you won't be able to do it yourself, even for an "okay" solution.
If you want good ranking:
As you said, a search engine has to do at least two things. The first is indexing, i.e., finding the documents in the database that match the queried keywords. The second is ranking, which sorts those documents and puts the most relevant ones first.
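To make the indexing half concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration: the three documents are made up, the tokenizer just lowercases and splits on whitespace, and the function names (`build_index`, `lookup`) are my own, not taken from Lucene.NET or any other product.

```python
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on whitespace; real engines also stem and strip punctuation.
    return text.lower().split()

def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def lookup(index, query):
    # Return the ids of documents that contain every query term (boolean AND).
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical help articles, invented for this example.
docs = {
    1: "upload an excel file in the XYZ interface",
    2: "export a report to excel",
    3: "configure the XYZ interface",
}
index = build_index(docs)
print(lookup(index, "excel"))
print(lookup(index, "xyz excel"))
```

Note that this only answers "which documents match"; it says nothing about which match should come first, and that is the hard part.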
Ranking is one of the key factors in how good a search engine is, so it's not surprising that ranking is hard.
To give you an idea of how hard it is, take the sentence in your question (i.e., "how can i do upload excel file in XYZ interface") as an example. A search engine has to answer at least two questions to get good results:
Which keywords are most important? For example, "XYZ" is probably more important than the words "how" and "can".
What are the possible meanings of each word? "Excel" can be Microsoft Excel, or Xcel Energy (a company whose name sounds like "excel").
There is a whole field in computer science dedicated to this problem. If you want more evidence, take a quick look at ACM WWW.
What is even more discouraging is that getting even an "okay" solution is difficult. The high-level point is that the computer knows nothing about English; it has to read a lot to learn how to rank documents.
Sadly, "a lot" means a lot of work. For example, many textbooks suggest ranking documents based on TF/IDF, but getting reasonable cutoffs for these values requires crawling millions of web pages.
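For reference, this is roughly what textbook TF/IDF ranking looks like in Python. Treat it as a sketch under assumptions: the four documents are invented, the IDF is the plain `log(N / df)` variant, and the statistics come from the toy corpus itself rather than from a large crawl, which is exactly why numbers like these are not trustworthy on a small collection.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; no stemming or stop-word removal.
    return text.lower().split()

def idf_weights(docs):
    # Inverse document frequency: terms found in fewer documents weigh more.
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    return {term: math.log(n / count) for term, count in df.items()}

def rank(docs, query):
    # Score each document as the sum of tf * idf over the query terms,
    # then return the ids of matching documents, best score first.
    idf = idf_weights(docs)
    scores = {}
    for doc_id, text in docs.items():
        tf = Counter(tokenize(text))
        score = sum(tf[term] * idf.get(term, 0.0) for term in tokenize(query))
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical help articles, invented for this example.
docs = {
    1: "how can i upload an excel file in the xyz interface",
    2: "how can i export a report to excel",
    3: "how can i reset my password",
    4: "how can i configure the xyz interface",
}
idf = idf_weights(docs)
print(rank(docs, "how can i do upload excel file in XYZ interface"))
```

In this toy corpus "how" and "can" appear in every document, so their IDF is zero and they contribute nothing to the score, while "upload" and "file" dominate. That handles the keyword-importance question in a crude way, but it does nothing about word meanings such as Excel versus Xcel.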
To summarize: