I am building a search engine and have finished the first phase, which is spidering (fetching HTML documents and parsing each one to extract further links). Now I need to index the content of the HTML documents. At first I decided to use a DBMS (such as SQL Server) for this, but then I found another library called Lucene.NET.
What is the difference between Lucene.NET and SQL Server, and which one is better for indexing HTML documents? I have read a lot about Lucene.NET and was surprised to learn that it performs better than SQL Server. Can anyone explain why?
SQL Server is a general-purpose RDBMS that is not optimized for very fast text indexing (yes, it has full-text indexes, but it does a lot of other things at the same time).
Lucene.NET is not an RDBMS; its main function is fast text indexing.
So it is not surprising that it is better at this than SQL Server.
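Much of Lucene's speed comes from its core data structure, an inverted index: it maps each term to the set of documents containing that term, so a query looks up and intersects these "postings" lists instead of scanning every document. The following is a toy Python sketch of that idea only, not Lucene.NET's implementation or API. The names `strip_tags`, `tokenize`, and `InvertedIndex` are hypothetical, and the naive regex tag stripping and tokenizing stand in for Lucene's real HTML handling and analyzers.

```python
import re
from collections import defaultdict


def strip_tags(html):
    """Naively replace HTML tags with spaces (assumption: good enough for a demo)."""
    return re.sub(r"<[^>]+>", " ", html)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical minimal inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, html):
        # Index every term of the document's visible text.
        for term in tokenize(strip_tags(html)):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        # Start from the smallest postings set so the intersections stay small.
        sets = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add(1, "<p>Lucene.NET is a full-text search library</p>")
    idx.add(2, "<p>SQL Server is a general purpose RDBMS</p>")
    idx.add(3, "<p>SQL Server also has full-text indexes</p>")
    print(sorted(idx.search("full text")))   # [1, 3]
    print(sorted(idx.search("sql server")))  # [2, 3]
```

A lookup here only touches the postings for the query terms, regardless of how many documents were indexed, which is the property a dedicated text index is built around. A real engine adds ranking, phrase positions, and compressed on-disk storage on top of this.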