We are replacing the search and indexing module in an application, moving from DtSearch to Solr with SolrNet as the .NET Solr client library.
We are relatively new to Solr/Lucene and need some direction to understand the more advanced search options in Solr.
The current application supports the following search options using DtSearch:
1) Word(s) or phrase
2) Exact words or phrases
3) Not these words or phrases
4) One or more of words ("A" OR "B" OR "C")
5) Proximity of word within n words of another word
6) Numeric range - From - To
7) Option
. Stemming (search* finds searching or searches)
. Synonym (search& finds seek or look)
. Fuzzy within n letters (p%arts finds paris)
. Phonic homonyms (#Smith also finds Smithe and Smythe)
As an example, here is the search query that gets generated and posted to DtSearch for the use case below:
Search Phrase: generic collection
Exact Phrase: linq
Not these words: sql
One or more of these words: ICollection or ArrayList or Hashtable
Proximity: csharp within 4 words of language
Options:
a. Stemming
b. Synonym
c. Fuzzy within 2 letters
d. Phonic homonyms
Search Query: generic* collection* generic& collection& #generic #collection g%%eneric c%%ollection "linq" -sql ICollection OR ArrayList OR Hashtable csharp w/4 language
We have been able to do simple searches (single-term searches in file content) with highlighting in Solr. Now we need to replace these options with their Solr/Lucene equivalents.
Can anybody provide some direction on what we should be looking at, and where?
Word(s) or phrase
Solr supports querying within a single field and across fields, with variable boosts to control relevancy.
Solr also provides a wide variety of query types for matching, such as phrase, wildcard, and prefix queries.
Exact words or phrases
You can customize Solr to handle Phrase matches and exact word matches.
Not these words or phrases
Negative queries - Solr supports boolean operators, including negation with either the - prefix or the NOT keyword.
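As a rough illustration of the negation syntax, here is a minimal Python sketch that appends excluded terms to an existing query string. The helper names (quote_if_phrase, exclude) are made up for this example and are not part of Solr or SolrNet; the resulting string is what you would pass as the q parameter.

```python
from typing import List


def quote_if_phrase(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    term = term.strip()
    if " " in term:
        return '"' + term.replace('"', '\\"') + '"'
    return term


def exclude(base_query: str, excluded: List[str]) -> str:
    """Append each excluded term to the query with the - (prohibit) prefix."""
    negatives = " ".join("-" + quote_if_phrase(t) for t in excluded)
    return (base_query + " " + negatives).strip()


# Hypothetical usage: exact phrase "linq", but not sql and not "stored procedure"
query = exclude('"linq"', ["sql", "stored procedure"])
print(query)
```

Writing NOT sql instead of -sql should behave the same way in the standard query parser.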
One or more of words("A" OR "B" OR "C")
Boolean operators - Solr supports boolean operators, including AND (or the + prefix, which marks a term as required) and OR.
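For the "one or more of these words" case, a grouped OR clause is probably the closest match. The sketch below builds one in Python; any_of is a hypothetical helper name used only for illustration.

```python
from typing import List


def any_of(terms: List[str]) -> str:
    """Build a parenthesised OR group; a document matches if any term matches."""
    return "(" + " OR ".join(terms) + ")"


# Hypothetical usage with the terms from the question
group = any_of(["ICollection", "ArrayList", "Hashtable"])
print(group)

# Prefixing with + makes the whole group required alongside other clauses
print('"linq" +' + group)
```

The parentheses matter once the group is combined with other clauses, since they keep the OR from binding to neighbouring terms.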
Proximity of word within n words of another word
Proximity search - Solr supports proximity queries: put the words in a quoted phrase and follow it with the ~ operator and the slop (the allowed positional distance between the words).
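A DtSearch clause such as csharp w/4 language would translate roughly to a sloppy phrase query. The Python sketch below shows the string shape; within is a hypothetical helper, and the match semantics of slop may not be identical to DtSearch's w/n, so treat it as an approximation.

```python
def within(first: str, second: str, slop: int) -> str:
    """Build a sloppy phrase query: both words, at most `slop` positions apart."""
    if slop < 0:
        raise ValueError("slop must be non-negative")
    return '"{} {}"~{}'.format(first, second, slop)


# Hypothetical usage: csharp within 4 words of language
print(within("csharp", "language", 4))
```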
Numeric range - From - To
Range queries - Solr supports range queries on both numbers and dates.
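Range queries use the field:[from TO to] form, with * for an open end. Here is a small Python sketch; numeric_range is a hypothetical helper and page_count is an assumed numeric field name, not something from your schema.

```python
from typing import Optional


def numeric_range(field: str, low: Optional[int] = None, high: Optional[int] = None) -> str:
    """Build an inclusive range clause; a missing bound becomes * (unbounded)."""
    lo = "*" if low is None else str(low)
    hi = "*" if high is None else str(high)
    return "{}:[{} TO {}]".format(field, lo, hi)


# Hypothetical usage against an assumed numeric field called page_count
print(numeric_range("page_count", 10, 100))
print(numeric_range("page_count", low=10))
```

Square brackets make both ends inclusive; curly braces make them exclusive.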
Option
Stemming(search* finds searching or searches)
Stemmer - Solr has built-in stemmers that can be used out of the box. It also lets you define new stemmers.
Detailed language analysis support is available for various languages.
Synonym(search& finds seek or look)
Synonym - Solr supports synonym handling through a file based approach.
Fuzzy within n letters(p%arts finds paris)
Fuzzy search - Solr supports fuzzy searches with the ~ operator appended to a single term.
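To mirror "fuzzy within 2 letters", the ~ operator can take a number after the term. The sketch below assumes Solr 4 or later, where that number is a maximum edit distance of 0, 1, or 2 (older versions take a similarity between 0 and 1 instead). The helper name fuzzy is hypothetical.

```python
def fuzzy(term: str, max_edits: int = 2) -> str:
    """Build a fuzzy term query; assumes the edit-distance form (0, 1 or 2)."""
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1 or 2")
    return "{}~{}".format(term, max_edits)


# Hypothetical usage for the two words of the search phrase in the question
print(fuzzy("generic"), fuzzy("collection"))
```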
Phonic homonyms(#Smith also finds Smithe and Smythe)
Phonetic search - Solr provides phonetic search, which lets misspelled words match. It has out-of-the-box support for 4 filters, which can be customized.
Complete list: AnalyzersTokenizersTokenFilters