I'm building a search feature that lets users filter by certain fields. One of the fields is DOI. Filtering by DOI should return only results that match exactly.
I'm trying to index the following object:
public class PaperElasticSearchModel
{
    public string Title { get; set; }
    public string Abstract { get; set; }
    public string Keywords { get; set; }
    public string DOI { get; set; }
    public int Year { get; set; }
    public string[] Author { get; set; }
}
The DOI field will have a format similar to this: 10.1523/JNEUROSCI.18-04-01622.1998.
Below are the settings for the index:
Client.Indices.Create(index, pub => pub
    .Settings(s => s
        .Analysis(descriptor => descriptor
            .Analyzers(analyzer => analyzer
                .Custom("custom_analyzer", ca => ca
                    .Filters("lowercase", "stop", "classic", "word_delimiter")
                )
            )
            .TokenFilters(bases => bases
                .EdgeNGram("custom_analyzer", td => td
                    .MinGram(2)
                    .MaxGram(25))
            )
        )
        .Setting(UpdatableIndexSettings.MaxNGramDiff, 23)
    )
    .Map<PubScreenSearch>(m => m
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(f => f.Author)
                .Analyzer("custom_analyzer"))
            .Text(t => t
                .Name(f => f.Keywords)
                .Analyzer("custom_analyzer"))
            .Text(t => t
                .Name(f => f.Title)
                .Analyzer("custom_analyzer"))
            .Keyword(t => t
                .Name(f => f.DOI)
            )
        )));
Which analyzer should I use so that a search by DOI returns only exact matches?
I tried using the Pattern analyzer, but it did not work:
.Text(t => t
.Name(f => f.DOI)
.Analyzer(new PatternAnalyzer().Pattern = @"(.(\\/|\\?)*)\\w+")
Solution:
filterQuery.Add(fq => fq
    .Bool(f => f
        .Must(must => must
            .Term(t => t
                .Field(field => field.DOI.Suffix("keyword"))
                .Value(value))
        ))
);
Let me clarify how the Elasticsearch analysis process works for term and match queries at search time, and for the keyword and text field types at index time.
Official explanations:
Match query: Returns documents that match a provided text, number, date or boolean value. The provided text is analyzed before matching.
Term query: Returns documents that contain an exact term in a provided field.
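To see what "analyzed" means for a DOI, the `_analyze` API can show the tokens an analyzer produces. The request below is a minimal sketch that assumes the built-in `standard` analyzer, which is the default for text fields when no analyzer is configured; a custom analyzer would tokenize differently.

```
# show the tokens the standard analyzer emits for the sample DOI
POST _analyze
{
  "analyzer": "standard",
  "text": "10.1523/JNEUROSCI.18-04-01622.1998"
}
```

The response lists several lowercased tokens rather than the original string. That is why a term query for the full DOI, or for the uppercase JNEUROSCI, does not match a text field, while a match query, whose input goes through the same analysis, does.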
text field type: if you want a query for JNEUROSCI to find the value 10.1523/JNEUROSCI.18-04-01622.1998, use a match query on a text field.
keyword field type: if you want an exact match on the whole value, use a term query on a keyword field.
Note: as the name implies, an exact match compares the whole value as-is, so it is case-sensitive.
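If case-insensitive exact matching is needed, one option is a keyword field with a lowercase normalizer. This is only a sketch under assumptions, not part of the original setup: the index name `papers-normalized`, the field name `doi`, and the normalizer name `lowercase_normalizer` are made up for illustration.

```
# hypothetical index: keyword field whose values are lowercased at index and query time
PUT papers-normalized
{
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "doi": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }
}
```

With such a mapping a term query on `doi` still compares the whole value, but the normalizer is applied to both the stored value and the query input, so differences in letter case no longer prevent a match.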
Here are some examples:
#index the test data
PUT test-analyzer/_doc/1
{
"field_name": "10.1523/JNEUROSCI.18-04-01622.1998"
}
#term query with text field type - no hits
GET test-analyzer/_search
{
"query": {
"term": {
"field_name": "JNEUROSCI"
}
}
}
#term query with keyword field type - no hits
GET test-analyzer/_search
{
"query": {
"term": {
"field_name.keyword": "JNEUROSCI"
}
}
}
#match query with text field type - there is a hit
GET test-analyzer/_search
{
"query": {
"match": {
"field_name": "JNEUROSCI"
}
}
}
#match query with keyword field type - no hits
GET test-analyzer/_search
{
"query": {
"match": {
"field_name.keyword": "JNEUROSCI"
}
}
}
#exact match that you're looking for - term query with keyword field type - there is a hit
GET test-analyzer/_search
{
"query": {
"term": {
"field_name.keyword": "10.1523/JNEUROSCI.18-04-01622.1998"
}
}
}
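The `field_name.keyword` sub-field used in these examples is not declared anywhere. It comes from Elasticsearch's default dynamic mapping, which maps a new string field as `text` with a `keyword` sub-field. Assuming the `test-analyzer` index was created implicitly by the PUT request above, the generated mapping can be checked with:

```
# inspect the dynamically generated mapping of the test index
GET test-analyzer/_mapping
```

If a field is instead mapped explicitly as `keyword` with no sub-fields, there is typically no `.keyword` sub-field, and the term query would target the field name directly.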