I have a title I am looking for
The title is, and is stored in a document as "Police diaries : stefan zweig"
When I search "Police" I get the result. But when I search Policeman I do not get the result.
Here is the query:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"fields": [
"title",
omitted because irrelevance...
],
"query": "Policeman",
"fuzziness": "1.5",
"prefix_length": "2"
}
}
],
"must": {
omitted because irrelevance...
}
}
},
"sort": [
{
"_score": {
"order": "desc"
}
}
]
}
and here is the mapping
{
"books": {
"mappings": {
"book": {
"_all": {
"analyzer": "nGram_analyzer",
"search_analyzer": "whitespace_analyzer"
},
"properties": {
"title": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
},
"sort": {
"type": "text",
"analyzer": "to order in another language, (creates a string with symbols)",
"fielddata": true
}
}
}
}
}
}
}
}
It should be noted that I have documents with a title "some title" which get hits if I search for "someone title".
I cant figure out why the police book is not showing up.
So you have 2 parts of your question.
police
when searching for policeman
.some title
documents match the someone title
document and according to that you expect the first one to match as well.Let me first explain you why second query matches and the why the first one doesn't and then would tell you, how to make the first one to work.
Your document containing some title
creates below tokens and you can verify this with analyzer API.
POST /_analyze
{
"text": "some title",
"analyzer" : "standard" --> default analyzer for text field
}
{
"tokens": [
{
"token": "some",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "title",
"start_offset": 5,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Now when you search for someone title
using the match query which is analyzed and uses the same analyzer which is used on index time
on field.
So it creates 2 tokens someone
and title
and match query matches the title
tokens, which is the reason it comes in your search result, you can also use Explain API to verify and see the internals how it matches in detail.
police
title when searching for policeman
You need to make use of synonyms token filter as shown in the below example.
{
"settings": {
"analysis": {
"analyzer": {
"synonyms": {
"filter": [
"lowercase",
"synonym_filter"
],
"tokenizer": "standard"
}
},
"filter": {
"synonym_filter": {
"type": "synonym",
"synonyms" : ["policeman => police"] --> note this
}
}
}
},
"mappings": {
"properties": {
"": {
"type": "text",
"analyzer": "synonyms"
}
}
}
}
{
"dialog" : "police"
}
policeman
{
"query": {
"match" : {
"dialog" : {
"query" : "policeman"
}
}
}
}
"hits": [
{
"_index": "so_syn",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"dialog": "police" --> note source has `police` only.
}
}
]