Tags: elasticsearch, elasticsearch-query, elasticsearch-7

Is there a character limit on an individual word within a match_phrase query in Elasticsearch?


I'm fairly new to Elasticsearch, so please bear with me. I'm running into a problem: if I search using a word of 20 characters or fewer, the document is returned, but if I add any more characters to that same word, I get no results:

  • Using 'phenoxymethylpenicillin' (23 characters) returns no documents.
  • Using 'phenoxymethylpenicil' (20 characters) returns documents.

This is the query I'm trying to use:

{
    "match_phrase": {
        "genericNames.name": {
        "query": "phenoxymethylpenicillin",
        "slop": 15,
        "zero_terms_query": "NONE",
        "boost": 1.0
        }
    }
}

Here is the full query: https://pastebin.com/DEJvP2uS

As I said, I'm fairly new to this, so I may simply be looking in the wrong place.

So my question is: what could cause this, and why?

Thanks!

Edit: Below is an extract from one of the documents in the sample data. I can't show much of it because most of it is sensitive, but luckily I can share the names from the sample data. This is the data I'm trying to search for:

"genericNames":[
{
    "nameType":1,
    "name":"Phenoxymethylpenicillin 250mg tablets",
    "nameChangeCode":"0000",
    "nameBasisCode":"0001",
    "nameTypeDescription":"Name",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
},
{
    "nameType":5,
    "name":"Penicillin V 250mg tablets",
    "nameTypeDescription":"Alternative Name 3",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
}
],

I have also provided the index mapping as it may provide extra information:

{
    "amp": {
        "mappings": {
            "properties": {
                "_class": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "ampId": {
                    "type": "long"
                },
                "amppId": {
                    "type": "long"
                },
                "attributes": {
                    "type": "nested",
                    "properties": {
                        "attributeQualifier": {
                            "type": "keyword"
                        },
                        "attributeType": {
                            "type": "integer"
                        },
                        "attributeTypeDescription": {
                            "type": "keyword"
                        },
                        "attributeValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "countryId": {
                            "type": "long"
                        },
                        "decodedValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "dictionaries": {
                    "type": "nested",
                    "properties": {
                        "abbreviation": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "description": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "dictId": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "endDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "excipients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "extractableEntry": {
                    "type": "boolean"
                },
                "genericNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "id": {
                    "type": "keyword"
                },
                "ingredients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "invalidEntry": {
                    "type": "boolean"
                },
                "pitId": {
                    "type": "integer"
                },
                "ppaCodes": {
                    "type": "nested",
                    "properties": {
                        "code": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "proprietaryNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "qpuUomCde": {
                    "type": "keyword"
                },
                "qpuVal": {
                    "type": "keyword"
                },
                "qtyUomCde": {
                    "type": "keyword"
                },
                "qtyVal": {
                    "type": "keyword"
                },
                "snomedCodes": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "snomedDescriptions": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "startDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "suppliers": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "names": {
                            "type": "nested",
                            "properties": {
                                "endDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    },
                                    "analyzer": "autocomplete_index",
                                    "search_analyzer": "autocomplete_search"
                                },
                                "nameBasisCode": {
                                    "type": "keyword"
                                },
                                "nameChangeCode": {
                                    "type": "keyword"
                                },
                                "nameType": {
                                    "type": "integer"
                                },
                                "nameTypeDescription": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "udfs": {
                    "type": "nested",
                    "properties": {
                        "ddIndicator": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "udfsUomCode": {
                            "type": "keyword"
                        },
                        "udfsValue": {
                            "type": "keyword"
                        },
                        "vmpUomCode": {
                            "type": "keyword"
                        }
                    }
                },
                "vmpId": {
                    "type": "long"
                },
                "vmppId": {
                    "type": "long"
                },
                "vtms": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                }
            }
        }
    }
}


Edit: Settings for index:

{
    "index": {
        "max_ngram_diff": "20",
        "analysis": {
            "filter": {
                "autocomplete_suffix_filter": {
                    "type": "ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                },
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "filter": [
                        "lowercase",
                        "autocomplete_filter",
                        "autocomplete_suffix_filter"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                },
                "autocomplete_search": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                }
            }
        },
        "number_of_replicas": "1"
    }
}
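These settings account for the 20-character cutoff. Both n-gram filters use max_gram: 20 and neither emits the original token, so autocomplete_index never indexes a term longer than 20 characters. Meanwhile autocomplete_search (standard tokenizer plus lowercase only) passes the full 23-character word through unchanged, so it has nothing to match. Below is a rough Python simulation of the two analyzer chains, offered as a sketch rather than Elasticsearch's own behaviour. The regex tokenizer is only an approximation of the standard tokenizer, and the function names are hypothetical.

```python
import re

# Values copied from autocomplete_filter / autocomplete_suffix_filter in the settings.
MIN_GRAM = 1
MAX_GRAM = 20


def standard_tokenize(text: str) -> list[str]:
    # Approximation of the `standard` tokenizer: split on anything that is not
    # an ASCII letter or digit (the real one follows Unicode word-break rules).
    return re.findall(r"[A-Za-z0-9]+", text)


def edge_ngrams(token: str) -> list[str]:
    # edge_ngram filter: prefixes of length MIN_GRAM..MAX_GRAM; original not kept.
    return [token[:n] for n in range(MIN_GRAM, min(MAX_GRAM, len(token)) + 1)]


def ngrams(token: str) -> list[str]:
    # ngram filter: every substring of length MIN_GRAM..MAX_GRAM.
    return [
        token[i:i + n]
        for n in range(MIN_GRAM, min(MAX_GRAM, len(token)) + 1)
        for i in range(len(token) - n + 1)
    ]


def autocomplete_index(text: str) -> set[str]:
    # standard -> lowercase -> autocomplete_filter -> autocomplete_suffix_filter
    terms = set()
    for token in standard_tokenize(text):
        for prefix in edge_ngrams(token.lower()):
            terms.update(ngrams(prefix))
    return terms


def autocomplete_search(text: str) -> list[str]:
    # standard -> lowercase; no n-grams are produced at search time.
    return [token.lower() for token in standard_tokenize(text)]


indexed = autocomplete_index("Phenoxymethylpenicillin 250mg tablets")
for query in ("phenoxymethylpenicil", "phenoxymethylpenicillin"):
    term = autocomplete_search(query)[0]
    print(query, len(term), term in indexed)
# phenoxymethylpenicil 20 True
# phenoxymethylpenicillin 23 False
```

In this sketch the 20-character prefix exists as an indexed term, while the full 23-character search term does not, which mirrors the behaviour described in the question.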

Solution

  • This is almost certainly caused by the custom analyzers on your genericNames.name field. You use different analyzers at index time (autocomplete_index) and at search time (autocomplete_search), but the question only includes the mapping, not the definitions of these analyzers.

    Please provide the output of the _settings API for your index; see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-settings.html for more info.

    Use the _analyze API to check the tokens that autocomplete_index and autocomplete_search each generate for phenoxymethylpenicillin, and you will see the difference.
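The two checks suggested here correspond to GET /<index>/_settings and POST /<index>/_analyze with an analyzer name and some text. The following is a minimal standard-library Python sketch. It assumes the index is named amp (the top-level key in the mapping output in the question) and that the cluster is reachable at http://localhost:9200 without authentication. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions: local cluster, no auth, index name taken from the mapping output.
ES_URL = "http://localhost:9200"
INDEX = "amp"


def build_settings_request(base_url: str = ES_URL, index: str = INDEX) -> urllib.request.Request:
    # GET /<index>/_settings returns the analysis section, including the analyzers.
    return urllib.request.Request(f"{base_url}/{index}/_settings", method="GET")


def build_analyze_request(analyzer: str, text: str,
                          base_url: str = ES_URL, index: str = INDEX) -> urllib.request.Request:
    # POST /<index>/_analyze can use analyzers defined in that index's settings.
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer: str, text: str) -> list[str]:
    # Sends the request and keeps only the token strings from the response.
    with urllib.request.urlopen(build_analyze_request(analyzer, text)) as resp:
        payload = json.load(resp)
    return [t["token"] for t in payload["tokens"]]
```

Against a running cluster, you would call analyze("autocomplete_index", "phenoxymethylpenicillin") and analyze("autocomplete_search", "phenoxymethylpenicillin") and compare the longest tokens in each result. The same JSON body can also be sent from Kibana Dev Tools as POST amp/_analyze.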