
Elasticsearch 5 won't find documents whose keyword value contains a space


I'm indexing documents with the following format:

{
"title": "this is the title",
"brand": "brand here",
"filters": ["filter1", "filter2", "Sin filters", "Camera IP"],
"active": true
}

Then a query looks like:

'query': {
            'function_score': {
                'query': {
                    'bool': {
                        'filter': [
                            {
                                'term': {
                                    'active': True
                                }
                            }
                        ],
                        'must': [
                            {
                                'terms': {
                                    'filters': ['camera ip']
                                }
                            }
                        ]
                    }
                }
            }
        }

I can't get back any document whose filters include "Camera IP" (or any variation of that string, such as lowercase), but ES does return the ones whose filters include "Sin filters".

The index is created with the following settings. Note that the "filters" field falls under the default dynamic template and is of type keyword:

"settings":{
         "index":{
            "analysis":{
                "analyzer":{
                    "keylower":{
                        "tokenizer":"keyword",
                        "filter":"lowercase"
                    }
                }
            }
         }
    },
    "mappings": {

        "_default_": {
            "dynamic_templates": [
                {
                    "string_as_keywords": {
                        "mapping": {
                            "index": "not_analyzed",
                            "type" : "keyword",
                            "analyzer": "keylower" # I also tried with and without changing this analyzer
                            },
                        "match": "*",
                        "match_mapping_type": "string"
                    }
                },
                {
                    "integers": {
                        "mapping": {
                            "type": "integer"
                        },
                        "match": "*",
                        "match_mapping_type": "long"
                    }
                },
                {
                    "floats": {
                        "mapping": {
                            "type": "float"
                        },
                        "match": "*",
                        "match_mapping_type": "double"
                    }
                }
            ]
        }
}

What am I missing? It's strange that it returns the documents with the "Sin filters" filter but not the ones with "Camera IP".

Thanks.


Solution

  • It seems you want the filters to be lowercased but not tokenized. I think the problem with your query is that you set the type of the strings to "keyword", and ES does not analyze such fields, not even to change their case:

    Keyword fields are only searchable by their exact value.

    That is why, with your settings, you can still retrieve the document with a query like this: {"query": {"term": {"filters": "Camera IP"}}}.
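
The exact-value behaviour can be illustrated with a small Python model. This is only a sketch and does not talk to Elasticsearch: `index_keyword` and `term_matches` are hypothetical helper names, and the model assumes a keyword field stores each value as one unmodified term and that a term query compares its value to those terms without any analysis.

```python
def index_keyword(values):
    # Toy model of a keyword field: every value becomes a single term,
    # stored exactly as given (no tokenizing, no lowercasing).
    return set(values)


def term_matches(indexed_terms, query_value):
    # Toy model of a term query: the query value is not analyzed,
    # so it must equal an indexed term character for character.
    return query_value in indexed_terms


indexed = index_keyword(["filter1", "filter2", "Sin filters", "Camera IP"])

exact_hit = term_matches(indexed, "Camera IP")   # same casing as the indexed value
lower_hit = term_matches(indexed, "camera ip")   # lowercased, as in the question's query

print(exact_hit, lower_hit)
```

Under these assumptions the lowercased value from the question's query has nothing to match against, while the original casing does.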

    Since you want the analyzer to change the casing of your text before indexing, you should set the type to text by changing your mapping to something like this:

    {"settings":{
      "index": {
            "analysis":{
                "analyzer":{
                    "test_analyzer":{
                        "tokenizer":"keyword",
                        "filter":"lowercase"
                    }
                }
            }
         }
      },
      "mappings": {
        "_default_": {
            "dynamic_templates": [
                {
                    "string_as_keywords": {
                        "mapping": {
                            "type": "text",
                            "index": "not_analyzed",
                            "analyzer": "test_analyzer"
                            },
                        "match": "*",
                        "match_mapping_type": "string"
                    }
                }
            ]
        }
    }}
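
To see what the custom analyzer in this mapping is meant to do, here is a toy Python model of it (keyword tokenizer followed by a lowercase filter). It is a sketch under stated assumptions, not Elasticsearch code: `analyze`, `index_field`, `terms_matches` and `build_terms_clause` are hypothetical helper names. Because term and terms queries are not analyzed, the sketch lowercases the query values on the client side, which is what the question's query already does with 'camera ip'.

```python
def analyze(value):
    # keyword tokenizer: the whole input stays one token;
    # lowercase filter: that token is lowercased.
    return [value.lower()]


def index_field(values):
    # Collect the analyzed tokens of every value in the array field.
    terms = set()
    for value in values:
        terms.update(analyze(value))
    return terms


def terms_matches(indexed_terms, query_values):
    # Toy terms query: a document matches if any query value equals
    # an indexed term exactly (query values are not analyzed).
    return any(q in indexed_terms for q in query_values)


def build_terms_clause(field, values):
    # Lowercase on the client so the values line up with the indexed tokens.
    return {"terms": {field: [v.lower() for v in values]}}


indexed = index_field(["filter1", "filter2", "Sin filters", "Camera IP"])

lower_hit = terms_matches(indexed, ["camera ip"])      # matches the lowercased token
original_hit = terms_matches(indexed, ["Camera IP"])   # no longer matches
clause = build_terms_clause("filters", ["Camera IP"])

print(lower_hit, original_hit, clause)
```

Note the trade-off this model makes visible: once the field is lowercased at index time, a term-level query has to send lowercase values, so the original casing "Camera IP" stops matching.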