Tags: elasticsearch, elasticsearch-plugin, elasticsearch-dsl, elasticsearch-mapping

Elasticsearch index comma-separated values and use them as filters


I am currently indexing the element field as "element" : "dog,cat,mouse" with the following configuration:

ES config:

"settings": {
    "analysis": {
        "analyzer": {
            "search_synonyms": {
                "tokenizer": "whitespace",
                "filter": [
                    "graph_synonyms",
                    "lowercase",
                    "asciifolding"
                ]
            },
            "comma" : {
                "type" : "custom",
                "tokenizer" : "comma"
            }
        },
        "filter": {
            "graph_synonyms": {
                ...
            }
        },
        "normalizer": {
            "normalizer_1": {
                ...
            }
        },
        "tokenizer" : {
            "comma" : {
                "type" : "pattern",
                "pattern" : ","
            }
        }
    }
},
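As a sanity check, the custom comma analyzer defined above can be exercised with the _analyze API (the index name your-index is a placeholder):

```json
POST /your-index/_analyze
{
    "analyzer": "comma",
    "text": "dog,cat,mouse"
}
```

Since the pattern tokenizer splits on every match of the pattern ",", this should return the three tokens dog, cat and mouse.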

Fields mapping:

"mappings": {
    "properties": {
        "element": {
            "type": "keyword",
            "normalizer": "normalizer_1"
        },
        ...
    }
}

The value dog,cat,mouse is later used as a filter, but I want to split it up and use each value as a separate filter. I tried the multi-fields feature and made the following changes, but I'm still not sure what else I should do.

"element": {
    "type": "keyword",
    "normalizer": "normalizer_1",
    "fields": {
        "separated": {
            "type": "text",
            "analyzer": "comma"
        }
    }
},
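Assuming the multi-field above is in place, each comma-separated value could then be used as its own filter by querying the subfield (index name and value are placeholders):

```json
POST /your-index/_search
{
    "query": {
        "bool": {
            "filter": [
                { "match": { "element.separated": "cat" } }
            ]
        }
    }
}
```

Because element.separated is a text field analyzed with the comma analyzer, a document indexed with "dog,cat,mouse" would match a filter on any single value.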

Solution

  • If I understand correctly, you have a field where you are storing the value dog,cat,mouse and you need the values separately, like dog, cat and mouse. For that you can simply use a text field, which uses the default standard analyzer; it splits tokens on the comma ,.

    analyze API to show the tokens

    POST /_analyze
    {
        "text": "dog,cat,mouse",
        "analyzer": "standard"
    }
    

    tokens generated

    {
        "tokens": [
            {
                "token": "dog",
                "start_offset": 0,
                "end_offset": 3,
                "type": "<ALPHANUM>",
                "position": 0
            },
            {
                "token": "cat",
                "start_offset": 4,
                "end_offset": 7,
                "type": "<ALPHANUM>",
                "position": 1
            },
            {
                "token": "mouse",
                "start_offset": 8,
                "end_offset": 13,
                "type": "<ALPHANUM>",
                "position": 2
            }
        ]
    }
    

    As per the comment, adding a sample of how to define the element field so that the standard analyzer is used. Note that it is currently defined as keyword with a normalizer, hence the standard analyzer is not used.

    Index mapping

    PUT /your-index/

    {
      "mappings": {
        "properties": {
          "element": {
            "type": "text"
          }
        }
      }
    }
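With the element field mapped as text, a document holding the comma-separated string can be filtered by any single value. A sample document and query (index name and document ID are placeholders):

```json
PUT /your-index/_doc/1
{
    "element": "dog,cat,mouse"
}

POST /your-index/_search
{
    "query": {
        "bool": {
            "filter": [
                { "match": { "element": "dog" } }
            ]
        }
    }
}
```

The match query analyzes the search term with the same standard analyzer, so the filter matches the dog token produced at index time.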