
How to search by non-tokenized field length in ElasticSearch


Say I create an index called people whose documents have two properties, name and friends:

PUT /people
{
  "mappings": {
    "properties": {
      "friends": { 
        "type": "text",
        "fields": {
          "keyword": { 
            "type": "keyword"
          }
        }
      }
    }
  }
}

and I index two documents, each of which has two friends:

POST /people/_doc
{
  "name": "Jack",
  "friends": [
    "Jill", "John"
  ]
}


POST /people/_doc
{
  "name": "Max",
  "friends": [
    "John", "John"  # Max will have two friends, but both named John
  ]
}

Now I want to search for people that have multiple friends

GET /people/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['friends.keyword'].length > 1"
            }
          }
        }
      ]
    }
  }
}

This returns only Jack and ignores Max. I assume this is because we are actually traversing the inverted index, where John and John produce only one token ('john'), so the number of tokens is actually 1 here.
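
To make that assumption concrete, here is a rough model in plain Python (not Elasticsearch code). It assumes that the values a script reads through doc['friends.keyword'] are exposed as unique values in sorted order, which is how keyword doc values are generally described; the helper name doc_values_view is made up for this sketch.

```python
def doc_values_view(values):
    """Model of what a script sees through doc['friends.keyword'].

    Assumption: duplicates are collapsed and the values are sorted,
    so repeated entries in the original array are not counted twice.
    """
    return sorted(set(values))


jack_friends = ["Jill", "John"]
max_friends = ["John", "John"]

print(len(doc_values_view(jack_friends)))  # passes the "> 1" filter
print(len(doc_values_view(max_friends)))   # fails the "> 1" filter
print(len(max_friends))                    # length of the original array
```

Under this model Max's two identical entries collapse into one, which matches the behaviour of the search above, while the original array still has two elements.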

Since my index is relatively small and performance is not critical, I would like to traverse the source itself rather than the inverted index:

GET /people/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "ctx._source.friends.length > 1"
            }
          }
        }
      ]
    }
  }
}

But according to https://github.com/elastic/elasticsearch/issues/20068, the source is supported only when updating, not when searching, so I cannot do this.

One obvious solution seems to be to take the length of the field and store it in the index, something like friends_count: 2, and then filter on that. But that requires reindexing, and this also looks like something that should have an obvious solution I am missing.

Thanks a lot.


Solution

  • ES 7.11 introduced runtime fields. A runtime field is a field that is evaluated at query time. Runtime fields enable you to:

    1. Add fields to existing documents without reindexing your data
    2. Start working with your data without understanding how it’s structured
    3. Override the value returned from an indexed field at query time
    4. Define fields for a specific use without modifying the underlying schema

    You can find more information about runtime fields in the Elasticsearch documentation. To use them for this problem, you can do something like this:

    Index Time:

    PUT my-index/
    {
      "mappings": {
        "runtime": {
          "friends_count": {
            "type": "long",
            "script": {
              "source": "emit(params._source.friends.size())"
            }
          }
        },
        "properties": {
          "@timestamp": {"type": "date"}
        }
      }
    }
    

    You can also define runtime fields at search time; see the Elasticsearch documentation for more information.

    Search Time

    GET my-index/_search
    {
      "runtime_mappings": {
        "friends_count": {
          "type": "long",
          "script": {
            "source": "emit(params._source.friends.size())"
          }
        }
      },
      "query": {
        "range": {
          "friends_count": { "gt": 1 }
        }
      }
    }
    
    

    Update:

    POST mytest/_update_by_query
    {
        "query": {
            "match_all": {}
        }, 
        "script": {
           "source": "ctx._source.arrayLength = ctx._source.friends.size()"
        }
    }
    

    You can update all of your documents with the query above and then adjust your search query accordingly.
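
As a rough check of the update approach, the sketch below mirrors the update script in plain Python on the two sample documents and builds a search body that filters on the stored length. It does not talk to Elasticsearch. The field name arrayLength comes from the update script above; the helper names and the choice of a range filter with gt: 1 are assumptions made for illustration.

```python
import json


def add_array_length(doc):
    """Mirror the update script: store the length of `friends` in `arrayLength`."""
    updated = dict(doc)
    updated["arrayLength"] = len(doc["friends"])
    return updated


def more_than_one_friend_query():
    """Search body that filters on the stored count (assumed follow-up query)."""
    return {
        "query": {
            "bool": {
                "filter": [{"range": {"arrayLength": {"gt": 1}}}]
            }
        }
    }


docs = [
    {"name": "Jack", "friends": ["Jill", "John"]},
    {"name": "Max", "friends": ["John", "John"]},
]

updated_docs = [add_array_length(d) for d in docs]
matches = [d["name"] for d in updated_docs if d["arrayLength"] > 1]

print(matches)
print(json.dumps(more_than_one_friend_query(), indent=2))
```

Because the length is taken from the original array rather than from deduplicated values, both Jack and Max end up with a stored length of 2. The printed JSON can be pasted as the body of a _search request once the update has run.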