python-3.x elasticsearch elasticsearch-py

Elasticsearch Search/filter by occurrence or order in an array


My index has a field named data that holds an array of values.

I want only doc 2 in the result, i.e. only documents where b comes before a in the data array.

doc 1:

data = ['a','b','t','k','p']

doc 2:

data = ['p','b','i','o','a']

Currently, I match documents that contain both a and b (terms inside a must clause) and then check the order in a separate piece of code. Please suggest a better approach.


Solution

  • My understanding is that the only way to do this is with span queries; however, they won't work on an array of values.

    You would need to concatenate the values into a single text field with whitespace as the delimiter, reingest the documents, and run a span_near query on that field.

    Below are the mapping, sample documents, query, and response:

    Mapping:

    PUT my_test_index
    {
      "mappings": {
        "properties": {
          "data":{
            "type": "text"
          }
        }
      }
    }
    

    Sample Documents:

    POST my_test_index/_doc/1
    {
      "data": "a b"
    }
    
    POST my_test_index/_doc/2
    {
      "data": "b a"
    }
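
The concatenation step described earlier could look like the following minimal sketch. It assumes every array element is free of whitespace (true for the question's one-letter values); `to_span_text` is a hypothetical helper name, not part of any Elasticsearch client.

```python
from typing import Iterable


def to_span_text(values: Iterable[str]) -> str:
    """Join array elements into one whitespace-delimited string.

    Assumption: elements are non-empty and contain no whitespace, so a
    whitespace split of the result gives back the original elements.
    """
    items = list(values)
    for item in items:
        if not item or any(ch.isspace() for ch in item):
            raise ValueError(f"element {item!r} is empty or contains whitespace")
    return " ".join(items)


# The question's two documents, rewritten for the text-mapped field.
doc_1 = {"data": to_span_text(["a", "b", "t", "k", "p"])}
doc_2 = {"data": to_span_text(["p", "b", "i", "o", "a"])}
```

Documents shaped like `doc_1` and `doc_2` would then be reingested in place of the original arrays.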
    

    Span Query:

    POST my_test_index/_search
    {
        "query": {
            "span_near" : {
                "clauses" : [
                    { "span_term" : { "data" : "a" } },
                    { "span_term" : { "data" : "b" } }
                ],
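                "slop" : 0,
                "in_order" : true
            }
        }
    }
    

    Note that slop controls the maximum number of intervening unmatched positions permitted: with slop 0, only `a b` matches, while `a c b` does not. Setting in_order to true requires the clauses to match in the order listed, so a must come first and then b. For the case in the question (b before a, with other values in between), list the b clause first and raise slop to the largest gap you want to allow.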
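
To make the slop arithmetic concrete, here is a small pure-Python model of an in-order span_near over two span_term clauses. It is only an illustration of the counting rule, not how Elasticsearch evaluates the query, and `ordered_within_slop` is a hypothetical name.

```python
from typing import Sequence


def ordered_within_slop(tokens: Sequence[str], first: str, second: str, slop: int) -> bool:
    """Simplified model of an in-order span_near with two term clauses.

    True when some occurrence of `first` precedes some occurrence of
    `second` with at most `slop` other tokens between them.
    """
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [j for j, tok in enumerate(tokens) if tok == second]
    return any(
        0 <= j - i - 1 <= slop
        for i in first_positions
        for j in second_positions
    )


doc_1 = "a b t k p".split()
doc_2 = "p b i o a".split()

print(ordered_within_slop(doc_2, "b", "a", slop=0))  # False: i and o sit between b and a
print(ordered_within_slop(doc_2, "b", "a", slop=2))  # True: two intervening tokens allowed
print(ordered_within_slop(doc_1, "b", "a", slop=2))  # False: a comes before b
```

Under this model, doc 2 from the question needs a slop of at least 2, and doc 1 never matches because its order is reversed.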

    Response:

    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 1,
          "relation" : "eq"
        },
        "max_score" : 0.36464313,
        "hits" : [
          {
            "_index" : "my_test_index",
            "_type" : "_doc",
            "_id" : "1",
            "_score" : 0.36464313,
            "_source" : {
              "data" : "a b"
            }
          }
        ]
      }
    }
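
Since the question is tagged elasticsearch-py, the same request can be built as a Python dictionary. The sketch below targets the ordering the question asks for (b before a); `span_order_query` is a hypothetical helper, and the slop of 3 is an assumption based on the five-element arrays shown in the question.

```python
from typing import Any, Dict


def span_order_query(field: str, first: str, second: str, slop: int) -> Dict[str, Any]:
    """Build a span_near search body matching `first` before `second` on `field`."""
    return {
        "query": {
            "span_near": {
                "clauses": [
                    {"span_term": {field: first}},
                    {"span_term": {field: second}},
                ],
                "slop": slop,
                "in_order": True,
            }
        }
    }


# b before a, allowing up to 3 other tokens in between (assumed array length of 5).
body = span_order_query("data", "b", "a", slop=3)
```

With a 7.x elasticsearch-py client this could probably be sent as `es.search(index="my_test_index", body=body)`; newer clients prefer passing the inner query through `query=`, so check the version you run.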
    

    Let me know if this helps!