Search code examples
elasticsearchelasticsearch-6

How to sort by an element inside a deep object in elasticsearch?


I have this mapping:

{
    "foos": {
        "mappings": {
            "foo": {
                "dynamic": "false",
                "properties": {
                    "some_id": {
                        "type": "integer"
                    },
                    "language": {
                        "type": "text"
                    },
                    "locations": {
                        "type": "integer"
                    },
                    "name": {
                        "type": "text",
                        "term_vector": "yes",
                        "analyzer": "name_analyzer"
                    },
                    "popularity": {
                        "type": "integer"
                    },
                    "some_deep_count": {
                        "type": "object"
                    }
                }
            }
        }
    }
}

one example entry is the following:

                 {
                    "name": "Some nice name",
                    "some_id": 1,
                    "id": 4378,
                    "popularity": 525,
                    "some_deep_count": {
                        "0": {
                            "32026": 344,
                            "55625": 458,
                            "29": 1077,
                            "55531": 1081,
                            ...
                        },
                        "1": {
                            "32026": 57,
                            "55625": 60,
                            "29": 88,
                            ...
                        }
                    },
                    "locations": [
                        32026,
                        55625,
                        ...
                    ],
                    "language": [
                        "es",
                        "en"
                    ]
                }

where the some_deep_count field can contain only the "0" and "1" keys, which inside of them can contain a very long list of id => value (dynamic, not configurable in advance)

This works very well when filtering:

"_source": [
        "id",
        "some_deep_count.*.55529"
    ],

but I don't understand how to sort by any deep object. I need a deep sum, like the following expression indicates:

...
{
    "sort": {
        "_script": {
            "type": "number",
            "script": {
                "lang": "painless",
                "source": "def deep0 = 0; def deep1 = 0; if(doc.containsKey('some_deep_count.0.55529')) { deep0 = doc['some_deep_count.0.55529'] } if(doc.containsKey('some_deep_count.1.55529')) { deep1 = doc['some_deep_count.1.55529'] } return deep0 + deep1"
            },
            "order": "desc"
        }
    }
}

which unfortunately always returns 0 in the sorting field, since doc.containsKey('some_deep_count.0.55529') results always being empty. Also doc.containsKey('some_deep_count') does.

Interestingly enough, doc.containsKey('some_id') would work, and I really can't understand why

EDIT

In response to Val's proposal, I'm attaching a full request / response

Request:

{
  "sort": {
  "_script": {
    "type": "number",
    "script": {
      "lang": "painless",
      "source": "def ps0 = 0; if(doc.containsKey('some_deep_count.0.55529')) { ps0 = doc['some_deep_count.0.55529'].value; }  return ps0 "
    },
    "order": "desc"
  }
},
  "_source": [
      "id",
      "some_deep_count.0.55529"
  ],
  "size": 1
}

Response:

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 2121,
        "max_score": null,
        "hits": [
            {
                "_index": "foos",
                "_type": "foo",
                "_id": "5890",
                "_score": null,
                "_source": {
                    "some_deep_count": {
                        "0": {
                            "55529": 228
                        }
                    },
                    "id": 5890
                },
                "sort": [
                    0.0
                ]
            }
        ]
    }
}

The issue is probably to be found in the condition. In fact I get "sort":[0.0] even when the sorting is as simple as "def ps0 = 0; if(doc.containsKey('some_deep_count')) { ps0 = 99999; } return ps0 ", revealing that the clause doc.containsKey('some_deep_count') might have some problem

EDIT2

The index got with curl -XGET localhost:9200/foos looks like the following:

{
  "foos": {
      "aliases": {},
      "mappings": {
          "foo": {
              "dynamic": "false",
              "properties": {
                  "some_id": {
                      "type": "integer"
                  },
                  "language": {
                      "type": "text"
                  },
                  "locations": {
                      "type": "integer"
                  },
                  "name": {
                      "type": "text",
                      "term_vector": "yes",
                      "analyzer": "name_analyzer"
                  },
                  "popularity": {
                      "type": "integer"
                  },
                  "some_deep_count": {
                      "type": "object"
                  }
              }
          }
      },
      "settings": {
          "index": {
              "number_of_shards": "5",
              "provided_name": "foos",
              "creation_date": "1576168104248",
              "analysis": {
                  "analyzer": {
                      "name_analyzer": {
                          "filter": [
                              "lowercase"
                          ],
                          "tokenizer": "keyword"
                      }
                  }
              },
              "number_of_replicas": "0",
              "uuid": "26xckWaOQuuxFrMvIdikvw",
              "version": {
                  "created": "6020199"
              }
          }
      }
  }
}

Thank you


Solution

  • I've been able to return a non-zero sort value by reproducing your case as shown below:

    # 1. create the index mapping
    PUT sorts
    {
      "mappings": {
        "properties": {
          "some_deep_count": {
            "type": "object"
          }
        }
      }
    }
    
    # 2. index a sample document
    PUT sorts/_doc/1
    {
      "some_deep_count": {
        "0": {
          "29": 1077,
          "32026": 344,
          "55531": 1081,
          "55625": 458
        },
        "1": {
          "29": 88,
          "32026": 57,
          "55625": 60
        }
      }
    }
    
    # 3. Sort the results
    POST sorts/_search
    {
      "sort": {
        "_script": {
          "type": "number",
          "script": {
            "lang": "painless",
            "source": """
              def deep0 = 0; 
              def deep1 = 0; 
              if(doc.containsKey('some_deep_count.0.55531')) { 
                deep0 = doc['some_deep_count.0.55531'].value;
              } 
              if(doc.containsKey('some_deep_count.1.55531')) { 
                deep1 = doc['some_deep_count.1.55531'].value;
              } 
              return deep0 + deep1;
            """
          },
          "order": "desc"
        }
      }
    }
    

    Results => sort = 1081

    "hits" : [
      {
        "_index" : "sorts",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : null,
        "_source" : {
          "some_deep_count" : {
            "0" : {
              "29" : 1077,
              "32026" : 344,
              "55531" : 1081,
              "55625" : 458
            },
            "1" : {
              "29" : 88,
              "32026" : 57,
              "55625" : 60
            }
          }
        },
        "sort" : [
          1081.0
        ]
      }
    ]
    

    As you can see I'm using 55531 which exists in some_deep_count.0 but not in some_deep_count.1 and the result is 1081, which is correct.

    Using *.29 which is available in both 0 and 1 would yield 1165 which is also correct (1077 + 88).

    The only difference between my script and yours is that when assigning deep0 and deep1 you need to add .value to the doc field reference.

    UPDATE

    The problem lies in the fact that you're specifying dynamic: false in your mapping. Using that parameter means that if you index new fields that weren't present in your mapping at index creation time, your mapping won't be updated. So as it stands, all the sub-fields that you index in some_deep_count are never indexed, which is why you're always getting 0. Remove dynamic: false and everything will work as expected.