elasticsearch, hashmap, elasticsearch-painless

How to create a HashMap with custom object as a key?


In Elasticsearch, I have an object that contains an array of objects. Each object in the array has typ, id, updated, and value fields.

My input parameter is an array that contains objects of the same type but with different values and update times. I'd like to update the existing objects with the new value when they exist and create new ones when they don't.

I'd like to use a Painless script to update them but keep them distinct, as some of them may overlap. The issue is that I need to use both typ and id to keep them unique. So far I've done it with a brute-force approach, a nested for loop that compares the elements of both arrays, but I'm not too happy about that.

One idea is to take the array from the source, build a temporary HashMap for fast lookup, process the input, and then store all the objects back into the source.

Can I create a HashMap with a custom object (a class with typ and id) as the key? If so, how? I can't add a class definition to the script.

Here's the mapping. All fields are disabled ("enabled": false), as I use them only as intermediate state and query using other fields.

{
  "properties": {
    "arrayOfObjects": {
      "properties": {
        "typ": {
          "enabled": false
        },
        "id": {
          "enabled": false
        },
        "value": {
          "enabled": false
        },
        "updated": {
          "enabled": false
        }
      }
    }
  }
}

Example doc.

{
  "arrayOfObjects": [
    {
      "typ": "a",
      "id": "1",
      "updated": "2020-01-02T10:10:10Z",
      "value": "yes"
    },
    {
      "typ": "a",
      "id": "2",
      "updated": "2020-01-02T11:11:11Z",
      "value": "no"
    },
    {
      "typ": "b",
      "id": "1",
      "updated": "2020-01-02T11:11:11Z"
    }
  ]
}

And finally, part of the script in its current form. The script does some other things too, which I've stripped out for brevity.

if (ctx._source.arrayOfObjects == null) {
    ctx._source.arrayOfObjects = new ArrayList();
}
for (obj in params.inputObjects) {
    def found = false;
    for (existingObj in ctx._source.arrayOfObjects) {
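        if (obj.typ == existingObj.typ && obj.id == existingObj.id) {
            // Same (typ, id) pair: mark it as found even when the input is stale,
            // so an older update is not appended as a duplicate.
            found = true;
            if (isAfter(obj.updated, existingObj.updated)) {
                existingObj.updated = obj.updated;
                existingObj.value = obj.value;
            }
            break;
        }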
    }
    if (!found) {
        ctx._source.arrayOfObjects.add([
            "typ": obj.typ,
            "id": obj.id,
            "value": params.inputValue,
            "updated": obj.updated
        ]);
    }
}

Solution

  • There's technically nothing wrong with your approach. For arrays of modest size, the nested loop is perfectly adequate.

    A HashMap could save some time on larger arrays, but since you're scripting, you're already bound to the script's innate overhead. You can't define a key class in Painless, but you don't need one: Painless exposes Java's HashMap (new HashMap(), put, get, containsKey), and any key with value-based equality works, such as a string built from typ and id.

    Another approach would be to rethink your data structure: instead of an array of objects, use keyed objects or similar. Arrays of objects aren't great for frequent updates.

    Finally, a tip: you said that these fields are only used to store some intermediate state. If that weren't the case (or won't be in the future), I'd recommend mapping the array as the nested field type so that each object can be queried independently of the others in the array.
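
The lookup-map idea from the question can be sketched roughly as follows. This is plain Java rather than Painless, and it rests on a few assumptions: the names ArrayMerge, keyOf and merge are made up for illustration, the "|" separator is assumed never to occur in typ or id, and isAfter is only a stand-in for the question's helper (whose body wasn't shown), assuming ISO-8601 UTC timestamps. New entries are copied from the input object as a whole, which differs from the original script, where value comes from params.inputValue. Painless exposes the same java.util Map and List methods, so the loop bodies should carry over with def-typed variables, but that port is untested here.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ArrayMerge {
    // Composite lookup key. "|" is an assumed separator that must not
    // appear in typ or id, otherwise two different pairs could collide.
    static String keyOf(Map<String, Object> obj) {
        return obj.get("typ") + "|" + obj.get("id");
    }

    // Hypothetical stand-in for the question's isAfter helper,
    // assuming both arguments are ISO-8601 UTC timestamp strings.
    static boolean isAfter(Object a, Object b) {
        return Instant.parse((String) a).isAfter(Instant.parse((String) b));
    }

    static List<Map<String, Object>> merge(List<Map<String, Object>> existing,
                                           List<Map<String, Object>> input) {
        // LinkedHashMap preserves the order of the original array
        // when the values are written back.
        Map<String, Map<String, Object>> byKey = new LinkedHashMap<>();
        for (Map<String, Object> obj : existing) {
            byKey.put(keyOf(obj), obj);
        }
        for (Map<String, Object> obj : input) {
            Map<String, Object> current = byKey.get(keyOf(obj));
            if (current == null) {
                // Unknown (typ, id) pair: append a copy of the input object.
                byKey.put(keyOf(obj), new LinkedHashMap<>(obj));
            } else if (isAfter(obj.get("updated"), current.get("updated"))) {
                // Known pair with a newer timestamp: update in place.
                current.put("updated", obj.get("updated"));
                current.put("value", obj.get("value"));
            }
            // Known pair with an older or equal timestamp: ignored.
        }
        return new ArrayList<>(byKey.values());
    }
}
```

In a script, the result of such a merge would be assigned back to ctx._source.arrayOfObjects. A two-element list holding typ and id would also work as a key, since java.util.List compares and hashes by content, which avoids having to pick a separator.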