Tags: elasticsearch, kibana, elastic-stack, elasticsearch-query, elasticsearch-painless

How to move fields out of a nested object into separate objects in an Elasticsearch index


I have a nested field in my index that contains several objects.

 "customFields" : [
            {
              "objectTypeId" : 17,
              "Value" : "",
              "description" : "The original author of the document",
              "Name" : "Document Author"
            },
            {
              "objectTypeId" : 17,
              "Value" : "",
              "description" : "Source document number",
              "Name" : "Legacy document number"
            },
.
.
.
]

I want to create a script that moves the fields out of the customFields object into separate objects, like this:

"Document_Author": {
"Description": "The original author of the document",
"Value": "Some value",
"ObjectTypeId": 17
},

"Legacy document number": {
"Description": "Source document number",
"Value": "Some value",
"ObjectTypeId": 17
},
.
.
.

I tried a script like the one below. I am very new to Elasticsearch and scripting, so it does not work.

POST /new_document-20/_update_by_query
 {
  "script" : { "inline": "for (int i = 0; i < ctx._source.customFields.length; ++i) { ctx._source.add(\"customFields[i].Name\" : { \"Value\" : \"customFields[i].Value\", \"Description\" : \"customFields[i].description\", \"objectTypeId\" : \"customFields[i].objectTypeId\"}) }",
 
       "query": {
         "bool": {
           "must": [
             {
               "exists": {
                 "field": "customFields.Name"
          }
        }
      ]
    }
  }
  }
}

I get a compilation error pointing to customFields[i].Name, like this:

"error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "... d(\"customFields[i].Name\" : { \"Value\" : \"customFiel ...",
          "                             ^---- HERE"

How can I create a script that helps me move the fields out from the nested object?


Solution

  • You can perform only one ctx._source write operation per loop to prevent the "The maximum number of statements that can be executed in a loop has been reached." error.

    With that in mind, I'd suggest that you:

    1. copy the original _source
    2. extract the customFields list
    3. iterate the extracted list and adjust the hash maps to conform with the desired format
    4. set the newly formed hash map onto the copied source
    5. replace the original _source fully

    In practical terms:

    POST /new_document-20/_update_by_query
    {
      "script": {
        "inline": """
          def source_copy = ctx._source;
          def customFields = source_copy.remove('customFields');
          
          for (int i = 0; i < customFields.length; i++) {
            // store the current iteratee
            def current = customFields[i];
            
            // remove AND return the name
            def name = current.remove('Name');
            
            // set in the _source
            source_copy[name] = current;
          }
          
          // replace the original source completely
          ctx._source = source_copy;
        """
      },
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "customFields.Name"
              }
            }
          ]
        }
      }
    }
    

    And as an inline script string:

    "\n      def source_copy = ctx._source;\n      def customFields = source_copy.remove('customFields');\n      \n      for (int i = 0; i < customFields.length; i++) {\n        // store the current iteratee\n        def current = customFields[i];\n        \n        // remove AND return the name\n        def name = current.remove('Name');\n        \n        // set in the _source\n        source_copy[name] = current;\n      }\n      \n      // replace the original source completely\n      ctx._source = source_copy;\n    "
    

    By the way, hash maps in Painless are instantiated either through a new HashMap call or through the (slightly confusing) [:] operator, i.e.:

    def entries_map_without_name = [
       "Value" : current.Value, 
       "Description" : current.description,
       "objectTypeId" : current.objectTypeId
    ];
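
    If you want exactly the key names from the question (Description, Value, ObjectTypeId), a map literal like this one can replace the in-place remove('Name') approach. Here is a minimal, untested sketch, assuming every customFields entry has a Name. The null check and the for-each loop are additions for illustration. On recent Elasticsearch versions the script key is "source"; "inline" is the older, deprecated name.

    POST /new_document-20/_update_by_query
    {
      "script": {
        "inline": """
          // detach the list; remove() returns null if the field is absent
          def customFields = ctx._source.remove('customFields');

          if (customFields != null) {
            for (def field : customFields) {
              // build a fresh map with the target key names
              ctx._source[field.Name] = [
                "Description": field.description,
                "Value": field.Value,
                "ObjectTypeId": field.objectTypeId
              ];
            }
          }
        """
      },
      "query": {
        "exists": {
          "field": "customFields.Name"
        }
      }
    }

    Note that "query" sits next to "script" in the request body, not inside it.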
    

    P.S. Converting a nested list of objects into a set of separate hash maps, as you were trying to do, has its advantages and disadvantages, especially when it comes to mapping size bloat and the rather limited aggregation possibilities.

    Shameless plug: I discuss just that in my Elasticsearch Handbook, specifically in this sub-chapter.