Tags: elasticsearch, elastic-stack, elasticsearch-painless, data-ingestion

Ingest processor foreach or script to replace all items in array


I am trying to run an ingest pipeline that replaces instances of "on" and "off" with true and false in an array.

This works perfectly for plain string fields, e.g. with data like this:

[{onoffboolean: "on"}]

I am able to process this with the following:

processors: [
  {
    set: {
      field: 'onoffboolean',
      description: 'String represented trues to native true',
      if: "ctx?.onoffboolean == 'on'",
      value: true
    }
  },
  {
    set: {
      field: 'onoffboolean',
      description: 'String represented falses to native false',
      if: "ctx?.onoffboolean == 'off'",
      value: false
    }
  },
],

However, when the field holds an array of values, e.g.

["on", "on", "off"], which should be processed into [true, true, false],

I am unable to find the right processor to handle this. I have attempted to use foreach, but it seems that "_ingest._value" is not available inside an "if" conditional.

This Elastic forum thread suggests using a Painless script instead:

https://discuss.elastic.co/t/foreach-ingest-processor-conditional-append-processor/216884/2

However, I don't understand Painless scripting well enough to work this out.


Solution

  • If you have a concrete array field (let's call it list_of_attributes), you can use the following script processor:

    PUT _ingest/pipeline/bool_converter
    {
      "description": "Converts 'on'/'off' string values in an array to booleans",
      "processors": [
        {
          "script": {
            "source": """
              // 'on' becomes true; every other value (including 'off') becomes false
              ctx.list_of_attributes = ctx.list_of_attributes.stream()
                .map(str -> str == 'on' ? true : false)
                .collect(Collectors.toList());
            """
          }
        }
      ]
    }
    

    and then apply it when you ingest your docs:

    POST your-index/_doc?pipeline=bool_converter
    {
      "list_of_attributes": ["on", "on", "off"]
    }
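
    To check the conversion without indexing anything, the pipeline can also be run through the simulate API. This is only a sketch that reuses the bool_converter pipeline and the sample array from above; the response should show list_of_attributes as [true, true, false] under the returned document's _source.

    POST _ingest/pipeline/bool_converter/_simulate
    {
      "docs": [
        {
          "_source": {
            "list_of_attributes": ["on", "on", "off"]
          }
        }
      ]
    }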
    

    If you have more than one such array field, you can iterate over the document's fields by adapting my answer to the question "Run Elasticsearch processor on all the fields of a document".
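
    As a rough sketch of that idea (not the approach from the linked answer), the field names could be passed to the script as params and converted in a loop. The pipeline name bool_converter_multi and the second field other_flags are hypothetical and only there for illustration; fields that are missing or not arrays are skipped.

    PUT _ingest/pipeline/bool_converter_multi
    {
      "description": "Converts 'on'/'off' strings to booleans in several array fields",
      "processors": [
        {
          "script": {
            "params": {
              "fields": ["list_of_attributes", "other_flags"]
            },
            "source": """
              // params.fields is an assumed list of array field names to convert
              for (String name : params.fields) {
                def values = ctx[name];
                // skip fields that are absent or not arrays
                if (values instanceof List) {
                  ctx[name] = values.stream()
                    .map(v -> v == 'on' ? true : false)
                    .collect(Collectors.toList());
                }
              }
            """
          }
        }
      ]
    }

    As in the single-field script, any value other than 'on' ends up as false, so this assumes the arrays only ever contain "on" and "off".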

    Shameless plug: I dedicated a whole chapter to ingesting & pipelines in my recently released Elasticsearch Handbook. If you're new to ES, give it a shot!