Search code examples
elasticsearchyamlpipelineingest

Accessing metadata fields within ingestpipeline.yml set's processor in Elasticsearch


I have to write an ingest pipeline for elasticsearch within an pipeline.yml file. I was able to retrieve my field with grok and was able to divide it with the split processor. Now, I want to assign each value of the resulting array from the split operation to its own field.

But I'm not able to access the elements of the split array. The relevant code snippets look like this:

 - grok:
  field: message
  patterns:
    - ^TRIGGER OCCURRED. %{GREEDYDATA:pac.log.deo.trigger.path}
  tag: TRIGGER

  - split:
  if: ctx.pac.log.tags != null && ctx.pac.log.tags.contains('TRIGGER')
  field: '@metadata.pac.log.deo.trigger.path'
  separator: "/"

- set:
  if: ctx.pac.log.tags != null && ctx.pac.log.tags.contains('TRIGGER')
  field: pac.log.deo.trigger.provider
  value: '{{{@metadata.pac.log.deo.trigger.path[0]}}}'

a log line would look like:

TRIGGER OCCURRED: Timer/Period [seconds]/10 seconds

I would like to have the first value = index 0, if elasticsearch indexes start as well as other oop - languages array indexes with 0, stored inside the field pac.log.deo.trigger.provider

I tried varies annotations:

'{{{@metadata.pac.log.deo.trigger.path[0]}}}'
'{{@metadata.pac.log.deo.trigger.path[0]}}'
'@metadata.pac.log.deo.trigger.path[0]'
'@metadata.pac.log.deo.trigger.path[0]'
'{{{_source.metadata.pac.log.deo.trigger.path[0]}}}'
'{{{_ingest.metadata.pac.log.deo.trigger.path[0]}}}'

Since its ingesting processors do not filter plugins, the filter "ruby" is not available. List of available ingest Processors:

"processors": [
      {
        "type": "append"
      },
      {
        "type": "attachment"
      },
      {
        "type": "bytes"
      },
      {
        "type": "circle"
      },
      {
        "type": "community_id"
      },
      {
        "type": "convert"
      },
      {
        "type": "csv"
      },
      {
        "type": "date"
      },
      {
        "type": "date_index_name"
      },
      {
        "type": "dissect"
      },
      {
        "type": "dot_expander"
      },
      {
        "type": "drop"
      },
      {
        "type": "enrich"
      },
      {
        "type": "fail"
      },
      {
        "type": "fingerprint"
      },
      {
        "type": "foreach"
      },
      {
        "type": "geoip"
      },
      {
        "type": "grok"
      },
      {
        "type": "gsub"
      },
      {
        "type": "html_strip"
      },
      {
        "type": "inference"
      },
      {
        "type": "join"
      },
      {
        "type": "json"
      },
      {
        "type": "kv"
      },
      {
        "type": "lowercase"
      },
      {
        "type": "network_direction"
      },
      {
        "type": "pipeline"
      },
      {
        "type": "registered_domain"
      },
      {
        "type": "remove"
      },
      {
        "type": "rename"
      },
      {
        "type": "script"
      },
      {
        "type": "set"
      },
      {
        "type": "set_security_user"
      },
      {
        "type": "sort"
      },
      {
        "type": "split"
      },
      {
        "type": "trim"
      },
      {
        "type": "uppercase"
      },
      {
        "type": "uri_parts"
      },
      {
        "type": "urldecode"
      },
      {
        "type": "user_agent"
      }

Solution

  • Found the solution:

    - set: 
          if: ctx.pac.log.tags != null && ctx.pac.log.tags.contains('TRIGGER')
          field: pac.log.deo.trigger.provider
          value: {{{@metadata.pac.log.deo.trigger.path.0}}}