Tags: spring-boot, elasticsearch

Overwrite fields from JSON logs in Elastic ingest pipeline


I'm running several Spring Boot applications on OpenShift and want to send their logs to Elastic. We've set up the Elastic Agents on the OpenShift cluster, and the logs are processed through a managed ingest pipeline. We've configured the Logback logger to output the logs in ECS format using this guide from Elastic: https://www.elastic.co/guide/en/ecs-logging/java/1.x/setup.html

We've set up the agent to parse these logs into a json object on the incoming message. What I want to achieve is to overwrite certain 'normal' top-level fields with their values from that JSON object. For this we have the following ingest pipeline:

[
  {
    "set": {
      "field": "log.level",
      "ignore_empty_value": true,
      "copy_from": "json.log.level"
    }
  },
  {
    "set": {
      "field": "message",
      "copy_from": "json.message",
      "ignore_empty_value": true
    }
  },
  {
    "set": {
      "field": "service.name",
      "ignore_empty_value": true,
      "copy_from": "json.service.name"
    }
  },
  {
    "set": {
      "field": "service.version",
      "copy_from": "kubernetes.labels.app_kubernetes_io/version",
      "ignore_empty_value": true
    }
  },
  {
    "pipeline": {
      "name": "logs-kubernetes.container_logs@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "reroute": {
      "tag": "kubernetes.container_logs",
      "dataset": [
        "{{kubernetes.annotations.elastic_co/dataset}}",
        "{{data_stream.dataset}}"
      ],
      "namespace": [
        "{{kubernetes.annotations.elastic_co/namespace}}",
        "{{data_stream.namespace}}"
      ],
      "if": "ctx.kubernetes?.annotations != null"
    }
  }
]

While the 'json.message --> message' set processor works, 'json.service.name --> service.name' and 'json.log.level --> log.level' do not: the fields show up empty in Kibana. What am I doing wrong? I've looked at the dot_expander processor, but couldn't get that to work either.

Edit 1: This is an example of the logs produced by Logback:

{"@timestamp":"2023-10-02T10:17:47.798Z","log.level": "INFO","message":"Starting Application v1.0.0 using Java 17.0.8 on my-component-15-7hbhs with PID 1 (/deployments/my-component-1.0.0.jar started by 1002400000 in /deployments)","ecs.version": "1.2.0","service.name":"my-component","event.dataset":"my-component","process.thread.name":"main","log.logger":"com.example.Application"}

This is what ends up in the JSON object in the Elastic document.
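
For reference, the indexed document's _source then looks roughly like this (an illustrative, trimmed sketch; note that the dotted key names are kept verbatim inside the json object):

{
  "json": {
    "@timestamp": "2023-10-02T10:17:47.798Z",
    "log.level": "INFO",
    "message": "Starting Application v1.0.0 using Java 17.0.8 ...",
    "service.name": "my-component",
    "ecs.version": "1.2.0"
  }
}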


Solution

  • Indeed, your field names contain dots, so when those fields sit inside a JSON structure you cannot reach them using dotted notation.
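
    Concretely, an illustrative sketch (not your exact document): the json object holds a single key literally named "log.level", not a log object containing a level key, so json.log.level resolves to nothing:

    { "json": { "log.level": "INFO" } }    -- is not --    { "json": { "log": { "level": "INFO" } } }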

    However, you can add a dot_expander processor in order to expand the field names before processing them, as shown below:

    [
      {
        "dot_expander": {
          "field": "log.level",
          "path": "json"
        }
      },
      {
        "set": {
          "field": "log.level",
          "ignore_empty_value": true,
          "copy_from": "json.log.level"
        }
      },
      {
        "set": {
          "field": "message",
          "copy_from": "json.message",
          "ignore_empty_value": true
        }
      },
      {
        "dot_expander": {
          "field": "service.name",
          "path": "json"
        }
      },
      {
        "set": {
          "field": "service.name",
          "ignore_empty_value": true,
          "copy_from": "json.service.name"
        }
      },
      {
        "set": {
          "field": "service.version",
          "copy_from": "kubernetes.labels.app_kubernetes_io/version",
          "ignore_empty_value": true
        }
      }
    ]
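
    You can verify the fix up front with the Simulate Pipeline API before changing the real pipeline. A minimal sketch to run in Kibana Dev Tools (the sample _source is trimmed to a single problematic field):

    POST _ingest/pipeline/_simulate
    {
      "pipeline": {
        "processors": [
          {
            "dot_expander": {
              "field": "log.level",
              "path": "json"
            }
          },
          {
            "set": {
              "field": "log.level",
              "copy_from": "json.log.level",
              "ignore_empty_value": true
            }
          }
        ]
      },
      "docs": [
        {
          "_source": {
            "json": {
              "log.level": "INFO"
            }
          }
        }
      ]
    }

    The response should show "log.level": "INFO" at the top level of the simulated document's _source.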
    

    It would be better to configure Filebeat, or whatever agent you're using, to expand the field names on the agent side when it decodes the JSON, instead of having to do it in your ingest pipeline, as sketched below.
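
    For instance, assuming the agent decodes the log line with a Filebeat-style decode_json_fields processor (with Elastic Agent, the equivalent settings go into the integration's processors configuration), its expand_keys option de-dots the keys at decode time. A minimal sketch, not your actual agent configuration:

    processors:
      - decode_json_fields:
          fields: ["message"]   # the raw JSON log line produced by Logback
          target: "json"        # decode into the json object, as today
          expand_keys: true     # recursively expand dotted keys into nested objects

    With the keys expanded at the source, the dot_expander processors above become unnecessary.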