In our current setup we use Filebeat to ship application logs to an Elasticsearch instance. The application emits its logs in JSON format and runs in AWS.
For some reason AWS decided to prefix the log lines with a syslog-style header in a new platform release, and now the JSON parsing no longer works. A log line now looks like this:
```
Apr 17 06:33:32 ip-172-31-35-113 web: {"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
```
Before it was simply:
{"@timestamp":"2020-04-17T06:33:32.691Z","@version":"1","message":"Tomcat started on port(s): 5000 (http) with context path ''","logger_name":"org.springframework.boot.web.embedded.tomcat.TomcatWebServer","thread_name":"main","level":"INFO","level_value":20000}
The question is: can we avoid using Logstash to convert the log lines back into the old format? If not, how do I drop the prefix, and which filter is the best choice for this?
My current Filebeat configuration looks like this:
```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/web-1.log
  json.keys_under_root: true
  json.ignore_decoding_error: true
  json.overwrite_keys: true

fields_under_root: true
fields:
  environment: ${ENV_NAME:not_set}
  app: myapp

cloud.id: "${ELASTIC_CLOUD_ID:not_set}"
cloud.auth: "${ELASTIC_CLOUD_AUTH:not_set}"
```
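One thing I am unsure about: since the lines are no longer pure JSON, I assume the input-level `json.*` options should be dropped so the raw line simply lands in `message` for the processors below. A minimal sketch of the adjusted input (my assumption, not verified):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/web-1.log
  # json.* options removed: the line is no longer valid JSON,
  # so decoding would move to the processors below
```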
I would try to leverage the `dissect` and `decode_json_fields` processors:
```yaml
processors:
  # first drop the syslog-style preamble and keep only the JSON payload;
  # each %{?key} skips its token, %{json} captures the rest of the line
  - dissect:
      tokenizer: "%{?month} %{?day} %{?time} %{?host} %{?proc}: %{json}"
      field: "message"
      target_prefix: ""
  # then parse the JSON payload into the root of the event
  - decode_json_fields:
      fields: ["json"]
      process_array: false
      max_depth: 1
      target: ""
      overwrite_keys: true  # as in the old setup, let the JSON keys win
      add_error_key: true
```
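If the decode succeeds, the intermediate `json` field would still carry the raw JSON string. Assuming nothing downstream needs it, a `drop_fields` step appended to the processors list above could presumably clean it up (a sketch; `ignore_missing` keeps it from failing on lines where dissect did not match):

```yaml
  # finally remove the intermediate field once its contents are decoded
  - drop_fields:
      fields: ["json"]
      ignore_missing: true
```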