Search code examples
elasticsearchkuberneteslogstashkubernetes-helmfluent-bit

Elasticsearch Dynamic Field Mapping and JSON Dot Notation


I'm trying to write logs to an Elasticsearch index from a Kubernetes cluster. Fluent-bit is being used to read stdout and it enriches the logs with metadata including pod labels. A simplified example log object is

{
  "log": "This is a log message.",
  "kubernetes": {
    "labels": {
      "app": "application-1"
    }
  }
}

The problem is that a few other applications deployed to the cluster have labels of the following format:

{
  "log": "This is another log message.",
  "kubernetes": {
    "labels": {
      "app.kubernetes.io/name": "application-2"
    }
  }
}

These applications are installed via Helm charts and the newer ones are following the label and selector conventions as laid out here. The naming convention for labels and selectors was updated in Dec 2018, seen here, and not all charts have been updated to reflect this.

The end result of this is that depending on which type of label format makes it into an Elastic index first, trying to send the other type in will throw a mapping exception. If I create a new empty index and send in the namespaced label first, attempting to log the simple app label will throw this exception:

object mapping for [kubernetes.labels.app] tried to parse field [kubernetes.labels.app] as object, but found a concrete value

The opposite situation, posting the namespaced label second, results in this exception:

Could not dynamically add mapping for field [kubernetes.labels.app.kubernetes.io/name]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].

What I suspect is happening is that Elasticsearch sees the periods in the field name as JSON dot notation and is trying to flesh it out as an object. I was able to find this PR from 2015 which explicitly disallows periods in field names however it seems to have been reversed in 2016 with this PR. There is also this multi-year thread from 2015-2017 discussing this issue but I was unable to find anything recent involving the latest versions.

My current thoughts on moving forward is to standardize the Helm charts we are using to have all of the labels use the same convention. This seems like a band-aid on the underlying issue though which is that I feel like I'm missing something obvious in the configuration of Elasticsearch and dynamic field mappings.

Any help here would be appreciated.


Solution

  • I opted to use the Logstash mutate filter with the rename option as described here:

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-rename

    The end result looked something like this:

    filter {
      mutate {
        '[kubernetes][labels][app]'   => '[kubernetes][labels][app.kubernetes.io/name]'
        '[kubernetes][labels][chart]' => '[kubernetes][labels][helm.sh/chart]'
      }
    }