json, elasticsearch, logging, kubernetes, fluentd

How do I get fluentd / elasticsearch to parse the "log" key_name as json from a kubernetes pod?


I am having issues trying to get logs into elasticsearch from fluentd in a k8s cluster.

I have several web applications which output their logs as json. Using a simple local setup with docker containers, I can get elastic to read and parse the logs correctly. Here is a sample of the local json as shown in kibana:

{
  "_index": "logstash-2020.01.17",
  "_type": "fluentd",
  "_id": "S620sm8B2LEvFR841ylg",
  "_version": 1,
  "_score": null,
  "_source": {
    "log": {
      "@timestamp": "2020-01-17T08:53:03.066290",
      "caller": "StaticFileHelper.py::get",
      "data": {
        "python.thread": "Thread-1[{record.thread}]",
        "python.lineno": 45,
        "python.filename": "StaticFileHelper.py",
        "python.logger_name": "root",
        "python.module": "StaticFileHelper",
        "python.funcName": "get",
        "python.pid": 11239
      },
      "message": "application/javascript",
      "level": "INFO"
    },
    "@timestamp": "2020-01-17T08:53:03.000000000+00:00"
  },
  "fields": {
    "@timestamp": [
      "2020-01-17T08:53:03.000Z"
    ],
    "log.@timestamp": [
      "2020-01-17T08:53:03.066Z"
    ]
  },
  "sort": [
    1579251183000
  ]
}

Under the index patterns I can see the correct mappings, and they update when I introduce new fields into the logging output. A sample of the mappings:

log.@timestamp: date
log.caller: string
log.caller.keyword: string
log.data.python.filename: string
log.data.python.filename.keyword: string
log.data.python.funcName: string

In the cluster the "log" field is not being parsed correctly:

{
  "_index": "logstash-2020.01.17",
  "_type": "fluentd",
  "_id": "atUDs28BFgXM_nqQvYUY",
  "_version": 1,
  "_score": null,
  "_source": {
    "log": "{'@timestamp': '2020-01-17T10:19:21.775339', 'caller': 'RequestLoggingManager.py::print_request_id', 'data': {'python.thread': 'MainThread[{record.thread}]', 'python.lineno': 28, 'python.filename': 'RequestLoggingManager.py', 'python.logger_name': 'root', 'python.module': 'RequestLoggingManager', 'python.funcName': 'print_request_id', 'request_id': '1579256361-1494-XYyVj', 'python.pid': 8}, 'message': 'request: \"1579256361-1497-JUeYF\" is about to enter \"get_settings\"', 'level': 'INFO'}\n",
    "stream": "stderr",
    "docker": {
      "container_id": "fc5b0d5b0aa4008961b18dfe93c4e04b2cfbde0f7ff072dc702c55823baba3a4"
    },
    "kubernetes": {
      "container_name": "cms",
      "namespace_name": "default",
      "pod_name": "cms-68c4b49657-b88hs",
      "container_image": "HIDDEN",
      "container_image_id": "HIDDEN",
      "pod_id": "ffc6a681-390b-11ea-bcac-42010a8000be",
      "labels": {
        "app": "cms",
        "pod-template-hash": "68c4b49657",
        "version": "1.0.0"
      },
      "host": "HIDDEN",
      "master_url": "https://10.0.0.1:443/api",
      "namespace_id": "1ede7315-14fa-11ea-95c1-42010a80010f"
    },
    "@timestamp": "2020-01-17T10:19:21.776876369+00:00",
    "tag": "kubernetes.var.log.containers.cms-68c4b49657-b88hs_default_cms-fc5b0d5b0aa4008961b18dfe93c4e04b2cfbde0f7ff072dc702c55823baba3a4.log"
  },
  "fields": {
    "@timestamp": [
      "2020-01-17T10:19:21.776Z"
    ]
  },
  "highlight": {
    "kubernetes.labels.app": [
      "@kibana-highlighted-field@cms@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1579256361776
  ]
}

The mappings are being shown as:

log: string
log.keyword: string

None of the custom json mappings are recognised.

Is there a way of customising this "log" field and if so where do I need to make the changes? I am quite new to fluentd and elastic so any help would be appreciated!

I am using the fluent/fluentd-kubernetes-daemonset on kubernetes.


Solution

  • To get around this I pulled and ran the fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1 image locally in docker. I had to make sure I specified FLUENT_ELASTICSEARCH_USER and FLUENT_ELASTICSEARCH_PASSWORD as environment variables on the initial run, because entrypoint.sh tries to substitute that information into the container's fluent.conf file; if you don't specify a password it wipes those settings out of the file.
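
    For reference, the local run looked roughly like this. The credential values are placeholders rather than real ones, and the exact invocation is a sketch from memory:

    docker run -it \
      -e FLUENT_ELASTICSEARCH_USER=elastic \
      -e FLUENT_ELASTICSEARCH_PASSWORD=changeme \
      fluent/fluentd-kubernetes-daemonset:v1.4.2-debian-elasticsearch-1.1

    The other FLUENT_ELASTICSEARCH_* variables (host, port, scheme) can be passed the same way if your setup needs them.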

    Then it was a case of exec'ing into the running container and adding the following information into fluent.conf:

    <filter **>
      @type record_transformer
      <record>
        log_json ${record["log"]}
      </record>
    </filter>
    
    
    <filter **>
      @type parser
      @log_level debug
      key_name log_json
      reserve_data true
      remove_key_name_field true
      emit_invalid_record_to_error false
      <parse>
        @type json
      </parse>
    </filter>
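
    Getting a shell into the running container for the edit above was a plain docker exec. The container name below is hypothetical, and the config path is an assumption about where this image keeps fluent.conf:

    # hypothetical container name; check `docker ps` for the real one
    docker exec -it fluentd-local /bin/bash
    # fluent.conf is assumed to live under /fluentd/etc in this image; if the
    # container has no editor, copy the file out, edit it, and copy it back:
    docker cp fluentd-local:/fluentd/etc/fluent.conf .
    docker cp fluent.conf fluentd-local:/fluentd/etc/fluent.conf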
    

    After this I exited and stopped the container, committed it to a brand new image in my own repo on Docker Hub, and referenced the new image in the DaemonSet yaml file we were using to deploy to k8s.
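
    The commit-and-push step was roughly the following; the repository name and tag are placeholders for my own Docker Hub repo:

    # <container_id> is a placeholder; take it from `docker ps -a`
    docker commit <container_id> myrepo/fluentd-k8s-custom:v1.0
    docker push myrepo/fluentd-k8s-custom:v1.0
    # then point the DaemonSet's container image at myrepo/fluentd-k8s-custom:v1.0
    # and re-apply the manifest, e.g. kubectl apply -f fluentd-daemonset.yaml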

    This may not be the slickest or most efficient method, but in the absence of any docs on how to customise fluent.conf, it did the trick for now.