Flatten nested JSON using fluentd

I have a program that writes structured logs. Here is an example record:

{
    "time": "time_val",
    "log": "{
        \"field1\": \"value1\",
        \"field2\": \"value2\",
        \"field3\": \"{
            \"nested_field1\": \"value1\",
            \"nested_field2\": \"value2\",
            \"nested_field3\": \"value3\"
        }\"
    }"
}

I am using fluentd to tail the container's output and parse the JSON messages. However, I would also like to parse the nested structured logs so that their fields are flattened into the original message. For the example above, I want fluentd to end up treating the message as:

{
    "time": "time_val",
    "field1": "value1",
    "field2": "value2",
    "nested_field1": "value1",
    "nested_field2": "value2",
    "nested_field3": "value3"
}

Is this something that can be done using fluentd configuration? Changing the original program behavior is not an option in my case.

Solution

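You can use the parser filter plugin with its key_name, reserve_data, and remove_key_name_field parameters: key_name selects the field to parse, reserve_data keeps the record's other fields, and remove_key_name_field drops the original string field once it has been parsed.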

Example:

<filter **>
  @type parser
  key_name field3
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>
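To make the effect of those three parameters concrete, here is a rough model of what the filter does to a single record. It is plain Python rather than fluentd code, and parser_filter is a hypothetical helper name used only for illustration; it assumes the parsed fields win if a key appears in both places.

```python
import json

def parser_filter(record, key_name):
    """Hypothetical model of <filter> @type parser with a json <parse>
    section, reserve_data true and remove_key_name_field true."""
    # <parse> @type json: decode the string stored under key_name
    parsed = json.loads(record[key_name])
    # remove_key_name_field true: drop the original string field
    rest = {k: v for k, v in record.items() if k != key_name}
    # reserve_data true: keep the remaining fields alongside the parsed ones
    return {**rest, **parsed}

record = {
    "field1": "value1",
    "field2": "value2",
    "field3": json.dumps({
        "nested_field1": "value1",
        "nested_field2": "value2",
        "nested_field3": "value3",
    }),
}

flattened = parser_filter(record, "field3")
print(json.dumps(flattened))
```

Without reserve_data the result would contain only the nested fields, and without remove_key_name_field the raw field3 string would still be present next to them.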

Here is a complete working example, after making your JSON valid, i.e.:

{"field1":"value1","field2":"value2","field3":"{\"nested_field1\":\"value1\",\"nested_field2\":\"value2\",\"nested_field3\":\"value3\"}"}

fluent-flatten-json.conf

<source>
  @type forward
</source>

<filter **>
  @type parser
  key_name field3
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>

<match **>
  @type stdout
</match>

Run fluentd:

fluentd -c ./fluent-flatten-json.conf

From another terminal, run fluent-cat with the input JSON:

fluent-cat test <<< '{"field1":"value1","field2":"value2","field3":"{\"nested_field1\":\"value1\",\"nested_field2\":\"value2\",\"nested_field3\":\"value3\"}"}'

Output in fluentd logs:

{"field1":"value1","field2":"value2","nested_field1":"value1","nested_field2":"value2","nested_field3":"value3"}

Formatted output:

{
  "field1": "value1",
  "field2": "value2",
  "nested_field1": "value1",
  "nested_field2": "value2",
  "nested_field3": "value3"
}

UPDATE

For valid double-nested, escaped JSON such as:

{"time":"time_val","log":"{\"field1\":\"value1\",\"field2\":\"value2\",\"field3\":\"{\\\"nested_field1\\\":\\\"nested_value1\\\",\\\"nested_field2\\\":\\\"nested_value2\\\",\\\"nested_field3\\\":\\\"nested_value3\\\"}\"}"}

The double-nested JSON in the question is not valid, so I had to recreate it as shown above.
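As far as I can tell, two things make the JSON in the question invalid: the string values contain literal line breaks, and the quotes at the second nesting level are escaped only once. A small Python check (not fluentd; the sample strings are shortened stand-ins for the question's record) shows both problems and one way to generate correctly escaped input:

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Problem 1: a raw newline inside a JSON string is not allowed.
multiline_ok = is_valid_json('{"log": "{\n}"}')

# Problem 2: inner quotes escaped once instead of twice. The outer
# document parses, but the decoded "log" string is no longer valid JSON.
single_escaped = r'{"log": "{\"field3\": \"{\"nested_field1\": \"value1\"}\"}"}'
outer = json.loads(single_escaped)
inner_ok = is_valid_json(outer["log"])

# Letting a JSON encoder do the nesting produces the right escaping.
valid = json.dumps({"log": json.dumps({"field3": json.dumps({"nested_field1": "value1"})})})

print(multiline_ok, inner_ok)
print(valid)
```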

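The following should work. Filters run in the order they are defined, so the first one unpacks log and the second one unpacks the field3 that it exposes: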

<filter **>
  @type parser
  key_name log
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>

<filter **>
  @type parser
  key_name field3
  reserve_data true
  remove_key_name_field true
  <parse>
    @type json
  </parse>
</filter>
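The two-stage flattening can be traced by hand with a small Python model. This is only a sketch of the record transformation, not how fluentd is implemented, and parser_filter is a hypothetical helper standing in for one parser filter with reserve_data and remove_key_name_field enabled.

```python
import json

def parser_filter(record, key_name):
    """Hypothetical stand-in for one parser filter stage: parse the JSON
    string under key_name, drop that key, keep the other fields."""
    parsed = json.loads(record[key_name])
    rest = {k: v for k, v in record.items() if k != key_name}
    return {**rest, **parsed}

# Build the double-nested event; json.dumps handles the escaping.
inner = json.dumps({
    "nested_field1": "nested_value1",
    "nested_field2": "nested_value2",
    "nested_field3": "nested_value3",
})
log = json.dumps({"field1": "value1", "field2": "value2", "field3": inner})
event = {"time": "time_val", "log": log}

step1 = parser_filter(event, "log")     # first filter: key_name log
step2 = parser_filter(step1, "field3")  # second filter: key_name field3

print(json.dumps(step2, indent=2))
```

After the first stage field3 is still an escaped string, which is why the second filter is needed; swapping the two filters would not work because field3 does not exist until log has been parsed.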