
Flatten nested JSON using fluentd


I have a program that writes structured logs such as the following:

{
    "time": "time_val",
    "log": "{
        \"field1\": \"value1\",
        \"field2\": \"value2\",
        \"field3\": \"{
            \"nested_field1\": \"value1\",
            \"nested_field2\": \"value2\",
            \"nested_field3\": \"value3\"
        }\"
    }"
}

I am using fluentd to tail the container's output and parse the JSON messages. However, I would also like to parse the nested structured logs so that they are flattened into the original message. For the example above, I would want fluentd to end up treating the message as:

{
    "time": "time_val",
    "field1": "value1",
    "field2": "value2",
    "nested_field1": "value1",
    "nested_field2": "value2",
    "nested_field3": "value3"
}

Is this something that can be done using fluentd configuration? Changing the original program behavior is not an option in my case.


Solution

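  • You can use the parser filter plugin with its key_name, reserve_data, and remove_key_name_field parameters: key_name names the field to parse, reserve_data true keeps the record's other fields, and remove_key_name_field true drops the original field once it has been parsed.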

    Example:

    <filter **>
      @type parser
      key_name field3
      reserve_data true
      remove_key_name_field true
      <parse>
        @type json
      </parse>
    </filter>
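
    To make the effect of the three parameters concrete, here is a rough model of one filter pass in Python. It is a sketch, not fluentd's implementation: parse_field is a hypothetical helper, and it assumes the field named by key_name exists and holds valid JSON.

```python
import json

def parse_field(record, key_name, reserve_data=True, remove_key_name_field=True):
    """Rough model of one parser-filter pass over a single record (a dict)."""
    # Assumption of this sketch: record[key_name] exists and is a JSON string.
    parsed = json.loads(record[key_name])
    # reserve_data true: start from the original fields; false: keep only parsed ones.
    result = dict(record) if reserve_data else {}
    if remove_key_name_field:
        result.pop(key_name, None)   # drop the raw string once it has been parsed
    result.update(parsed)            # merge the parsed keys into the record
    return result

record = {
    "field1": "value1",
    "field2": "value2",
    "field3": '{"nested_field1":"value1","nested_field2":"value2","nested_field3":"value3"}',
}

flat = parse_field(record, "field3")
print(json.dumps(flat))
```

    With reserve_data=False only the three nested fields would remain, and with remove_key_name_field=False the raw field3 string would stay next to them.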
    

    Here is a complete working example. It uses a valid, single-level version of your JSON:

    {"field1":"value1","field2":"value2","field3":"{\"nested_field1\":\"value1\",\"nested_field2\":\"value2\",\"nested_field3\":\"value3\"}"}
    

    fluent-flatten-json.conf

    <source>
      @type forward
    </source>
    
    <filter **>
      @type parser
      key_name field3
      reserve_data true
      remove_key_name_field true
      <parse>
        @type json
      </parse>
    </filter>
    
    <match **>
      @type stdout
    </match>
    

    Run fluentd:

    fluentd -c ./fluent-flatten-json.conf
    

    From another terminal, run fluent-cat with the input JSON:

    fluent-cat test <<< '{"field1":"value1","field2":"value2","field3":"{\"nested_field1\":\"value1\",\"nested_field2\":\"value2\",\"nested_field3\":\"value3\"}"}'
    

    Output in the fluentd logs (timestamp and tag prefix omitted):

    {"field1":"value1","field2":"value2","nested_field1":"value1","nested_field2":"value2","nested_field3":"value3"}
    

    Formatted output:

    {
      "field1": "value1",
      "field2": "value2",
      "nested_field1": "value1",
      "nested_field2": "value2",
      "nested_field3": "value3"
    }
    

    UPDATE

    For a valid, double-nested, escaped JSON record such as:

    {"time":"time_val","log":"{\"field1\":\"value1\",\"field2\":\"value2\",\"field3\":\"{\\\"nested_field1\\\":\\\"nested_value1\\\",\\\"nested_field2\\\":\\\"nested_value2\\\",\\\"nested_field3\\\":\\\"nested_value3\\\"}\"}"}
    

    The double-nested JSON in the question is not valid, so I had to recreate it.
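
    One way to recreate such a record without escaping it by hand is to serialize from the inside out. The Python sketch below is only an illustration and not necessarily how the record above was produced; the compact separators are chosen so that the result matches its one-line form.

```python
import json

# Innermost object: becomes the string value of "field3".
inner = {
    "nested_field1": "nested_value1",
    "nested_field2": "nested_value2",
    "nested_field3": "nested_value3",
}
compact = {"separators": (",", ":")}  # no spaces after commas and colons

# Serialize from the inside out; each json.dumps call adds one level of escaping.
field3 = json.dumps(inner, **compact)
log = json.dumps({"field1": "value1", "field2": "value2", "field3": field3}, **compact)
raw = json.dumps({"time": "time_val", "log": log}, **compact)

print(raw)
```

    Each additional level of nesting adds another layer of backslashes, which is why the innermost quotes end up as \\\" in the raw record.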

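    The following two filters should work. They run in order: the first parses log, and the second parses field3, which only becomes a top-level key after the first has run: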

    <filter **>
      @type parser
      key_name log
      reserve_data true
      remove_key_name_field true
      <parse>
        @type json
      </parse>
    </filter>
    
    <filter **>
      @type parser
      key_name field3
      reserve_data true
      remove_key_name_field true
      <parse>
        @type json
      </parse>
    </filter>
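
    As a quick check of the logic, the same two passes can be modeled in Python on the raw record from above. This is a sketch of what the filters are expected to do to the record, not of fluentd's internals, and it assumes both fields are present and hold valid JSON.

```python
import json

# The double-nested record from above, as a raw string so the backslashes survive.
raw = r'{"time":"time_val","log":"{\"field1\":\"value1\",\"field2\":\"value2\",\"field3\":\"{\\\"nested_field1\\\":\\\"nested_value1\\\",\\\"nested_field2\\\":\\\"nested_value2\\\",\\\"nested_field3\\\":\\\"nested_value3\\\"}\"}"}'

record = json.loads(raw)  # outer parse: "log" is still a JSON string at this point

# Same order as the two filters: "log" first, because "field3" only becomes
# a top-level key once "log" has been parsed and merged.
for key_name in ("log", "field3"):
    parsed = json.loads(record.pop(key_name))  # parse the string, drop the raw field
    record.update(parsed)                      # merge the parsed keys into the record

print(json.dumps(record, indent=2))
```

    Swapping the two key names makes the loop fail on the first step, since field3 is not yet a top-level key; the filter order in the configuration matters for the same reason.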