
Logstash: Flatten nested JSON, combine fields inside array


I have JSON that looks like this:

{
  "foo": {
    "bar": {
      "type": "someType",
      "id": "ga241ghs"
    },
    "tags": [
      {
        "@tagId": "123",
        "tagAttributes": {
          "attr1": "AAA",
          "attr2": "111"
        }
      },
      {
        "@tagId": "456",
        "tagAttributes": {
          "attr1": "BBB",
          "attr2": "222"
        }
      }
    ]
  },
  "text": "My text"
}

In the actual file the JSON is not split across multiple lines (I only formatted it that way for readability); it looks like this:

{"foo":{"bar":{"type":"someType","id":"ga241ghs"},"tags":[{"@tagId":"123","tagAttributes":{"attr1":"AAA","attr2":"111"}},{"@tagId":"456","tagAttributes":{"attr1":"BBB","attr2":"222"}}]},"text":"My text"}

I want to use Logstash to insert this JSON into an Elasticsearch index. However, I want to insert a flattened document in which the fields inside the array are combined, like this:

"foo.tags.tagId": ["123", "456"]
"foo.tags.tagAttributes.attr1": ["AAA", "BBB"]
"foo.tags.tagAttributes.attr2": ["111", "222"]

In total, the document inserted into Elasticsearch should look like this:

"foo.bar.type": "someType"
"foo.bar.id": "ga241ghs"
"foo.tags.tagId": ["123", "456"]
"foo.tags.tagAttributes.attr1": ["AAA", "BBB"]
"foo.tags.tagAttributes.attr2": ["111", "222"]
"text": "My text"
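The target mapping can be pinned down with a small plain-Ruby sketch. It runs outside Logstash, and `flatten_doc` is a hypothetical helper, not a Logstash or Elasticsearch API: nested objects become dotted keys, and an array of objects is merged so that each sub-key collects the values of every element. Dropping the leading `@` from `@tagId` is an assumption made to get a `tagId` field name; `text` is a top-level key in the input, so it flattens to plain `text`.

```ruby
require "json"

# Hypothetical helper: nested objects become dotted keys; an array of objects
# is merged so each sub-key collects the values from every element.
# Assumption: a leading "@" is dropped from key names ("@tagId" -> "tagId").
def flatten_doc(value, prefix = nil, out = {})
  case value
  when Hash
    value.each do |key, sub|
      name = key.to_s.delete_prefix("@")
      flatten_doc(sub, prefix ? "#{prefix}.#{name}" : name, out)
    end
  when Array
    if value.all? { |element| element.is_a?(Hash) }
      value.each do |element|
        # Flatten each element on its own, then append its values per key
        flatten_doc(element, prefix, {}).each do |key, v|
          (out[key] ||= []) << v
        end
      end
    else
      out[prefix] = value # arrays of scalars are kept as they are
    end
  else
    out[prefix] = value
  end
  out
end

doc = JSON.parse('{"foo":{"bar":{"type":"someType","id":"ga241ghs"},"tags":[{"@tagId":"123","tagAttributes":{"attr1":"AAA","attr2":"111"}},{"@tagId":"456","tagAttributes":{"attr1":"BBB","attr2":"222"}}]},"text":"My text"}')
flatten_doc(doc).each { |key, v| puts "#{key.inspect}: #{v.inspect}" }
```

Run against the sample, it prints one `"key": value` line per flattened field, with the three array-derived fields holding two values each.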

This is my current Logstash .conf. It splits the "tags" array, but that leaves me with two separate events (one per array element) instead of one.

How can I instead combine all tagIds into one field, all attr1 values into a second field, and all attr2 values into a third?

input {
  file {
    codec => json
    path => ["/path/to/my/data/*.json"]
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => ["/path/to/my/logfile"]
    sincedb_path => "/dev/null"
  }
}

filter {
  split {
    field => "[foo][tags]"
  }
}

output {
  stdout { codec => rubydebug }
}
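The two entries are expected behavior: `split` clones the event once per array element and puts that single element where the array was. A rough plain-Ruby model of that behavior, for illustration only (`split_event` is a hypothetical helper, not how the Logstash plugin is implemented):

```ruby
require "json"

# Hypothetical model of split with field => "[foo][tags]": one deep copy of
# the event per array element, with the array replaced by that element.
def split_event(event, outer, inner)
  event[outer][inner].map do |element|
    copy = JSON.parse(JSON.generate(event)) # deep copy via a JSON round trip
    copy[outer][inner] = element
    copy
  end
end

event = JSON.parse('{"foo":{"tags":[{"@tagId":"123"},{"@tagId":"456"}]},"text":"My text"}')
events = split_event(event, "foo", "tags")
puts events.length # prints 2
events.each { |e| puts e["foo"]["tags"]["@tagId"] } # prints 123, then 456
```

Since each resulting event only sees its own element, the values have to be combined on the original, unsplit event instead.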

Thanks a lot!


Solution

  • I figured out how to do it with a Ruby filter directly in Logstash. For anyone searching for this in the future, here is an example of how to do it for @tagId:

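    filter {
      ruby {
        code => '
          # Collect the @tagId of every element of the [foo][tags] array
          tags = event.get("[foo][tags]")
          if tags.is_a?(Array)
            tag_ids = tags.each_index.map { |i| event.get("[foo][tags][#{i}][@tagId]") }
            # A dotted name creates a single top-level field literally named
            # "foo.tags.tagId"; Logstash nests fields only via [brackets]
            event.set("foo.tags.tagId", tag_ids)
          end
        '
      }
    }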
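Extending the same approach to attr1 and attr2 could look like the sketch below, assuming the field names from the sample document. Because a ruby filter only runs inside Logstash, the logic is wrapped in a hypothetical `combine_tags(event)` method and exercised against `StubEvent`, a hypothetical minimal stand-in for the Logstash Event API that mimics `get`, `set` and `remove` with `[bracket]` references. Only the body of `combine_tags` is meant to go inside `code => '...'` (it contains no single quotes, which would end the config string). Removing the original `[foo][tags]` array afterwards is an assumption about what you want to keep.

```ruby
require "json"

# Hypothetical stand-in for the Logstash Event API so the sketch runs outside
# Logstash. It handles "[a][b]" references (numeric parts index arrays) and
# treats any other name, dots included, as a single top-level field.
class StubEvent
  attr_reader :data

  def initialize(data)
    @data = data
  end

  def get(ref)
    path(ref).reduce(@data) do |node, key|
      case node
      when Hash then node[key]
      when Array then node[Integer(key)]
      end
    end
  end

  def set(ref, value)
    *parents, last = path(ref)
    parent = parents.reduce(@data) { |node, key| node[key] ||= {} }
    parent[last] = value
  end

  def remove(ref)
    *parents, last = path(ref)
    parent = parents.reduce(@data) { |node, key| node.is_a?(Hash) ? node[key] : nil }
    parent.delete(last) if parent.is_a?(Hash)
  end

  private

  def path(ref)
    ref.start_with?("[") ? ref.scan(/\[([^\]]+)\]/).flatten : [ref]
  end
end

# Hypothetical wrapper: its body is what would go inside the ruby filter code.
def combine_tags(event)
  tags = event.get("[foo][tags]")
  if tags.is_a?(Array)
    # target field name => reference relative to one element of [foo][tags]
    fields = {
      "foo.tags.tagId" => "[@tagId]",
      "foo.tags.tagAttributes.attr1" => "[tagAttributes][attr1]",
      "foo.tags.tagAttributes.attr2" => "[tagAttributes][attr2]"
    }
    fields.each do |target, sub_ref|
      values = tags.each_index.map { |i| event.get("[foo][tags][#{i}]#{sub_ref}") }
      event.set(target, values)
    end
    # Assumption: the nested original is no longer needed once combined
    event.remove("[foo][tags]")
  end
end

event = StubEvent.new(JSON.parse('{"foo":{"bar":{"type":"someType","id":"ga241ghs"},"tags":[{"@tagId":"123","tagAttributes":{"attr1":"AAA","attr2":"111"}},{"@tagId":"456","tagAttributes":{"attr1":"BBB","attr2":"222"}}]},"text":"My text"}'))
combine_tags(event)
p event.get("foo.tags.tagAttributes.attr1")
```

The `[foo][bar]` fields are left nested here; flattening them would need a separate `event.set` per field along the same lines.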