
Logstash - import nested JSON into Elasticsearch


I have a large number (~40000) of nested JSON objects that I want to insert into an Elasticsearch index.

The JSON objects are structured like this:

    {
        "customerid": "10932",
        "date": "16.08.2006",
        "bez": "xyz",
        "birthdate": "21.05.1990",
        "clientid": "2",
        "address": [
            {
                "addressid": "1",
                "title": "Mr",
                "street": "main str",
                "valid_to": "21.05.1990",
                "valid_from": "21.05.1990"
            },
            {
                "addressid": "2",
                "title": "Mr",
                "street": "melrose place",
                "valid_to": "21.05.1990",
                "valid_from": "21.05.1990"
            }
        ]
    }

So a JSON field (address in this example) can have an array of JSON objects.

What would a Logstash config look like to import JSON files/objects like this into Elasticsearch? The Elasticsearch mapping for this index should simply mirror the structure of the JSON, and the Elasticsearch document id should be set to customerid.

    input {
      stdin {
        id => "JSON_TEST"
      }
    }
    filter {
      json {
        source => "customerid"
        ....
        ....
      }
    }
    output {
      stdout {}
      elasticsearch {
        hosts => "https://localhost:9200/"
        index => "customers"
        document_id => "%{customerid}"
      }
    }

Solution

  • If you control how the input is generated, the easiest thing to do is to format your input as single-line JSON (one complete object per line) and then use the json_lines codec.

    Just change your stdin to:

    stdin { codec => "json_lines" }
    

    and then it'll just work:

    cat input_file.json | logstash -f json_input.conf
    

    where input_file.json has lines like:

    {"customerid":1,"nested": {"json":"here"}}
    {"customerid":2,"nested": {"json":"there"}}
    

    Because the codec parses each line into event fields as it is read, you won't need the json filter.
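
If the objects are currently pretty-printed rather than one per line, they have to be flattened before the json_lines codec can read them. Here is a minimal sketch of that conversion in Python. It assumes the source file holds a single JSON array of customer objects, which the question does not confirm; the function name to_json_lines and the script and file names in the usage note are hypothetical.

```python
import json
import sys


def to_json_lines(objects):
    """Serialize each object as compact JSON on its own line."""
    return "".join(
        # Nested arrays such as "address" stay inside the same line.
        json.dumps(obj, separators=(",", ":"), ensure_ascii=False) + "\n"
        for obj in objects
    )


def main(path):
    # Assumption: the file contains one JSON array of customer objects.
    with open(path, encoding="utf-8") as handle:
        objects = json.load(handle)
    sys.stdout.write(to_json_lines(objects))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved as, say, to_json_lines.py, the output could be piped straight into Logstash: python to_json_lines.py customers.json | logstash -f json_input.conf. Loading the whole array at once should be fine for roughly 40000 small objects; a much larger file would probably call for a streaming parser instead.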