mongodb elasticsearch jdbc logstash

Correct syntax for document_id in Logstash Elasticsearch Output


I'm trying to use Logstash to move data from MongoDB to Elasticsearch. I don't want duplicates to be written, so I'm using doc_as_upsert => true along with the document_id parameter in the output. This is my Logstash config file:

input {
  jdbc{
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "/path/to/mongojdbc1.8.jar"
    jdbc_user => ""
    jdbc_password => ""
    jdbc_connection_string => "jdbc:mongodb://127.0.0.1:27017/db1"
    statement => "db1.coll1.find({ },{'_id': false})"
  }
}

output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "test"
    user => ""
    password => ""
    doc_as_upsert => true
    document_id => "%{datetime}"
  }
}

As you can see, I'm trying to use the datetime field of the MongoDB document (which is a string) as the document ID in Elasticsearch. But this is what a document inserted into Elasticsearch looks like:

{
    "_index" : "test",
    "_type" : "_doc",
    "_id" : "%{datetime}",
    "_score" : 1.0,
    "_source" : {
        "@timestamp" : "2020-05-28T08:53:28.244Z",
        "document" : {
            # .. some fields ..
            "datetime" : "2020-05-28 14:22:29.133363",
            # .. some fields ..
        },
        "@version" : "1"
    }
}

Instead of the value of the datetime field, the literal string %{datetime} is being used as the _id. How do I fix this?


Solution

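  • The datetime field is not at the root level of the event; it is nested under the document field, as the _source in the question shows. Logstash leaves a %{...} reference as literal text when no field exists at that path, so you need to reference the nested field with the following syntax: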

    document_id => "%{[document][datetime]}"
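
For context, here is a sketch of how the full output block from the question might look with that change applied. The hosts, index, and credentials are copied from the question. The stdout output with the rubydebug codec is an optional addition that is not part of the original config; it is included only as one way to inspect how fields are nested in each event.

```
output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "test"
    user => ""
    password => ""
    doc_as_upsert => true
    # datetime is nested under "document", so use the bracketed field reference
    document_id => "%{[document][datetime]}"
  }
  # Optional, for debugging only (an assumption, not in the original config):
  # prints each event so the field nesting can be checked.
  stdout { codec => rubydebug }
}
```

If the _id still shows up as a literal %{...} string, the printed event should reveal the actual path of the field, which can then be written in the same [outer][inner] form.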