I'm trying to use Logstash to move data from MongoDB to Elasticsearch. I don't want duplicates to be written, so I'm using the doc_as_upsert => true and document_id parameters in the output. This is my Logstash config file:
input {
  jdbc {
    jdbc_driver_class => "com.dbschema.MongoJdbcDriver"
    jdbc_driver_library => "/path/to/mongojdbc1.8.jar"
    jdbc_user => ""
    jdbc_password => ""
    jdbc_connection_string => "jdbc:mongodb://127.0.0.1:27017/db1"
    statement => "db1.coll1.find({ },{'_id': false})"
  }
}
output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "test"
    user => ""
    password => ""
    doc_as_upsert => true
    document_id => "%{datetime}"
  }
}
As you can see, I'm trying to use the datetime field of the MongoDB document (which is a string) as the document id for elasticsearch. But this is what a document inserted into Elasticsearch looks like:
{
  "_index" : "test",
  "_type" : "_doc",
  "_id" : "%{datetime}",
  "_score" : 1.0,
  "_source" : {
    "@timestamp" : "2020-05-28T08:53:28.244Z",
    "document" : {
      # .. some fields ..
      "datetime" : "2020-05-28 14:22:29.133363",
      # .. some fields ..
    },
    "@version" : "1"
  }
}
Instead of the value of the datetime field being used as the _id, the string %{datetime} is being used as the ID. How do I fix this?
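The datetime field is not at the root level of the event; it is nested under the document field. Logstash cannot resolve %{datetime}, so it leaves the placeholder as a literal string. Reference the nested field with the bracket syntax instead: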
document_id => "%{[document][datetime]}"
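
For context, here is a minimal sketch of the whole output block with that change applied, assuming everything else stays as in the question. The action => "update" line is my own addition and an assumption about your intent: as far as I know, doc_as_upsert only takes effect with the update action, while the default index action simply overwrites any existing document that has the same _id.

```
output {
  elasticsearch {
    hosts => ["http://127.0.0.1:9200"]
    index => "test"
    user => ""
    password => ""
    # Assumption: added so that doc_as_upsert is actually used;
    # drop this line to keep the default "index" action.
    action => "update"
    doc_as_upsert => true
    # datetime is nested under the "document" field of the event
    document_id => "%{[document][datetime]}"
  }
}
```

Either way, events that share the same datetime value should map to the same _id, so re-running the pipeline would update or overwrite the existing document rather than create a duplicate.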