I have been migrating one of our indexes from a self-hosted Elasticsearch cluster to Amazon Elasticsearch Service using Logstash. After the migration, we found that some additional fields are being added to the documents. How can we prevent them from being added?
Our Logstash config file:
input {
  elasticsearch {
    hosts => ["https://staing-example.com:443"]
    user => "userName"
    password => "password"
    index => "testingindex"
    size => 100
    scroll => "1m"
  }
}
filter {
}
output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "secret_access_key"
    index => "testingindex"
  }
  stdout {
    codec => rubydebug
  }
}
The document in our self-hosted Elasticsearch:
{
  "_index": "testingindex",
  "_type": "interaction-3",
  "_id": "38b23e7a-eafd-4163-a9f0-e2d9ffd5d2cf",
  "_score": 1,
  "_source": {
    "customerId" : [
      "e177c1f8-1fbd-4b2e-82b8-760536e42742"
    ],
    "customProperty" : {
      "messageFrom" : [
        "BOT"
      ]
    },
    "userId" : [
      "e177c1f8-1fbd-4b2e-82b8-760536e42742"
    ],
    "uniqueIdentifier" : "2b027fc0-a517-49a7-a71f-8732044cb249",
    "accountId" : "724bee3e-38f8-4538-b944-f3e21c518437"
  }
}
The same document in Amazon Elasticsearch Service:
{
  "_index" : "testingindex",
  "_type" : "doc",
  "_id" : "B-hP020Bd2lcvg9lTyBH",
  "_score" : 1.0,
  "_source" : {
    "customerId" : [
      "e177c1f8-1fbd-4b2e-82b8-760536e42742"
    ],
    "customProperty" : {
      "messageFrom" : [
        "BOT"
      ]
    },
    "@version" : "1",
    "userId" : [
      "e177c1f8-1fbd-4b2e-82b8-760536e42742"
    ],
    "@timestamp" : "2019-10-16T06:44:13.154Z",
    "uniqueIdentifier" : "2b027fc0-a517-49a7-a71f-8732044cb249",
    "accountId" : "724bee3e-38f8-4538-b944-f3e21c518437"
  }
}
@version and @timestamp are the two new fields being added to the documents. Can anyone explain why they are added, and is there a way to prevent this?
Also, if you compare both documents, the _type and _id have changed too. We need both _type and _id to stay the same as in our self-hosted Elasticsearch.
The fields @version and @timestamp are added by Logstash itself to every event it processes. If you don't want them, you need to remove them with a mutate filter:
mutate {
  remove_field => ["@version", "@timestamp"]
}
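In the pipeline from the question, this block goes inside the currently empty filter section. A minimal sketch, leaving the input and output sections exactly as they are:

```
filter {
  # Drop the two fields Logstash adds to every event before it reaches the outputs
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}
```

Keep @timestamp if any output setting uses a date in a sprintf reference, such as a hypothetical index => "testingindex-%{+YYYY.MM.dd}", because that date is taken from @timestamp. The fixed index name used in the question does not depend on it.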
To keep the _type and _id of your original documents, you need to change your input and add the option docinfo => true, which copies those fields into the @metadata field. You can then reference them in your output; the documentation has an example of this:
input {
  elasticsearch {
    ...
    docinfo => true
  }
}
output {
  elasticsearch {
    ...
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
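Putting both parts together with the amazon_es output from the question gives something like the sketch below. It assumes that the amazon_es output accepts document_type and document_id the same way the elasticsearch output does. It also sets docinfo_target explicitly, because newer versions of the elasticsearch input may place the document info under a different @metadata path when ECS compatibility is enabled. Host names and credentials are the placeholders from the question.

```
input {
  elasticsearch {
    hosts => ["https://staing-example.com:443"]
    user => "userName"
    password => "password"
    index => "testingindex"
    size => 100
    scroll => "1m"
    # Copy _index, _type and _id of each source document into @metadata
    docinfo => true
    docinfo_target => "@metadata"
  }
}
filter {
  # Remove the fields Logstash adds to every event
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
}
output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "secret_access_key"
    index => "testingindex"
    # Assumption: amazon_es supports these options like the elasticsearch output
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  stdout {
    # metadata => true also prints @metadata, to check the copied _type and _id
    codec => rubydebug { metadata => true }
  }
}
```

The @metadata field is never sent to the outputs as part of the document, so using it to carry _type and _id does not add any new fields to the migrated documents.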
Note that if your Amazon Elasticsearch domain runs version 6.x or higher, an index can only have one document type, and version 7.x is typeless. Also, the document_type option of the elasticsearch output is deprecated as of Logstash 7.x.
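For a typeless 7.x domain, a sketch of the output keeps only the original document ID and drops document_type, since the original _type cannot be preserved there. It assumes the input still has docinfo => true as above, and the placeholders are again from the question:

```
output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "secret_access_key"
    index => "testingindex"
    # Preserve the original _id; no document_type on a typeless cluster
    document_id => "%{[@metadata][_id]}"
  }
}
```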