I am using Serilog in C# to create a log file, which is ingested by Filebeat and sent via Logstash to Elasticsearch. The Elasticsearch indexes conform to ECS 1.5.
The log file sometimes contains erroneous values for the field "host.ip"; for example, it can contain values like "localhost:5000". This leads to rejected log posts, since a string like that cannot be converted into an IP address. This is all expected, and correcting the log file is outside the scope of this question.
I decided to add the "ignore_malformed: true" setting on the index level. After that, the log posts are no longer rejected, and I can find them in Elasticsearch, so the setting has clearly taken effect. BUT the field "host.ip" now actually contains the malformed value "localhost:5000". I can't see how that is even possible; it is not what I expected or wanted.
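For reference, the index-level form of this option is presumably the `index.mapping.ignore_malformed` index setting, which applies ignore_malformed to every field type that supports it (including `ip`). A minimal sketch of the body of an index-creation request (`PUT <index-name>`), assuming the index is created fresh with this setting; the index name and any other settings or mappings (e.g. the ECS template) are left out:

```json
{
  "settings": {
    "index.mapping.ignore_malformed": true
  }
}
```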
From the documentation of "ignore_malformed", it appears that values that do not match the field type are supposed to be discarded, not written into the field. I also don't find any added "_ignored" field.
It's as if setting ignore_malformed to true actually lets the malformed data into the index instead of dropping it. I'm expecting (and wanting) the field to be empty if the value is malformed. Is this a bug, or am I missing something?
Whatever you send in the source document will always be there; ES will never modify it. However, the fact that you're now specifying ignore_malformed means that ES will not index the malformed value (so "host.ip" is not searchable for that document), but the value will still be visible in your source document (_source).
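One way to check that the value was dropped from the index even though it is still in _source is to query the `_ignored` metadata field, which records the names of fields that were skipped because they were malformed. Since it is a metadata field, it does not appear inside `_source` itself, which may be why it seemed to be missing. A minimal sketch of a search request body (`GET <index-name>/_search`), assuming the affected field is "host.ip":

```json
{
  "query": {
    "term": { "_ignored": "host.ip" }
  }
}
```

The matching hits should still show "localhost:5000" in their `_source`, while an `exists` query on "host.ip" should not match those same documents, because nothing was indexed for that field.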