elasticsearch serilog filebeat

Why does Elasticsearch ignore_malformed add malformed value to index?


I am using Serilog in C# to create a log file, which is ingested by Filebeat and sent via Logstash to Elasticsearch. The Elasticsearch indexes conform to ECS 1.5.

The log file sometimes contains erroneous values for the field "host.ip", for example "localhost:5000". This led to rejected log entries, since a string like that cannot be parsed as an IP address. This is all expected, and the issue of correcting the log file is not in the scope of this question.

I decided to add the "ignore_malformed: true" setting at the index level. After that, the log entries are no longer rejected - I can find them in Elasticsearch. So the setting has clearly taken effect. But the field "host.ip" now actually contains the malformed value "localhost:5000". I can't see how that is even possible; it is not what I expected or wanted.

From the documentation of "ignore_malformed", it would appear that values that do not match the field type are supposed to be discarded - not written into the field. I also find no added "_ignored" field.

It's as if setting ignore_malformed to true actually allows the malformed data into the index, instead of dropping it. I'm expecting/wanting the field to be empty, if the value is malformed. Is this a bug, or am I missing something?


Solution

  • Whatever you send in the source document will always be there; ES never modifies it. What ignore_malformed changes is that ES no longer tries to index the malformed value: the document is accepted, but host.ip is not indexed (and therefore not searchable) for that document. The value is still visible because what you are looking at is the stored source document, not the indexed field.
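
To check this on your own data, the sketch below builds the request bodies involved. It is only an illustration under assumptions: the index name logs-example and the helper function names are hypothetical, and the helpers just produce JSON bodies that you would send with whichever client you use. The first body enables ignore_malformed at the index level with host.ip mapped as ip; the second finds documents where host.ip was ignored, through the _ignored metadata field; the third finds documents where host.ip was actually indexed.

```python
import json

# Hypothetical index name, used only for illustration.
INDEX = "logs-example"


def index_settings_body() -> dict:
    """Body for PUT /<index>: index-level ignore_malformed plus an ip field."""
    return {
        "settings": {"index.mapping.ignore_malformed": True},
        "mappings": {
            "properties": {
                "host": {"properties": {"ip": {"type": "ip"}}}
            }
        },
    }


def ignored_host_ip_query() -> dict:
    """Body for GET /<index>/_search: documents whose host.ip was ignored."""
    return {"query": {"term": {"_ignored": "host.ip"}}}


def indexed_host_ip_query() -> dict:
    """Body for GET /<index>/_search: documents whose host.ip was indexed."""
    return {"query": {"exists": {"field": "host.ip"}}}


if __name__ == "__main__":
    # Print each request so it can be pasted into a REST console.
    print(f"PUT /{INDEX}")
    print(json.dumps(index_settings_body(), indent=2))
    print(f"GET /{INDEX}/_search")
    print(json.dumps(ignored_host_ip_query(), indent=2))
    print(f"GET /{INDEX}/_search")
    print(json.dumps(indexed_host_ip_query(), indent=2))
```

A document that was sent with "localhost:5000" in host.ip should match the second query but not the third, even though its source still shows the original string.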