
Remove HTML markup from logstash input


I am struggling with data manipulation in Logstash 5.1, where some of the data comes from free-text fields that contain HTML markup. Most of the time the value is wrapped in a single tag, like this:

<.p> XYZ <./p>

and I handle that case with grok.

But when the tags are nested, like this:

<.p><.b><.strong> XYZ <./strong><./b><./p>

a simple grok pattern can't filter it out.

My question: is there a built-in filter for stripping HTML markup, or do I have to write my own using regular expressions? Alternatively, is this possible in versions prior to 5.1?


Solution

  • To remove the HTML tags, you can use the gsub option of the mutate filter (replace "fieldname" with the name of your field):

    mutate {
      # Replace every "<...>" run (non-greedy match) in the field with an empty string
      gsub => [
        "fieldname", "<.*?>", ""
      ]
    }
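
To see what the gsub pattern above does to the two inputs from the question, it can be tried outside Logstash. The following is only a minimal sketch in Python, assuming that Python's re module treats this simple pattern the same way as the regex engine Logstash uses; strip_tags and the sample strings are names made up for this illustration, not part of Logstash.

```python
import re

# Same pattern as in the gsub above: "<", then as few characters as
# possible (non-greedy), then the first following ">".
TAG_PATTERN = re.compile(r"<.*?>")


def strip_tags(text: str) -> str:
    """Remove every <...> run; whitespace around the text is left untouched."""
    return TAG_PATTERN.sub("", text)


# Hypothetical field values modelled on the two cases in the question.
samples = [
    "<p> XYZ </p>",
    "<p><b><strong> XYZ </strong></b></p>",
]

cleaned = [strip_tags(s) for s in samples]

for before, after in zip(samples, cleaned):
    print(repr(before), "->", repr(after))
```

Because the match is non-greedy, each tag is removed on its own, so nested tags are handled the same way as a single pair. The spaces around the text are kept, so the field may still need trimming afterwards (for example with the strip option of mutate). A pattern like this is not an HTML parser: a literal "<" followed later by ">" in ordinary text would also be removed.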