xml filter on nested object using ruby


I have a log file in the XML format below:

<QuerySiteInformation>
    xmlns="http://www.example.com"
    <Site>
        <id>abc-cde-fvvvv</id>
        <Item>
            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
            <code>67448833344443</code>
            <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            <reference>/</reference>
        </Item>
    </Site>
    <SiteInteraction>
        <InteractionItem>
            <Location>
                <id>8496940--2842047577555</id>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
            </Location>
        </InteractionItem>
    </SiteInteraction>
</QuerySiteInformation>

I want to change <objectMessage>message in multiple lines</objectMessage> into <objectMessage>MESSAGE HAS BEEN REMOVED</objectMessage>, but ONLY when the <objectMessage> tag is inside an <Item> tag.

Part of my config (shown further down) can find this XML and rewrite it to the message that I want:

<objectMessage>Internal> message shown here in multiple lines</objectMessage>

config

filter {
 mutate {
  gsub => [
    "some regex pattern can do the xml tag filtering", "MESSAGE HAS BEEN REMOVED"

   ]
 }
}

However, this changes every <objectMessage>message shown here in multiple lines</objectMessage>, including the ones outside of the <Item> element.

I know that the ruby plugin can do a better job and that regex shouldn't be used for XML parsing at all, but this is the closest I have got so far.
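
For reference, a regex-only approach can be scoped to <Item> by nesting two substitutions: an outer pass that finds each <Item>...</Item> block, and an inner pass that rewrites <objectMessage> only within that block. The sketch below is plain Ruby and assumes that <Item> carries no attributes and is never nested inside another <Item>. The helper name redact_inside_items, the REPLACEMENT constant, and the shortened sample log are illustrative only.

```ruby
# Sketch: limit the substitution to <Item>...</Item> blocks with a nested gsub.
# Assumption: <Item> has no attributes and is never nested inside another <Item>.
REPLACEMENT = "<objectMessage>MESSAGE HAS BEEN REMOVED</objectMessage>"

# Hypothetical helper, not part of Logstash.
def redact_inside_items(xml)
  # Outer pass: match each <Item> block (non-greedy; /m lets . match newlines).
  xml.gsub(%r{<Item>.*?</Item>}m) do |item_block|
    # Inner pass: rewrite objectMessage only within that block.
    item_block.gsub(%r{<objectMessage>.*?</objectMessage>}m, REPLACEMENT)
  end
end

# Shortened version of the sample log from the question.
log = <<~XML
  <QuerySiteInformation>
    <Site>
      <Item>
        <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
        <objectMessage>Internal> message shown here
        in multiple lines</objectMessage>
      </Item>
    </Site>
    <SiteInteraction>
      <InteractionItem>
        <Location>
          <objectMessage>Internal> message shown here in multiple lines</objectMessage>
        </Location>
      </InteractionItem>
    </SiteInteraction>
  </QuerySiteInformation>
XML

puts redact_inside_items(log)
```

<InteractionItem> is left alone because the outer pattern requires "<" immediately before "Item>". This remains a regex over XML, so it is fragile against attributes, CDATA, or nested elements with the same name.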


Solution

  • Ideally you want to use the built-in xml filter plugin; it is far more reliable and maintainable:

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html

    The following conf file parses the XML and replaces each Item entry under Site:

    input {
        generator {
            lines => [
            '<QuerySiteInformation>
                xmlns="http://www.example.com"
                <Site>
                <id>abc-cde-fvvvv</id>
                <Item>
                <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
                <code>67448833344443</code>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
                <reference>/</reference>
                </Item>
                <Item>
                <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>
                <code>67448833344443</code>
                <objectMessage>Internal> message shown here in multiple lines</objectMessage>
                <reference>/</reference>
                </Item>
                </Site>
                <SiteInteraction>
                <InteractionItem>
                <Location>
                    <id>8496940--2842047577555</id>
                    <objectMessage>Internal> message shown here in multiple lines</objectMessage>
                </Location>
                </InteractionItem>
                </SiteInteraction>
            </QuerySiteInformation>'
            ]
            count => 1
        }
    }
    
    filter {
        xml {
            source => "message"
            target => "xml"
            store_xml => true
            remove_field => ["message"]
        }
    }
    
    filter {
      ruby {
        code => '
          # Overwrite every Item entry under the first Site; do nothing if there is no Item array.
          items = event.get("[xml][Site][0][Item]") || []
          items.each_index do |index|
            event.set("[xml][Site][0][Item][#{index}]", "REMOVED MESSAGE")
          end
        '
      }
    }
    
    output {
        stdout {
            codec => rubydebug
        }
    }
    
    

    Output:

    {
              "host" => {
            "name" => "Mac-Studio.local"
        },
          "@version" => "1",
        "@timestamp" => 2022-11-28T13:47:31.352282Z,
             "event" => {
            "original" => "<QuerySiteInformation>\n            xmlns=\"http://www.example.com\"\n            <Site>\n            <id>abc-cde-fvvvv</id>\n            <Item>\n            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>\n            <code>67448833344443</code>\n            <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            <reference>/</reference>\n            </Item>\n            <Item>\n            <id>e5753ead-d202-451e-92cc-ea49d0a6bdf5</id>\n            <code>67448833344443</code>\n            <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            <reference>/</reference>\n            </Item>\n            </Site>\n            <SiteInteraction>\n            <InteractionItem>\n            <Location>\n                <id>8496940--2842047577555</id>\n                <objectMessage>Internal> message shown here in multiple lines</objectMessage>\n            </Location>\n            </InteractionItem>\n            </SiteInteraction>\n        </QuerySiteInformation>",
            "sequence" => 0
        },
               "xml" => {
                    "content" => [
                [0] "\n            xmlns=\"http://www.example.com\"\n            ",
                [1] "\n            ",
                [2] "\n        "
            ],
                       "Site" => [
                [0] {
                      "id" => [
                        [0] "abc-cde-fvvvv"
                    ],
                    "Item" => [
                        [0] "REMOVED MESSAGE",
                        [1] "REMOVED MESSAGE"
                    ]
                }
            ],
            "SiteInteraction" => [
                [0] {
                    "InteractionItem" => [
                        [0] {
                            "Location" => [
                                [0] {
                                               "id" => [
                                        [0] "8496940--2842047577555"
                                    ],
                                    "objectMessage" => [
                                        [0] "Internal> message shown here in multiple lines"
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    }
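
Note that the ruby filter above overwrites each whole Item entry with a string, so id, code and reference are lost as well (see the Item array in the output). If only objectMessage should change, as the question asks, the loop can descend one level further. Below is a minimal plain-Ruby sketch of that logic, assuming the array-wrapped hash layout that the xml filter produced in the output above. The helper name redact_item_messages and the REDACTED constant are illustrative, not part of Logstash.

```ruby
# Plain-Ruby sketch of the redaction logic. `parsed` mimics the hash stored
# under the "xml" field, where every value is wrapped in an array.
REDACTED = "MESSAGE HAS BEEN REMOVED"

# Hypothetical helper: replaces objectMessage only inside Site -> Item entries.
def redact_item_messages(parsed)
  Array(parsed["Site"]).each do |site|
    next unless site.is_a?(Hash)
    Array(site["Item"]).each do |item|
      # Skip entries that are not hashes or that carry no objectMessage.
      next unless item.is_a?(Hash) && item.key?("objectMessage")
      item["objectMessage"] = [REDACTED]
    end
  end
  parsed
end

parsed = {
  "Site" => [
    {
      "id" => ["abc-cde-fvvvv"],
      "Item" => [
        {
          "id" => ["e5753ead-d202-451e-92cc-ea49d0a6bdf5"],
          "code" => ["67448833344443"],
          "objectMessage" => ["Internal> message shown here in multiple lines"],
          "reference" => ["/"]
        }
      ]
    }
  ],
  "SiteInteraction" => [
    {
      "InteractionItem" => [
        {
          "Location" => [
            {
              "id" => ["8496940--2842047577555"],
              "objectMessage" => ["Internal> message shown here in multiple lines"]
            }
          ]
        }
      ]
    }
  ]
}

redact_item_messages(parsed)
p parsed["Site"][0]["Item"][0]["objectMessage"]
p parsed["SiteInteraction"][0]["InteractionItem"][0]["Location"][0]["objectMessage"]
```

Inside the ruby filter, the equivalent change would be to call event.set on "[xml][Site][0][Item][#{index}][objectMessage]" rather than on "[xml][Site][0][Item][#{index}]". That variant has not been run against a live pipeline here, so treat it as a starting point.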