Logstash split xml into array


Is it possible to convert XML into an array of objects using Logstash?

Here is my sample document:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : "<root><Tags><TagTypeID>1</TagTypeID><TagValue>twitter</TagValue></Tags><Tags><TagTypeID>1</TagTypeID><TagValue>facebook</TagValue></Tags><Tags><TagTypeID>2</TagTypeID><TagValue>usa</TagValue></Tags><Tags><TagTypeID>3</TagTypeID><TagValue>smartphones</TagValue></Tags></root>"
}

Ideally, I'd like to output this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : [
    {
      "TagTypeID" : "1",
      "TagValue" : "twitter"
    },
    {
      "TagTypeID" : "1",
      "TagValue" : "facebook"
    },
    {
      "TagTypeID" : "2",
      "TagValue" : "usa"
    },
    {
      "TagTypeID" : "3",
      "TagValue" : "smartphones"
    }
  ]
}

However, I'm not able to achieve that. I tried using the xml filter like this:

xml
{
    source => "Metadata"
    target => "Parsed"
}

However, it outputs this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["twitter"]
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

I don't want my values to be stored as arrays (I know there's always going to be just one value there).

I know which fields are going to come back from my input, so I can map the structure myself, and this doesn't need to be dynamic (although that would be nice).
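Because the structure is fixed, the target mapping can be illustrated in plain Ruby outside Logstash. This is a minimal sketch, assuming the Metadata string always has a <root> element containing repeated <Tags> elements, each with one <TagTypeID> and one <TagValue> child. The helper name tags_from_metadata is hypothetical, the Hash only stands in for a Logstash event, and REXML (shipped with Ruby) is used purely for illustration; it is not what the xml filter uses internally.

```ruby
require "rexml/document"

# Hypothetical helper: map the fixed <Tags> structure into an array of hashes.
# Assumes every <Tags> element has one <TagTypeID> and one <TagValue> child.
def tags_from_metadata(xml_string)
  doc = REXML::Document.new(xml_string)
  REXML::XPath.match(doc, "/root/Tags").map do |tag|
    {
      "TagTypeID" => tag.elements["TagTypeID"]&.text,
      "TagValue"  => tag.elements["TagValue"]&.text
    }
  end
end

# A plain Hash standing in for the event from the question.
event = {
  "Title"    => "My blog title",
  "Body"     => "My first post ever",
  "Metadata" => "<root><Tags><TagTypeID>1</TagTypeID><TagValue>twitter</TagValue></Tags>" \
                "<Tags><TagTypeID>1</TagTypeID><TagValue>facebook</TagValue></Tags>" \
                "<Tags><TagTypeID>2</TagTypeID><TagValue>usa</TagValue></Tags>" \
                "<Tags><TagTypeID>3</TagTypeID><TagValue>smartphones</TagValue></Tags></root>"
}

# Replace the XML string with the parsed array, matching the desired output shape.
event["Metadata"] = tags_from_metadata(event["Metadata"])
puts event["Metadata"].inspect
```

Hardcoding the two child names keeps every value a plain string, which is exactly the "no arrays" shape wanted above; the trade-off is that any new child element would be silently ignored.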

The "Allow splitting of lists / arrays into multiple events" change seemed useful, but it's poorly documented and I couldn't find any information on how to use this filter for my use case.

The question "Logstash, split event from an xml file in multiples documents keeping information from root tags" is similar, but not exactly what I'd like to achieve.

The approach from "Logstash: XML to JSON output from array to string" seems useful; however, it hardcodes that only the first element of the array is output as a single item (not as part of an array). It gives me this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : "1",
        "TagValue" : "twitter"
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

  1. Can this be done without having to create custom filters? (I have no experience with Ruby.)
  2. Or am I missing something basic here?

Solution

  • Here is one approach using Logstash's built-in ruby filter.

    Filter section:

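    filter {
        xml {
            source => "Metadata"
            target => "Parsed"
        }
    
        # Unwrap the one-element arrays the xml filter creates for each value.
        # Uses the event.get/event.set API of Logstash 5.x and later; older
        # versions accessed the field as event['Parsed']['Tags'] instead.
        ruby {
            code => "
                tags = event.get('[Parsed][Tags]')
                unless tags.nil?
                    tags.each do |x|
                        x.each do |key, value|
                            x[key] = value[0]
                        end
                    end
                    event.set('[Parsed][Tags]', tags)
                end
            "
        }
    }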
    

    Output:

    "Parsed":{
      "Tags":[
          {
          "TagTypeID":"1",
          "TagValue":"twitter"
          },
          {
          "TagTypeID":"1",
          "TagValue":"facebook"
          },
          {
          "TagTypeID":"2",
          "TagValue":"usa"
          },
          {
          "TagTypeID":"3",
          "TagValue":"smartphones"
          }
      ]
    }
    

    If I understand you correctly, this is your desired result. Note that the path of the parsed XML field (Parsed, then Tags) is hardcoded inside the ruby filter. Does it need to be more dynamic? Let me know if you need anything else.

    Can this be done without having to create custom filters? (I've no experience in Ruby)

    Well, yes and no. Yes, because this is not really a custom filter but a built-in one. No, because I would say this cannot be done without Ruby. I admit that Ruby may seem an unattractive solution. However, this approach is flexible, and a few lines of code shouldn't hurt that much.
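If the filter needs to be more dynamic than the hardcoded Parsed/Tags path, one option is a recursive helper that unwraps single values wherever they appear, so no field names are fixed. This is a plain Ruby sketch, not a drop-in filter; unwrap_single is a hypothetical name, and the input Hash simply mirrors the xml filter output shown in the question. It assumes a one-element array of a scalar (such as ["1"]) always means a single value, while arrays of hashes are left as arrays so that a post with only one <Tags> element still yields a list.

```ruby
# Hypothetical helper: recursively replace one-element arrays of scalars
# (such as ["1"]) with the scalar itself. Arrays whose single item is a Hash
# or Array are kept, so a Tags list with one entry stays a list.
def unwrap_single(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k] = unwrap_single(v) }
  when Array
    items = value.map { |v| unwrap_single(v) }
    if items.length == 1 && !items[0].is_a?(Hash) && !items[0].is_a?(Array)
      items[0]
    else
      items
    end
  else
    value
  end
end

# Same shape as the xml filter output in the question.
parsed = {
  "Tags" => [
    { "TagTypeID" => ["1"], "TagValue" => ["twitter"] },
    { "TagTypeID" => ["1"], "TagValue" => ["facebook"] },
    { "TagTypeID" => ["2"], "TagValue" => ["usa"] },
    { "TagTypeID" => ["3"], "TagValue" => ["smartphones"] }
  ]
}

result = unwrap_single(parsed)

# A document with a single <Tags> element keeps Tags as a one-item list.
single = unwrap_single({ "Tags" => [{ "TagTypeID" => ["2"], "TagValue" => ["usa"] }] })

puts result.inspect
puts single.inspect
```

Wiring this into the ruby filter (for example by defining the method in the filter's init option and calling it from code) is not shown and would need testing against your Logstash version.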