Logstash: remove all non-digit characters from a field


I have log files that I pass through Logstash to be modified before they are pushed to Elasticsearch.

One of the fields sometimes appears as a plain series of digits:

foobar = 42

Sometimes it is prefixed with letters:

foobar = sw-42

I want the field to always be an integer, with any non-digit characters removed if they are present.

Here is the part of the Logstash config that converts the field to an integer:

filter {
  mutate {
    convert => [ "foobar", "integer"]
  }
}

How can I strip out the characters if present?

Update

Using the mutate filter, I can either strip out the non-digit characters or convert the field to an integer. However, if I try to do both in the same mutate block, foobar comes back as 0.

Example

input {
  stdin {}
}

filter {
  kv { }
  mutate {
    gsub => [ "foobar", "\D", "" ]
    convert => [ "foobar", "integer" ]
  }
}

Here is the output. If '42' is provided, foobar comes back as the integer 42. If 'sw-42' is provided, foobar comes back as 0.

foobar="42"
{
       "message" => "foobar=\"42\"",
      "@version" => "1",
    "@timestamp" => "2015-03-31T22:32:11.718Z",
          "host" => "swat-logstash02",
        "foobar" => 42
}
foobar="sw-42"
{
       "message" => "foobar=\"sw-42\"",
      "@version" => "1",
    "@timestamp" => "2015-03-31T22:32:23.822Z",
          "host" => "swat-logstash02",
        "foobar" => 0
}

Solution

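  • It's an ordering issue, not a regexp problem. Within a single mutate block the operations run in a fixed internal order, and convert is applied before gsub, so "sw-42" is converted to 0 before any characters are stripped.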

    If you do just the gsub (without the convert), it shows that the regexp is working:

    {
           "message" => "foobar=\"sw-42\"",
          "@version" => "1",
        "@timestamp" => "2015-03-31T22:42:40.097Z",
              "host" => "0.0.0.0",
            "foobar" => "42"
    }
    

    So split it into two mutate stanzas, which run in the order they are written:

    filter {
      kv { }
      mutate {
        gsub => [ "foobar", "\D", "" ]
      }
      mutate {
        convert => [ "foobar", "integer" ]
      }
    }
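
    To check the two-stanza version end to end, a minimal test pipeline might look like the sketch below. The stdin input and the kv filter are taken from the question; the stdout output with the rubydebug codec is an assumption about how the events shown above were printed, so adjust it to match your own setup.

    input {
      stdin {}
    }

    filter {
      # Parse key=value pairs such as foobar="sw-42" into fields
      kv { }
      # First stanza: strip every non-digit character from foobar
      mutate {
        gsub => [ "foobar", "\D", "" ]
      }
      # Second stanza: runs after the first, so it sees "42" rather than "sw-42"
      mutate {
        convert => [ "foobar", "integer" ]
      }
    }

    output {
      # Assumed for testing only: pretty-print each event to the console
      stdout { codec => rubydebug }
    }

    Typing foobar="sw-42" on stdin should then yield an event whose foobar field is the integer 42 rather than 0. Note that a value containing no digits at all becomes an empty string after the gsub, so how it converts is worth checking separately.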