Search code examples
logstashelastic-stacklogstash-groklogstash-configuration

_grokparsefailure Tag in all parsed logs with multiple grok filter


I am trying to parse minecraft log with Elastic Stack and I'v faced a very strange problem (likely strange for me!)

all line of my log get parsed correctly but I got _grokparsefailure tag in everyone of them.

my logstash pipeline config is this:

input {
  file {
    path => [ "/path/to/my/log" ]
    #start_position => "beginning"
    tags => ["minecraft"]
  }
}

filter {
  if "minecraft" in [tags] {

#    mutate {
#      gsub => [
#        "message", "\\n", ""
#      ]
#    }



    #############################
    #           Num 1           #
    #############################
    grok {
      match => [ "message", "\[%{TIME:timestamp}] \[(?<originator>[^\/]+)?/%{LOGLEVEL:level}]: %{GREEDYDATA:message}" ]
      overwrite => [ "message" ]
      break_on_match => false
    }


    #############################
    #           Num 2           #
    #############################
    grok {
      match => [ "message", "UUID of player %{USERNAME} is %{UUID}" ]
      add_tag => [ "player", "uuid" ]
      break_on_match => true
    }


    #############################
    #           Num 3           #
    #############################
    grok {
      match => [ "message",  "\A(?<player>[a-zA-Z0-9_]+)\[/%{IPV4:ip_address}:%{POSINT}\] logged in with entity id %{POSINT:entity_id} at \(\[(?<world>[a-zA-Z]+)\](?<pos>[^\)]+)\)\Z" ]
      add_tag => [ "player", "join" ]
      break_on_match => true
    }
#
#    grok {
#      match => [ "message",  "^(?<player>[a-zA-Z0-9_]+) has just earned the achievement \[(?<achievement>[^\[]+)\]$" ]
#      add_tag => [ "player", "achievement" ]
#    }
#
#    grok {
#      match => [ "message", "^(?<player>[a-zA-Z0-9_]+) left the game$" ]
#      add_tag => [ "player", "part" ]
#    }
#
#    grok {
#      match => [ "message", "^<(?<player>[a-zA-Z0-9_]+)> .*$" ]
#      add_tag => [ "player", "chat" ]
#    }
  }
}

output {
        elasticsearch {
                hosts => ["elasticsearch:xxxx"]
                user => "xxxx"
                password => "xxxxxx"
        index => "minecraft_s1v15_%{+YYYY.MM.dd}"
        }
}

AND A SAMPLE OF MY LOG IS:

[11:21:46] [User Authenticator #7/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[11:21:46] [Server thread/INFO]: MyAwsomeUsername joined the game
[11:21:46] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45140] logged in with entity id 6868 at ([world]61.45686149445207, 70.9375, -175.44700729217607)
[11:21:49] [Server thread/INFO]: MyAwsomeUsername issued server command: //efererg
[11:21:52] [Async Chat Thread - #1/INFO]: <MyAwsomeUsername> egerg
[11:21:54] [Async Chat Thread - #1/INFO]: <MyAwsomeUsername> ef
[12:00:19] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:19] [Server thread/INFO]: MyAwsomeUsername left the game
[12:00:21] [User Authenticator #8/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[12:00:21] [Server thread/INFO]: MyAwsomeUsername joined the game
[12:00:21] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45470] logged in with entity id 11767 at ([world]61.45686149445207, 70.9375, -175.44700729217607)
[12:00:27] [Server thread/INFO]: MyAwsomeUsername issued server command: /wgergerger
[12:00:29] [Async Chat Thread - #2/INFO]: <MyAwsomeUsername> gerg
[12:00:33] [Async Chat Thread - #2/INFO]: <MyAwsomeUsername> gerger
[12:00:35] [Async Chat Thread - #2/INFO]: <MyAwsomeUsername> rerg
[12:00:37] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:37] [Server thread/INFO]: MyAwsomeUsername left the game
[12:00:38] [User Authenticator #8/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[12:00:38] [Server thread/INFO]: MyAwsomeUsername joined the game
[12:00:38] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45476] logged in with entity id 11793 at ([world]62.97573252632079, 71.0, -179.01739415148737)
[12:00:40] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:40] [Server thread/INFO]: MyAwsomeUsername left the game
[12:00:51] [User Authenticator #8/INFO]: UUID of player MyAwsomeUsername is d800b63e-c2d2-3140-83a7-32315d09feca
[12:00:51] [Server thread/INFO]: MyAwsomeUsername joined the game
[12:00:51] [Server thread/INFO]: MyAwsomeUsername[/111.111.111.111:45486] logged in with entity id 11805 at ([world]62.97573252632079, 71.0, -179.01739415148737)
[12:00:55] [Server thread/INFO]: MyAwsomeUsername lost connection: Disconnected
[12:00:55] [Server thread/INFO]: MyAwsomeUsername left the game


Explanation:

I commented other grok to explain the problem simpler (Exact same problem when uncommnet them)

I'v tested 3 situation:

  1. Comment 2 and 3 as well as others and only 1 was active, in this case every line of log got parsed without any _grokparsefailure in the record.
  2. Only was commented as well as others and 1 and 2 were active. in this case the line of log with match the grok number 2 was parsed with no _grokparsefailure and others get _grokparsefailure. this is still make sense!
  3. In last situation I uncommented all 3 grok (1, 2, 3 were active) and every line of log get parsed BUT with _grokparsefailure in it! even though the break_on_match is true by default and when it match by grok 2 it shouldn't be tested with grok 3.

I've read some other question similar to me in stackoverflow: Similar Question 1 and I added the mutate block before the grok filters (cause every line of the log end with \n) but nothing changed and the problem persist!

other thing I think I need to mention is the fact that I know adding more grok beside grok 2 (3 and others) cause this tag cause some of the logs don't match grok 2 at all and have to wrap them with a regex. but for now at lease the logs that match grok 2 should be Ok (no _grokparsefailure) but they are not! (Read it in a stackoverflow question: Similar Question 2


Solution

  • In fact, this is the expected behavior, you are confusing a little the way logstash and grok works.

    First, all filters are independent from each other, using break_on_match in a grok only affects that grok, it makes no difference for other grok filters that appears after that in your pipeline. The break_on_match also only makes sense when you have more than one pattern in the same grok, which is not your case.

    Second, since Logstash is serial and you are not using any conditional, your grok filters will be applied to every message in your pipeline, it doesn't matter if it was already parsed, this is what is making your lines to get the _grokparsefailure

    To fix that you need to use conditionals.

    You don't need conditionals in your two first grok filters, the first one is just taking the different part of your log lines and overwritting into the message field, the second will be just your first test, for every grok after the second one you will need the following configuration.

    if "_grokparsefailure" in [tags] {
      grok {
        match => "your pattern"
        add_tag => "your tags"
        remove_tag => ["_grokparsefailure"]
      }
    }
    

    This grok will only be applied if the message has _grokparsefailure in the tags field, if the message matchs your pattern, this tag will be removed, if it does not match, the tag remains and the message can be tested by the following groks.

    In the end your grok config should look something like this.

    grok {
      "your first grok"
    }
    
    grok {
      "your second grok, can be any of the others"
    }
    
    if "_grokparsefailure" in [tags] {
      grok {
        "your grok N"
        remove_tag => ["_grokparsefailure"]
      }
    }
    

    This is only needed because you are adding different tags for every message, if you move this logic to a mutate filter for example, you can use only two grok filters, your second one would be a multi-pattern grok, with break_on_match set to true.

    grok {
      match => { 
        "message" => [ 
          "pattern from grok 2",
          "pattern from grok 3",
          "pattern from grok N"
        ]
      }
      break_on_match => true
    }