
Get the name of the pattern that matched in grok in Logstash


If I have a patterns file with a bunch of regex patterns such as the following:

A .*foo.*
B .*bar.*
C .*baz.*

and my grok filter looks like the following:

grok {
  patterns_dir => ["/location/of/patterns"]
  match => { "request" => [ "%{A}", "%{B}", "%{C}" ] }
}

Is there any way to know which one matched, i.e. the name of the SYNTAX? I would like to annotate the document with the name of the pattern that matched.


Solution

  • What you would usually do is name the matched fields. Taking your example, the syntax would be:

    grok {
        patterns_dir => ["/location/of/patterns"]
        match =>
        {
            "request" => [ "%{A:A}", "%{B:NameOfB}", "%{C:SomeOtherName}" ]
        }
    }
    

    Accordingly, the match for each pattern would now be stored in the following field:

    A: A

    B: NameOfB

    C: SomeOtherName

    So in your case you could just name each field after its pattern. Whichever of those fields is present on the event then tells you which pattern matched.
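
    As a rough illustration of the idea, here is a plain Ruby sketch (not grok itself) that tries the three patterns from the question in order and reports the name of the first one that matches, which is how grok behaves by default when it stops at the first successful match. The names PATTERNS and matched_pattern_name are made up for this example.

```ruby
# Hypothetical stand-ins for the patterns file in the question.
PATTERNS = {
  'A' => /.*foo.*/,
  'B' => /.*bar.*/,
  'C' => /.*baz.*/
}.freeze

# Returns the name of the first pattern that matches the request,
# or nil if none of them does. Patterns are tried in insertion order.
def matched_pattern_name(request, patterns = PATTERNS)
  name, _regex = patterns.find { |_name, regex| regex.match?(request) }
  name
end

puts matched_pattern_name('GET /bar/index.html').inspect
puts matched_pattern_name('GET /other').inspect
```

    Naming each grok capture after its pattern gives you the same information on the event without any extra code.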

    Alternatively (I just tested this with the grok debugger), it appears that if you do not name a matched pattern, the field name defaults to the name of the pattern, which I think is what you want. The downside is that if you reuse a pattern within one expression, the result will be an array of values under that single name.

    This is the test I ran:

    Input:

     Caused by: com.my.application.IOException: null Caused by: com.my.application.IOException: null asd asd
    

    grok:

    (.*?)Caused by:%{GREEDYDATA}:%{GREEDYDATA}
    

    Output:

    {
      "GREEDYDATA": [
        [
          " com.my.application.IOException: null Caused by: com.my.application.IOException",
          " null asd asd"
        ]
      ]
    }
    

    Hope that solves your problems,

    Artur

    EDIT:

    Based on OP's other question here is my approach to solving that issue dynamically.

    You will still have to name the matches. Decide on a common prefix for the field names. I will base my example on two JSON strings to make this easier:

    {"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
    {"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
    

    Note how there are two artificial matches, prefix_patterna and prefix_patternb. I decided on the prefix "prefix_" and use it to identify which event fields to inspect. (You can also have grok drop empty fields, if that is something you want.)

    Then in my filter, I use ruby to iterate through all fields of the event to find the one that matched my pattern:

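    ruby {
        code => "
             # name of the prefixed field that holds a non-empty value
             toAdd = nil
             event.to_hash.each { |k,v|
                  if k.start_with?('prefix_') && v.to_s != ''
                      toAdd = k
                  end
             }
             if toAdd.to_s != ''
                 # Logstash 5.0 and later need event.set('test', toAdd) instead
                 event['test'] = toAdd
             end
        "
    }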
    

    All this code does is check the event keys for the prefix and see whether the value of each such field is non-empty. If it finds a prefixed field that has a value, it writes the name of that field into a new event field called "test". If several prefixed fields have values, the last one found wins.
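
    If you want to try the logic outside of Logstash, here is a minimal standalone sketch that runs the same check against a plain Hash instead of a real event. The helper name matched_field is invented for this example, and the sample data is the two JSON strings from above.

```ruby
require 'json'

# Returns the name of the last field whose key starts with the prefix
# and whose value is non-empty, or nil if there is no such field.
# fields is a plain Hash standing in for event.to_hash.
def matched_field(fields, prefix = 'prefix_')
  found = nil
  fields.each do |key, value|
    found = key if key.start_with?(prefix) && value.to_s != ''
  end
  found
end

first  = JSON.parse('{"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}')
second = JSON.parse('{"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}')

puts matched_field(first).inspect
puts matched_field(second).inspect
```

    Like the filter, this keeps overwriting the result while it iterates, so the last matching field wins.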

    Here are my tests:

    Settings: Default pipeline workers: 8
    Pipeline main started
    {"a" : "b", "prefix_patterna" : "sd", "prefix_patternb" : ""}
    {
                "message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"sd\", \"prefix_patternb\" : \"\"}",
               "@version" => "1",
             "@timestamp" => "2016-09-15T09:48:29.418Z",
                   "host" => "pandaadb",
                      "a" => "b",
        "prefix_patterna" => "sd",
        "prefix_patternb" => "",
                   "test" => "prefix_patterna"
    }
    {"a" : "b", "prefix_patterna" : "", "prefix_patternb" : "bla"}
    {
                "message" => "{\"a\" : \"b\", \"prefix_patterna\" : \"\", \"prefix_patternb\" : \"bla\"}",
               "@version" => "1",
             "@timestamp" => "2016-09-15T09:48:36.359Z",
                   "host" => "pandaadb",
                      "a" => "b",
        "prefix_patterna" => "",
        "prefix_patternb" => "bla",
                   "test" => "prefix_patternb"
    }
    

    Note how the first test writes "prefix_patterna" while the second test writes "prefix_patternb".

    I hope this solves your issue,

    Artur