Tags: elasticsearch, logging, logstash

Grok Debugger validates log entries that Logstash eventually refuses


Using the Grok Debugger, I adapted patterns I found on the Internet in a first attempt at handling Logback/Spring Boot style logs.

Here is a log entry sent to the Grok Debugger:

2022-03-09 06:35:15,821 [http-nio-9090-exec-1] WARN  org.springdoc.core.OpenAPIService - found more than one OpenAPIDefinition class. springdoc-openapi will be using the first one found.

with the grok pattern:
(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) \[(?<thread>(.*?)+)\] %{LOGLEVEL:level}\s+%{GREEDYDATA:class} - (?<logmessage>.*)

and it extracts the fields as expected:

{
  "timestamp": [
    [
      "2022-03-09 06:35:15,821"
    ]
  ],
  "YEAR": [
    [
      "2022"
    ]
  ],
  "MONTHNUM": [
    [
      "03"
    ]
  ],
  "MONTHDAY": [
    [
      "09"
    ]
  ],
  "TIME": [
    [
      "06:35:15,821"
    ]
  ],
  "HOUR": [
    [
      "06"
    ]
  ],
  "MINUTE": [
    [
      "35"
    ]
  ],
  "SECOND": [
    [
      "15,821"
    ]
  ],
  "thread": [
    [
      "http-nio-9090-exec-1"
    ]
  ],
  "level": [
    [
      "WARN"
    ]
  ],
  "class": [
    [
      "org.springdoc.core.OpenAPIService"
    ]
  ],
  "logmessage": [
    [
      "found more than one OpenAPIDefinition class. springdoc-openapi will be using the first one found."
    ]
  ]
}
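Outside the debugger, the pattern can be sanity-checked as a plain regular expression. Here is a rough Python equivalent, where the grok primitives (YEAR, TIME, LOGLEVEL, GREEDYDATA) are approximated with simpler expressions than the real grok library uses:

```python
import re

# Rough equivalent of the grok pattern; the grok definitions of
# YEAR, MONTHNUM, MONTHDAY, TIME etc. are more permissive, and
# GREEDYDATA is simplified here to \S+ for the class name.
LOG_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>.*?)\] "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"(?P<clazz>\S+) - (?P<logmessage>.*)"
)

line = ("2022-03-09 06:35:15,821 [http-nio-9090-exec-1] WARN  "
        "org.springdoc.core.OpenAPIService - found more than one "
        "OpenAPIDefinition class. springdoc-openapi will be using "
        "the first one found.")

m = LOG_RE.match(line)
print(m.group("timestamp"))  # 2022-03-09 06:35:15,821
print(m.group("level"))      # WARN
```

If this matches but Logstash still tags the event with _grokparsefailure, the problem usually lies in the filter configuration rather than in the pattern itself.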

But when I ask Logstash to perform the same action, with this input declaration:

input {
    file {
        path => "/home/lebihan/dev/Java/comptes-france/metier-et-gestion/dev/ApplicationMetierEtGestion/sparkMetier.log"

        codec => multiline {
           pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
           negate => "true"
           what => "previous"
        }
    }
}
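The multiline codec with negate => "true" and what => "previous" means: any line that does not start with a timestamp is appended to the previous event. A minimal Python sketch of that grouping logic (the sample log lines are made up; the real codec also handles flushing and timeouts):

```python
import re

# A new event starts with "YYYY-MM-DD HH:MM:SS", mirroring the codec's
# pattern; any non-matching line is appended to the previous event.
STARTS_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def group_events(lines):
    events = []
    for line in lines:
        if STARTS_EVENT.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events

raw = [
    "2022-03-09 06:35:15,821 [main] ERROR com.example.App - boom",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.App.main(App.java:10)",
    "2022-03-09 06:35:16,002 [main] INFO com.example.App - recovered",
]
print(len(group_events(raw)))  # 2 -- the stacktrace joins the first event
```

This is why a multi-line stacktrace arrives at the filter stage as a single message, which the `\tat` check below then tags.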

and this filter declaration:

filter {
  #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }
 
 grok {
    match => [ "message",
               "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) \[(?<thread>(.*?)+)\] %{LOGLEVEL:level}\s+%{GREEDYDATA:class} - (?<logmessage>.*)"
             ]
  }
  
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
}

But Logstash fails to parse it, and I don't know how to get more detail about the underlying error behind the _grokparsefailure tag.



Solution

  • The main cause of my trouble was this match syntax:

    grok {
          match => [ 
    

    instead of:

    grok {
          match => {
    

    But after that, I had to change:

    • the timestamp definition to %{TIMESTAMP_ISO8601:timestamp}
    • the date match pattern
    • and add a target to the date match, to avoid a _dateparsefailure.

    An event then shows up in Elasticsearch like this:

    @timestamp:
        Mar 16, 2022 @ 09:14:22.002
    @version:
        1
    class:
        f.e.service.AbstractSparkDataset
    host:
        debian
    level:
        INFO
    logmessage:
        A dataset was saved to the parquet file /data/tmp/balanceComptesCommunes_2019_2019.
    thread:
        http-nio-9090-exec-10
    timestamp:
        2022-03-16T06:34:09.394Z
    _id:
        8R_KkX8BBIYNTaMw1Jfg
    _index:
        ecoemploimetier-2022.03.16
    _score:
        - 
    _type:
        _doc 
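    The corrected date match uses a comma before the milliseconds (`,SSS`) because that is what Logback emits, while the original config had a dot (`.SSS`). The same distinction can be illustrated with Python's strptime, as a stand-in for the Joda-style patterns the date filter actually uses:

```python
from datetime import datetime

# Logback writes milliseconds after a comma; matching with a dot fails.
ts = "2022-03-16 07:32:24,860"
parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
print(parsed.microsecond)  # 860000, i.e. 860 ms

try:
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
except ValueError:
    print("dot format does not match a comma timestamp")
```

    A mismatch like this is exactly what produces a _dateparsefailure tag in Logstash.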
    

    I eventually corrected my Logstash config file as follows:

    input {
        file {
            path => "/home/[...]/myLog.log"
    
            sincedb_path => "/dev/null"
            start_position => "beginning"
    
            codec => multiline {
               pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
               negate => "true"
               what => "previous"
            }
        }
    }
    
    filter {
       #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
       if [message] =~ "\tat" {
          grok {
             match => ["message", "^(\tat)"]
             add_tag => ["stacktrace"]
          }
       }
     
       grok {
          match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[(?<thread>(.*?)+)\] %{LOGLEVEL:level} %{GREEDYDATA:class} - (?<logmessage>.*)" }
       }
     
       date {
          # 2022-03-16 07:32:24,860
          match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
          target => "timestamp"
        }
    
   # If there was no parsing error, drop the original, unparsed message
       if "_grokparsefailure" not in [tags] {
          mutate {
             remove_field => [ "message", "path" ]
          }
       }
    }
    
    output {
        stdout { codec => rubydebug }
    
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "ecoemploimetier-%{+YYYY.MM.dd}"
        }
    }