Tags: elasticsearch, logging, logstash

Grok Debugger validates log entries that Logstash eventually refuses


Using the Grok Debugger, I adapted patterns I found on the Internet in a first attempt at handling Logback/Spring Boot style logs.

Here is a log entry sent to the Grok Debugger:

2022-03-09 06:35:15,821 [http-nio-9090-exec-1] WARN  org.springdoc.core.OpenAPIService - found more than one OpenAPIDefinition class. springdoc-openapi will be using the first one found.

with the grok pattern:
(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) \[(?<thread>(.*?)+)\] %{LOGLEVEL:level}\s+%{GREEDYDATA:class} - (?<logmessage>.*)

and it extracts the fields as expected:

{
  "timestamp": [
    [
      "2022-03-09 06:35:15,821"
    ]
  ],
  "YEAR": [
    [
      "2022"
    ]
  ],
  "MONTHNUM": [
    [
      "03"
    ]
  ],
  "MONTHDAY": [
    [
      "09"
    ]
  ],
  "TIME": [
    [
      "06:35:15,821"
    ]
  ],
  "HOUR": [
    [
      "06"
    ]
  ],
  "MINUTE": [
    [
      "35"
    ]
  ],
  "SECOND": [
    [
      "15,821"
    ]
  ],
  "thread": [
    [
      "http-nio-9090-exec-1"
    ]
  ],
  "level": [
    [
      "WARN"
    ]
  ],
  "class": [
    [
      "org.springdoc.core.OpenAPIService"
    ]
  ],
  "logmessage": [
    [
      "found more than one OpenAPIDefinition class. springdoc-openapi will be using the first one found."
    ]
  ]
}
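Outside the debugger, the pattern can be sanity-checked as a plain regular expression. Here is a rough Python equivalent, where the grok primitives (YEAR, TIME, LOGLEVEL, GREEDYDATA) are approximated with simpler expressions than the real grok library uses:

```python
import re

# Rough equivalent of the grok pattern; the grok definitions of
# YEAR, MONTHNUM, MONTHDAY, TIME etc. are more permissive, and
# GREEDYDATA is simplified here to \S+ for the class name.
LOG_RE = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>.*?)\] "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"(?P<clazz>\S+) - (?P<logmessage>.*)"
)

line = ("2022-03-09 06:35:15,821 [http-nio-9090-exec-1] WARN  "
        "org.springdoc.core.OpenAPIService - found more than one "
        "OpenAPIDefinition class. springdoc-openapi will be using "
        "the first one found.")

m = LOG_RE.match(line)
print(m.group("timestamp"))  # 2022-03-09 06:35:15,821
print(m.group("level"))      # WARN
```

If this matches but Logstash still tags the event with _grokparsefailure, the problem usually lies in the filter configuration rather than in the pattern itself.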

But when I ask Logstash to perform the same action, with this input declaration:

input {
    file {
        path => "/home/lebihan/dev/Java/comptes-france/metier-et-gestion/dev/ApplicationMetierEtGestion/sparkMetier.log"

        codec => multiline {
           pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
           negate => "true"
           what => "previous"
        }
    }
}
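The multiline codec with negate => "true" and what => "previous" means: any line that does not start with a timestamp is appended to the previous event. A minimal Python sketch of that grouping logic (the sample log lines are made up; the real codec also handles flushing and timeouts):

```python
import re

# A new event starts with "YYYY-MM-DD HH:MM:SS", mirroring the codec's
# pattern; any non-matching line is appended to the previous event.
STARTS_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def group_events(lines):
    events = []
    for line in lines:
        if STARTS_EVENT.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events

raw = [
    "2022-03-09 06:35:15,821 [main] ERROR com.example.App - boom",
    "java.lang.IllegalStateException: boom",
    "\tat com.example.App.main(App.java:10)",
    "2022-03-09 06:35:16,002 [main] INFO com.example.App - recovered",
]
print(len(group_events(raw)))  # 2 -- the stacktrace joins the first event
```

This is why a multi-line stacktrace arrives at the filter stage as a single message, which the `\tat` check below then tags.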

and this filter declaration:

filter {
  #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
  if [message] =~ "\tat" {
    grok {
      match => ["message", "^(\tat)"]
      add_tag => ["stacktrace"]
    }
  }
 
 grok {
    match => [ "message",
               "(?<timestamp>%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}) \[(?<thread>(.*?)+)\] %{LOGLEVEL:level}\s+%{GREEDYDATA:class} - (?<logmessage>.*)"
             ]
  }
  
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
}

But Logstash fails to parse it, and I don't know how to get more detail about the underlying error behind the _grokparsefailure tag.



Solution

  • The main cause of my trouble was this match syntax:

    grok {
          match => [ 
    

    instead of:

    grok {
          match => {
    

    But after that, I had to change:

    • the timestamp definition to %{TIMESTAMP_ISO8601:timestamp}
    • the date match pattern
    • and add a target to the date match, to avoid a _dateparsefailure.

    An event then shows up in Elasticsearch like this:

    @timestamp:
        Mar 16, 2022 @ 09:14:22.002
    @version:
        1
    class:
        f.e.service.AbstractSparkDataset
    host:
        debian
    level:
        INFO
    logmessage:
        A dataset was saved to the parquet file /data/tmp/balanceComptesCommunes_2019_2019.
    thread:
        http-nio-9090-exec-10
    timestamp:
        2022-03-16T06:34:09.394Z
    _id:
        8R_KkX8BBIYNTaMw1Jfg
    _index:
        ecoemploimetier-2022.03.16
    _score:
        - 
    _type:
        _doc 
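    The corrected date match uses a comma before the milliseconds (`,SSS`) because that is what Logback emits, while the original config had a dot (`.SSS`). The same distinction can be illustrated with Python's strptime, as a stand-in for the Joda-style patterns the date filter actually uses:

```python
from datetime import datetime

# Logback writes milliseconds after a comma; matching with a dot fails.
ts = "2022-03-16 07:32:24,860"
parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")
print(parsed.microsecond)  # 860000, i.e. 860 ms

try:
    datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")
except ValueError:
    print("dot format does not match a comma timestamp")
```

    A mismatch like this is exactly what produces a _dateparsefailure tag in Logstash.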
    

    I eventually corrected my Logstash config file as follows:

    input {
        file {
            path => "/home/[...]/myLog.log"
    
            sincedb_path => "/dev/null"
            start_position => "beginning"
    
            codec => multiline {
               pattern => "^%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}.*"
               negate => "true"
               what => "previous"
            }
        }
    }
    
    filter {
       #If log line contains tab character followed by 'at' then we will tag that entry as stacktrace
       if [message] =~ "\tat" {
          grok {
             match => ["message", "^(\tat)"]
             add_tag => ["stacktrace"]
          }
       }
     
       grok {
          match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[(?<thread>(.*?)+)\] %{LOGLEVEL:level} %{GREEDYDATA:class} - (?<logmessage>.*)" }
       }
     
       date {
          # 2022-03-16 07:32:24,860
          match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss,SSS" ]
          target => "timestamp"
        }
    
   # If there was no parsing error, drop the original, unparsed message
       if "_grokparsefailure" not in [tags] {
          mutate {
             remove_field => [ "message", "path" ]
          }
       }
    }
    
    output {
        stdout { codec => rubydebug }
    
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "ecoemploimetier-%{+YYYY.MM.dd}"
        }
    }