I am trying to parse a log file with grok. The configuration I use lets me parse a single-line event, but not a multiline one (with a Java stack trace).
# What I get in Kibana for a single-line event:
{
  "_index": "logstash-2015.02.05",
  "_type": "logs",
  "_id": "mluzA57TnCpH-XBRbeg",
  "_score": null,
  "_source": {
    "message": " - 2014-01-14 11:09:35,962 [main] INFO (api.batch.ThreadPoolWorker) user.country=US",
    "@version": "1",
    "@timestamp": "2015-02-05T09:38:21.310Z",
    "path": "/root/test2.log",
    "time": "2014-01-14 11:09:35,962",
    "main": "main",
    "loglevel": "INFO",
    "class": "api.batch.ThreadPoolWorker",
    "mydata": " user.country=US"
  },
  "sort": [
    1423129101310,
    1423129101310
  ]
}
# What I get for a multiline event with a stack trace:
{
  "_index": "logstash-2015.02.05",
  "_type": "logs",
  "_id": "9G6LsSO-aSpsas_jOw",
  "_score": null,
  "_source": {
    "message": "\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20)",
    "@version": "1",
    "@timestamp": "2015-02-05T09:38:21.380Z",
    "path": "/root/test2.log",
    "tags": [
      "_grokparsefailure"
    ]
  },
  "sort": [
    1423129101380,
    1423129101380
  ]
}
input {
  file {
    path => "/root/test2.log"
    start_position => "beginning"
    codec => multiline {
      pattern => "^ - %{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    match => [ "message", " -%{SPACE}%{SPACE}%{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel}%{SPACE}%{SPACE}\(%{JAVACLASS:class}\) %{GREEDYDATA:mydata} %{JAVASTACKTRACEPART}" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  elasticsearch {
    host => "194.3.227.23"
  }
  # stdout { codec => rubydebug }
}
Can anyone please tell me what I'm doing wrong in my configuration file? Thanks. Here's a sample of my log file:
 - 2014-01-14 11:09:36,447 [main] INFO (support.context.ContextFactory) Creating default context
 - 2014-01-14 11:09:38,623 [main] ERROR (support.context.ContextFactory) Error getting connection to database jdbc:oracle:thin:@HAL9000:1521:DEVPRINT, with user cisuser and driver oracle.jdbc.driver.OracleDriver
java.sql.SQLException: ORA-28001: the password has expired
	at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
	at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:131)
EDIT: here's the latest configuration I'm using:
https://gist.github.com/anonymous/9afe80ad604f9a3d3c00#file-output-L1
First point: when repeatedly testing with the file input, be sure to set sincedb_path => "/dev/null" so that Logstash always reads the file from the beginning.
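A sketch of how that fits into your existing input section (same path and multiline pattern as in your question):

```
input {
  file {
    path => "/root/test2.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^ - %{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
    }
  }
}
```

With a real sincedb_path, the file input remembers how far it has read and skips already-seen lines on restart, which is confusing while you are iterating on a pattern.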
About multiline: there must be something wrong either with your question content or with your multiline pattern, because none of the events has the multiline tag that the multiline codec or filter adds when aggregating lines. Your message field should contain all the lines separated by line-feed characters \n (\r\n in my case, being on Windows). Here is the expected output from your input configuration:
{
  "@timestamp" => "2015-02-10T11:03:33.298Z",
  "message" => " - 2014-01-14 11:09:35,962 [main] INFO (api.batch.ThreadPoolWorker) user.country=US\r\n\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20\r",
  "@version" => "1",
  "tags" => [
    [0] "multiline"
  ],
  "host" => "localhost",
  "path" => "/root/test.file"
}
About grok: since you want to match a multiline string, you should use a pattern like this:
filter {
  grok {
    match => { "message" => [
      "(?m)^ -%{SPACE}%{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel}%{SPACE}\(%{JAVACLASS:class}\) %{DATA:mydata}\n%{GREEDYDATA:stack}",
      "^ -%{SPACE}%{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel}%{SPACE}\(%{JAVACLASS:class}\) %{GREEDYDATA:mydata}"
    ] }
  }
}
The (?m) prefix instructs the regex engine to do multiline matching. You then get an event like this:
{
  "@timestamp" => "2015-02-10T10:47:20.078Z",
  "message" => " - 2014-01-14 11:09:35,962 [main] INFO (api.batch.ThreadPoolWorker) user.country=US\r\n\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20\r",
  "@version" => "1",
  "tags" => [
    [0] "multiline"
  ],
  "host" => "localhost",
  "path" => "/root/test.file",
  "time" => "2014-01-14 11:09:35,962",
  "main" => "main",
  "loglevel" => "INFO",
  "class" => "api.batch.ThreadPoolWorker",
  "mydata" => " user.country=US\r",
  "stack" => "\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20\r"
}
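You can see the effect of (?m) outside Logstash too. Grok patterns compile down to Ruby-flavoured regexes, where (?m) makes '.' also match newline characters; a minimal Ruby sketch (trimmed, hypothetical input, not your full event):

```ruby
# Two lines of a multiline event joined with \n, as the multiline codec produces.
msg = "user.country=US\n\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20)"

# Without (?m): '.' refuses to cross the \n, so the pattern cannot span both lines.
plain = /US.+\tat/.match(msg)

# With (?m): '.' also matches \n, so the pattern reaches into the stack-trace line.
multi = /(?m)US.+\tat/.match(msg)

puts plain.nil?   # true
puts multi.nil?   # false
```

This is why the first grok pattern above needs the (?m) prefix before %{DATA:mydata}\n%{GREEDYDATA:stack} can capture the stack trace.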
You can build and validate your multiline grok patterns with this online tool: http://grokconstructor.appspot.com/do/match
A final warning: there is currently a bug in the Logstash file input with the multiline codec that mixes up content from several files if you use a list or a wildcard in the path setting. The only workaround is to use the multiline filter instead.
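A sketch of that workaround, moving the same pattern from the codec into the multiline filter (assumes the multiline filter plugin is installed; it takes the same pattern/negate/what options):

```
filter {
  multiline {
    pattern => "^ - %{TIMESTAMP_ISO8601} "
    negate => true
    what => "previous"
  }
}
```

With this in place, drop the codec from the file input; the grok filter then runs after the multiline filter has aggregated the lines.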
HTH
EDIT: I was focusing on the multiline strings; you also need a similar pattern for single-line events, which is why the grok above lists two patterns.