
Logstash grok failing


I'm trying to grok a message, but it fails with _grokparsefailure and the log doesn't say what it is failing on. The same grok pattern works on https://grokdebug.herokuapp.com/

input {
  file {
    type => "apache-access"
    path => "C:/prdLogs/sent/*"
  }
}

filter {
  grok {
    match => ['message', '%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}   \] "%{WORD:httpmethod} %{NOTSPACE:referrer} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} "-" "%{NOTSPACE:request}" %{QS:UserAgent} %{WORD:httpmethodO} - - HTTP/%{NUMBER:httpversion2} "%{WORD:session}:%{WORD:httpmed}" "-" %{NUMBER:duration}' ]
  }
  date {
    match => [ "raw_timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}

output {
  elasticsearch { hosts => ["111.44.44.44:9200"] }
}

The data looks like:

199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.select.item/?locale=en_GB&dojo.preventCache=1488075524942 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3203
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.no.recently.used/?locale=en_GB&dojo.preventCache=1488075525483 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3159
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/selector.label.selected/?locale=en_GB&dojo.preventCache=1488075525843 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3600
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/actor.selector.label.remove.all/?locale=en_GB&dojo.preventCache=1488075526305 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3224
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/com.label.filter.objects/?locale=en_GB&dojo.preventCache=1488075526711 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3299

This is actually an Apache access log, but I was unable to use COMBINEDAPACHELOG or COMMONAPACHELOG either; they give the same error.

All entries in Elasticsearch are tagged with "_grokparsefailure". I ran Logstash with log.level set to debug, but I am not seeing any errors in the log.

I am using the latest version of Logstash.

Please advise.

R2 D2, thanks, I tried this but no joy :(

I created a patterns file and pasted your pattern. I changed the payload to just "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]" and used the following filter:

filter {
  grok {
    patterns_dir => ["c:/logstashconfig/patterns"]
    match => ['message', '%{IP:clientip} - - /[%{DATE_CUSTOM:timestamp}/]'] 
  }
  date {
    match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}

The debug log in logstash:

{
      "path" => "C:/prdLogs/sent/test",
"@timestamp" => 2017-03-03T00:06:15.269Z,
      "@version" => "1",
      "host" => "hkw20012125",
   "message" => "130.39.22.22 - -     [23/Feb/2015:10:18:45 +0800]\r",
      "type" => "apache-access",
      "tags" => [
    [0]     "_grokparsefailure"
]   
}

Any ideas? Is it the +0800 at the end of the data? Thanks.


Solution

  • I think once you have GREEDYDATA in your pattern, it consumes the rest of the line from the log.

    GREEDYDATA's pattern looks like:

    GREEDYDATA .* <-- greedily matches everything up to the end of the line
    

    And your grok match should look something like this if I'm not mistaken:

    grok {
      match => ['message', '%{IPV4:clientip} - - %{GREEDYDATA:data}']
    }
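
    As a hedged aside: GREEDYDATA still backtracks so that any literal text after it can match, but for a bounded field such as the bracketed timestamp a non-greedy capture is usually safer. A minimal sketch, assuming the stock DATA pattern (.*?) and a field name "rest" chosen only for illustration:

    grok {
      # DATA is non-greedy, so raw_timestamp stops at the first "]"
      match => { "message" => '%{IPV4:clientip} - - \[%{DATA:raw_timestamp}\] %{GREEDYDATA:rest}' }
    }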
    

    Unless you need the values extracted separately, the above grok should do the trick for you. I also think the way you're matching the timestamp is wrong. To handle your timestamp you need the patterns below in your patterns file (MONTHDAY, MONTH, YEAR and TIME already ship with Logstash and are repeated here for reference, so only DATE_CUSTOM is actually new):

    MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
    MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
    YEAR (?>\d\d){1,2}
    TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
    DATE_CUSTOM %{MONTHDAY}[/]%{MONTH}[/]%{YEAR}:%{TIME}
    

    And then you could simply use this within your grok match:

    grok {
        match => ['message', '%{IPV4:clientip} - - \[%{DATE_CUSTOM:timestamp} %{GREEDYDATA:data}']
    }
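
    If you would rather not maintain a separate patterns file, the same custom pattern can probably be declared inline. This sketch assumes your version of the grok filter supports the pattern_definitions option; if it does not, stay with patterns_dir:

    grok {
      # Inline equivalent of the DATE_CUSTOM line in the patterns file
      pattern_definitions => {
        "DATE_CUSTOM" => "%{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME}"
      }
      match => { "message" => '%{IPV4:clientip} - - \[%{DATE_CUSTOM:timestamp} %{GREEDYDATA:data}' }
    }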
    

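    Now you can parse the timestamp with the date filter. The trailing Z in the format below expects a timezone offset such as +0800 in the field, so either extend DATE_CUSTOM to capture the offset or remove the Z from the format: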

    date {
        match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
        target => '@timestamp'
    }
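
    To try the whole thing without touching Elasticsearch, a throwaway pipeline along these lines may help. It is only a sketch: it swaps the custom pattern for the stock HTTPDATE pattern (which also captures the +0800 offset, so the Z in the date format has something to parse), and it escapes the brackets with backslashes, since a forward slash does not escape anything in a regex:

    input { stdin { } }

    filter {
      grok {
        # HTTPDATE = %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT}
        match => { "message" => '%{IPV4:clientip} - - \[%{HTTPDATE:timestamp}\]%{GREEDYDATA:data}' }
      }
      date {
        match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        target => "@timestamp"
      }
    }

    output { stdout { codec => rubydebug } }

    Paste one of your log lines into the console; if grok matches, the printed event has clientip, timestamp and data fields and no _grokparsefailure tag.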
    

    Hope this helps!