
logstash filter definition for an extended apache log


I'm trying to configure a Logstash filter for an extended Apache log format. It is basically the 'combined' LogFormat with some additional fields. Here is the Apache log format definition:

LogFormat "%h %{X-LB-Client-IP}i %l %u %m %t \"%{Host}i\" \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\" %D" combinextended

Here is a sample log file content:

12.123.456.789 122.123.122.133 - - GET [06/May/2015:18:42:41 +0200] "www.example.com" "GET /fr-fr/test/content/ HTTP/1.1" 200 14023 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121718 Gentoo Firefox/3.0.5" 7729

I configured logstash-forwarder to send the files:

{
  "paths": [
    "/var/log/mysite/extended.log",
    "/var/log/myothersite/extended.log" 
  ],
  "fields": { "type": "apache-extended" }
}

I configured the Logstash server with a grok pattern in a file under /etc/logstash/conf.d, named 13-apache-extended.conf:

filter {
  if [type] == "apache-extended" {
    grok {
      match => { "message" => "%{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \[%{HTTPDATE:timestamp}\] \"%{IPORHOST:host}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \"%{GREEDYDATA:referer}\" \"%{GREEDYDATA:agent}\" %{NUMBER:responsetime}" }
    }
  }
}

I tested it in https://grokdebug.herokuapp.com/ and it seemed ok:

Log sample :

12.123.456.789 122.123.122.133 - - GET [06/May/2015:18:42:41 +0200] "www.example.com" "GET /fr-fr/test/content/ HTTP/1.1" 200 14023 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121718 Gentoo Firefox/3.0.5" 7729

Pattern:

%{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \[%{HTTPDATE:timestamp}\] \"%{IPORHOST:host}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \"%{GREEDYDATA:referer}\" \"%{GREEDYDATA:agent}\" %{NUMBER:responsetime}

But when I restarted Logstash on my main server, I got an error:

{:timestamp=>"2015-05-06T18:36:28.846000+0200", :message=>"Exception in lumberjack input", :exception=>#<LogStash::ShutdownSignal: LogStash::ShutdownSignal>, :level=>:error}
{:timestamp=>"2015-05-06T18:36:44.342000+0200", :message=>"Error: Expected one of #, {, } at line 35, column 142 (byte 969) after filter {\n  if [type] == \"apache-extended\" {\n    grok {\n      match => { \"message\" => \"%{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \\[%{HTTPDATE:timestamp}\\] \""}
{:timestamp=>"2015-05-06T18:36:44.349000+0200", :message=>"You may be interested in the '--configtest' flag which you can\nuse to validate logstash's configuration before you choose\nto restart a running system."}

Any ideas would be greatly appreciated.

Thanks.


Solution

  • I tested your configuration and have two things for you to check.

    1. Are you sure lumberjack is configured correctly? Did you check it with a basic log file?
    2. Are you sure the pattern in your config is the same as the one you posted? The pattern in your post and the one in the error output aren't the same.

      %{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \[%{HTTPDATE:timestamp}\] \"%{IPORHOST:host}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \"%{GREEDYDATA:referer}\" \"%{GREEDYDATA:agent}\" %{NUMBER:responsetime}

    !=

    %{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \[%{HTTPDATE:timestamp}\] \"}
    

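    You can't put line breaks inside the pattern, and you need to escape every special character with a backslash (\).

    This error ("Error: Expected one of #, {, } at line 35, column 142 (byte 969)") usually means that there is a syntax error at that position, for example a special character that you forgot to escape. In your error output the config text breaks off right after \] ", which suggests that the double quote before %{IPORHOST:host} is not escaped in the file Logstash actually loaded, so the string ends there.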
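    If escaping every double quote proves error-prone, one alternative is to wrap the whole pattern in single quotes, so the double quotes inside it are literal and need no backslash. The following is only a sketch: it reuses the type and field names from your own filter, and it assumes your Logstash version accepts single-quoted strings in the config file (as far as I know it does, but verify it with the '--configtest' flag that your error output mentions).

```
filter {
  if [type] == "apache-extended" {
    grok {
      # Single-quoted string: the double quotes inside are literal characters.
      # \[ and \] stay escaped because they are regex metacharacters.
      match => { "message" => '%{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \[%{HTTPDATE:timestamp}\] "%{IPORHOST:host}" "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "%{GREEDYDATA:referer}" "%{GREEDYDATA:agent}" %{NUMBER:responsetime}' }
    }
  }
}
```

    The pattern itself is unchanged. Only the quoting differs, so a missed backslash can no longer end the string early.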

    I tested your config without lumberjack and everything is working correctly.

    Log Sample: 12.123.456.789 122.123.122.133 - - GET [06/May/2015:18:42:41 +0200] "www.example.com" "GET /fr-fr/test/content/ HTTP/1.1" 200 14023 "-" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.5) Gecko/2008121718 Gentoo Firefox/3.0.5" 7729

    Config:

    input {
      file {
        path => "d:/Git/LogstashELKElision/logstash/bin/log/test.log"
        type => extendedapache
      }
    }
    filter {
      if [type] == "extendedapache" {
        grok {
          match => [ "message", "%{IPORHOST:proxyip} %{IPORHOST:clientip} %{USER:ident} %{USER:auth} %{WORD:method} \[%{HTTPDATE:timestamp}\] \"%{IPORHOST:host}\" \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) \"%{GREEDYDATA:referer}\" \"%{GREEDYDATA:agent}\" %{NUMBER:responsetime}" ]
        }
      }
    }
    output {
      elasticsearch { hosts => ["localhost:9200"] }
      stdout { codec => rubydebug }
    }