Tags: apache, logging, logstash, grok

Logstash Multiple Log Formats


So, we're looking at some kind of log aggregator, since having logs scattered all over the place is not scaling. I've been looking at Logstash and was able to get an instance with Kibana up and running last night, but there were some problems. For instance, the geoip filter was picking up our own domain name from the httpd (I assume Apache) logs.

Anyway, now I'd like to open it up to our other web server logs, and I'm having trouble understanding something: do I need to define patterns for every different log format we use? How is this typically approached: one big logstash.conf file, or some other way?
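From what I can tell, Logstash just concatenates every *.conf file it's pointed at (e.g. everything under /etc/logstash/conf.d/), so one approach I'm considering is splitting the config per concern and tagging each input with a type to branch on in the filters. Something like this (the paths and type names are just my guesses):

input {
    # one file block per log family, each tagged with a type
    file {
        path => "/var/log/httpd/*access_log"
        type => "apache"
    }
    file {
        path => "/var/log/maillog"
        type => "postfix"
    }
    file {
        path => "/var/log/mysqld.log"
        type => "mysql"
    }
}

The grok/date filters for each family would then live in their own files and key on [type].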

PS: I realize that some of these logs share similarities; for instance, the error_log files all have almost identical formats, as do the access_logs. So I assume something like this would handle all of the *error_log files.

input { 
    file {
        path => "/var/log/httpd/*error_log"
        type => "error_log"
    }
}

filter {
    if [type] == "error_log" {
        grok {
            match => [ "message", "%{COMBINEDAPACHELOG}" ]
        }
    }
}

Anyway, here's a sample line from each of the logs I want to import.

var/log/httpd/access_log:
207.46.13.87 support.mycompany.com - - [15/Mar/2015:07:49:12 -0400] "GET / HTTP/1.1" 302 - "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

var/log/httpd/api-access_log:
192.168.1.5 api.mycompany.com - - [15/Mar/2015:06:50:01 -0400] "GET /diag/heartbeat HTTP/1.0" 502 495 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/api-error_log:
[Sun Mar 15 08:45:06 2015] [error] [client 192.168.1.5] proxy: Error reading from remote server returned by /diag/heartbeat

var/log/httpd/audit_log:
type=USER_END msg=audit(1426380301.674:2285509): user pid=30700 uid=0 auid=0 msg='PAM: session close acct="root" : exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron res=success)'

var/log/httpd/default-access_log:
74.77.76.4 dc01.mycompany.com - - [15/Mar/2015:09:33:46 -0400] "GET /prod/shared/700943_image003.jpg HTTP/1.1" 200 751 "http://mail.twc.com/do/mail/message/view?msgId=INBOXDELIM18496" "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

var/log/httpd/error_log:
[Sun Mar 15 13:54:16 2015] [error] [client 107.72.162.115] File does not exist: /var/www/html/portal-prod/apple-touch-icon.png

var/log/httpd/portal-prod-access_log:
192.168.1.5 portal.mycompany.com - - [15/Mar/2015:04:15:02 -0400] "GET /index.php/account/process_upload_file?upload_file=T702135.0315.txt HTTP/1.0" 200 9 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/ssl_access_log:
97.77.91.2 - - [15/Mar/2015:10:00:07 -0400] "POST /prod/index.php/api/uploader HTTP/1.1" 200 10

var/log/httpd/ssl_error_log:
[Sun Mar 15 09:00:03 2015] [error] [client 99.187.226.241] client denied by server configuration: /var/www/html/api

var/log/httpd/ssl_request_log:
[15/Mar/2015:11:10:02 -0400] dc01.mycompany.com 216.240.171.98 TLSv1 RC4-MD5 "POST /prod/index.php/api/uploader HTTP/1.1" 7

var/log/httpd/support-access_log:
209.255.201.30 support.mycompany.com - - [15/Mar/2015:04:07:51 -0400] "GET /cron/index.php?/Parser/ParserMinute/POP3IMAP HTTP/1.0" 200 360 "-" "Wget/1.11.4 Red Hat modified"

var/log/httpd/support-error_log:
[Sun Mar 15 04:05:43 2015] [warn] RSA server certificate CommonName (CN) `portal.mycompany.com' does NOT match server name!?

var/log/httpd/web-prod-access_log:
62.210.141.227 www.mycompany.com - - [15/Mar/2015:04:38:30 -0400] "HEAD /lib/uploadify/uploadify.swf HTTP/1.1" 404 - "http://www.mycompany.com/lib/uploadify/uploadify.swf" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

var/log/httpd/web-prod-error_log:
[Sun Mar 15 04:38:30 2015] [error] [client 62.210.141.227] File does not exist: /var/www/html/web-prod/lib, referer: http://www.mycompany.com/lib/uploadify/uploadify.swf

var/log/cron:
Mar 15 04:30:01 lilo crond[22758]: (root) CMD (/opt/mycompnay/bin/check_replication.sh)

var/log/mysqld.log:
150314  5:07:34 [ERROR] Slave SQL: Error 'Deadlock found when trying to get lock; try restarting transaction' on query. Default database: 'my_database'. Query: 'insert into some_table (column_names) values (values)', Error_code: 1213

var/log/openvpn.log:
Sun Mar 15 13:19:31 2015 Re-using SSL/TLS context
Sun Mar 15 12:23:40 2015 don/50.182.238.21:43315 Control Channel: TLSv1, cipher TLSv1/SSLv3 DHE-RSA-AES256-SHA, 1024 bit RSA

var/log/maillog:
Mar 15 05:26:45 lilo postfix/qmgr[4428]: 70460B04004: removed
Mar 15 07:06:40 lilo postfix/smtpd[31732]: connect from boots[192.168.1.4]

codeigniter_logs:
DEBUG - 2015-03-15 14:48:29 --> Session class already loaded. Second attempt ignored.
DEBUG - 2015-03-15 14:48:29 --> Helper loaded: url_helper

Solution

  • One part that was annoying me was the geoip filter when using the COMBINEDAPACHELOG pattern to parse a line like this:

    192.168.1.5 portal.mycompany.com - - [15/Mar/2015:04:15:02 -0400] "GET /index.php/account/process_upload_file?upload_file=T702135.0315.txt HTTP/1.0" 200 9 "-" "Wget/1.11.4 Red Hat modified"

    Because grok patterns are not anchored to the start of the line, the match could begin at the vhost name instead of the leading IP, so clientip was set to portal.mycompany.com and geoip resolved our own server's location rather than the visitor's. Forcing the real client IP to the front with the pattern "%{IP:clientip} %{COMBINEDAPACHELOG}" took care of this.

    Here is my filter section then:

    if [type] == "apache" {
        if [path] =~ "access" and [path] !~ "ssl_access" {
            mutate { replace => { type => "apache_access" } }
            grok { match => { "message" => "%{IP:clientip} %{COMBINEDAPACHELOG}" } }
            date {
                locale => "en"
                match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
            }
        } else if [path] =~ "ssl_access" {
            mutate { replace => { type => "apache_access" } }
            grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
            date {
                locale => "en"
                match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
            }
        } else if [path] =~ "error" {
            mutate { replace => { type => "apache_error" } }
        }
    }

    if [agent] != "" {
        useragent { source => "agent" }
    }

    geoip { source => "clientip" }
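
    The apache_error branch above only retags the events without parsing them. If you want fields out of those lines too, a pattern along these lines should match the `[Sun Mar 15 ...] [error] [client ...]` samples in the question (field names like errormessage are my own choice, not a stock pattern):

        grok {
            match => { "message" => "\[%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}\] \[%{LOGLEVEL:loglevel}\] (?:\[client %{IPORHOST:clientip}\] )?%{GREEDYDATA:errormessage}" }
        }

    The `(?: ... )?` makes the client portion optional, since lines like the RSA certificate warning don't carry one.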
    

    Being very specific in the input section also helped a great deal. I still have to set up a Redis instance to ship logs from our other data center to this box, but so far it's performing spectacularly.
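
    For the record, the cross-DC shipping I have in mind is the usual shipper/broker split, roughly like this (the host name is a placeholder):

        # on the remote shipper
        output {
            redis { host => "logbox.mycompany.com" data_type => "list" key => "logstash" }
        }

        # on this box, the indexer
        input {
            redis { host => "127.0.0.1" data_type => "list" key => "logstash" }
        }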

    I'd like to have a prepackaged ELK stack that contains Kibana 4 though. That UI is much cleaner than Kibana 3.