elasticsearch, nginx, logstash, filebeat

Elasticsearch index does not contain all of the nginx access logs


I am using the ELK stack to store nginx access logs in Elasticsearch. Specifically, I use Filebeat to collect them and Logstash to parse them, with the following configurations:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
    - /var/log/spring/geo/*.log

output.logstash:
  enabled: true
  hosts: ["logstash:5035"]

input {
    beats {
        port => 5035
    }
}

filter {
    grok {
        match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:http_x_forwarded_for}"]
    }
    grok {
        match => [ "http_x_forwarded_for" , "%{IP:real_client_ip}"]
    }
    mutate {
        convert => ["response", "integer"]
        convert => ["bytes", "integer"]
        convert => ["responsetime", "float"]
    }
    geoip {
        source => "clientip"
        target => "geoip"
        add_tag => [ "nginx-geoip" ]
    }
    date {
        match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
    useragent {
        source => "message"
    }
}

output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        index => "weblogs-%{+YYYY.MM.dd}"
        document_type => "nginx_logs"
        user => "elastic"
        password => "changeme"
    }
    stdout { codec => rubydebug }
}

However, I have noticed that not all logs reach Elasticsearch. For example, suppose I have the following logs:

172.20.0.1 - - [17/Oct/2022:08:25:22 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "111.111.111.111"
112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "-"
111.111.0.1 - - [17/Oct/2022:12:44:44 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "111.111.111.111"
172.19.0.1 - - [17/Oct/2022:12:45:29 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "78.87.79.206, 188.114.103.233"
172.18.0.1 - - [17/Oct/2022:12:46:29 +0000] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "78.87.79.206, 188.114.103.233"

The index is created, but the log 112.111.0.1 - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "-" does not appear when I query the index through Dev Tools. Any idea what is causing this?

EDIT: The query that I am using is the following:

GET weblogs-2022.10.17/_search
{
    "size": 100,
    "query": {
        "match_all": {}
    },
    "sort": [{"@timestamp": {"order": "desc"}}]
}

The result includes only 4 logs instead of 5. Part of the response is shown below (I cannot include the full response because it is very large):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    }

Solution

  • The log line below is not being indexed because the current grok pattern does not match it:

    112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 "http://localhost/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36" "-"
    

    Why does it not match?

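    Because it contains an extra space after the IP address at the start of the line. All the other log lines have one space there, while this one has two, and COMBINEDAPACHELOG expects exactly one space between the client IP and the next field.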

    You can update the first grok filter in your Logstash configuration as shown below, and that log line will be indexed as well.

    grok {
        # Patterns are tried in order; the second one allows extra whitespace after the client IP.
        match => { "message" => [
            "%{COMBINEDAPACHELOG}+%{GREEDYDATA:http_x_forwarded_for}",
            "%{IPORHOST:clientip}%{SPACE}%{HTTPDUSER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{GREEDYDATA:http_x_forwarded_for}" ] }
    }
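
To see why the extra space matters without running Logstash, here is a minimal Python sketch. The two regexes are simplified stand-ins written for illustration (an assumption, not the real grok definitions), the names STRICT, TOLERANT and parse are hypothetical, and the user agent in the sample lines is shortened to keep them readable. The strict regex requires exactly one space after the client IP, while the tolerant one mirrors %{SPACE}, which matches zero or more whitespace characters.

```python
import re

# Simplified stand-ins for the two grok patterns (assumption: these are NOT the
# real COMBINEDAPACHELOG definitions, only enough to show the whitespace issue).
REST = (
    r'(?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)" (?P<http_x_forwarded_for>.*)'
)

# Strict: exactly one space after the client IP, like the original pattern.
STRICT = re.compile(r'(?P<clientip>\S+) ' + REST)

# Tolerant: zero or more whitespace characters after the client IP, like %{SPACE}.
TOLERANT = re.compile(r'(?P<clientip>\S+)\s*' + REST)

# Sample lines based on the question, with the user agent shortened.
LINE_OK = (
    '172.20.0.1 - - [17/Oct/2022:08:25:22 +0000] "GET / HTTP/1.1" 304 0 '
    '"-" "Mozilla/5.0" "111.111.111.111"'
)
LINE_EXTRA_SPACE = (
    '112.111.0.1  - - [17/Oct/2022:12:43:22 +0000] "GET /favicon.ico HTTP/1.1" 404 150 '
    '"http://localhost/" "Mozilla/5.0" "-"'
)


def parse(pattern, line):
    """Return the captured fields as a dict, or None if the line does not match."""
    m = pattern.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    for name, line in [("one space", LINE_OK), ("two spaces", LINE_EXTRA_SPACE)]:
        print(name,
              "| strict:", parse(STRICT, line) is not None,
              "| tolerant:", parse(TOLERANT, line) is not None)
```

The strict regex rejects the line with two spaces and the tolerant one accepts both lines, which is the same effect the second grok pattern is meant to have.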