How to monitor nginx response time with ELK stack?


I'd like to create a monitor that shows the near-real-time average response time of nginx.

The image below shows CPU usage, for example; I'd like to create something similar for average response time.

[Image: CPU usage chart]

I know how to track the response time for individual requests (https://lincolnloop.com/blog/tracking-application-response-time-nginx/).

I'll still have to work out how to ignore requests that aren't for pages or the API, such as static image requests.

This must be a pretty basic requirement, but I couldn't find out how to do it by googling.


Solution

  • This is actually trickier than you'd expect:

    Metricbeat

    The nginx module of Metricbeat doesn't contain this information. It's built around nginx's stub_status endpoint (the stubstatus metricset) and reports on the server process as a whole, such as connection and request counts, rather than the timing of individual requests.

    Filebeat

    The nginx module for Filebeat is where you might expect this. It's built around the nginx access log and has the individual requests. Unfortunately, the response time isn't part of the access log by default (at least on Ubuntu); only the number of bytes sent is. Here's an example (response code 200, 159 bytes sent):

    34.24.14.22 - - [10/Nov/2019:06:54:51 +0000] "GET / HTTP/1.1" 200 159 "-" "Go-http-client/1.1"
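
To make the missing field concrete, here is a minimal Python sketch that splits the example line above into the fields of nginx's default "combined" format. The regular expression and the `COMBINED` name are my own illustration, not something Filebeat uses; the point is only that no timing field is available to extract.

```python
import re

# Illustrative pattern for nginx's default "combined" access log format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

line = ('34.24.14.22 - - [10/Nov/2019:06:54:51 +0000] "GET / HTTP/1.1" '
        '200 159 "-" "Go-http-client/1.1"')

fields = COMBINED.match(line).groupdict()

# Status code and bytes sent are present; a request duration is not.
print(fields["status"], fields["body_bytes_sent"])
print(sorted(fields))
```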
    

    Packetbeat

    This one has a field called event.duration that sounds promising. But be careful with the HTTP module: it only works for plain HTTP traffic, not HTTPS, because Packetbeat can't see inside the encrypted traffic. In most cases you'll want to use HTTPS for your application, so this isn't all that helpful and will mostly show redirects to HTTPS.

    The other protocols such as TLS (this is only the time for the initial handshake) or Flow information (this is a group of packets) are not what you are after IMO.

    Customization

    I'm afraid you'll need some customization and you basically have two options:

    1. Customize the log format of nginx as described in the blog post you linked to. You'll also need to change the pattern in the Elasticsearch ingest pipeline to extract the timing information correctly.
    2. I assume you have an application behind nginx. Then you might want to get even more insights into that than just timing by using [APM / tracing](https://www.elastic.co/products/apm) with the agents for various languages. This way you'll also automatically skip static resources like images and focus on the relevant parts of your application.
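
As a rough illustration of option 1, the Python sketch below assumes the nginx log format has been customized to append `$request_time` (seconds, with millisecond resolution) to the end of the default "combined" line. That exact layout is an assumption, as are the names `TIMED`, `STATIC_SUFFIXES` and `average_response_time` and the made-up sample lines; in an actual ELK setup the extraction would live in the ingest pipeline and the averaging in Kibana. The sketch only shows the idea of pulling out the timing and ignoring static resources.

```python
import re
from statistics import mean

# Assumed layout: nginx "combined" format with $request_time appended.
TIMED = re.compile(
    r'\S+ - \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \d+ "[^"]*" "[^"]*" (?P<request_time>\d+\.\d+)$'
)

# Hypothetical list of extensions treated as static resources.
STATIC_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".css", ".js", ".ico", ".svg")


def average_response_time(lines):
    """Return the mean request time in seconds, skipping static files.

    Lines that do not match the assumed format are ignored.
    Returns None if no line qualifies.
    """
    times = []
    for line in lines:
        match = TIMED.match(line.strip())
        if match is None:
            continue
        # Drop any query string before checking the file extension.
        path = match.group("path").split("?", 1)[0].lower()
        if path.endswith(STATIC_SUFFIXES):
            continue
        times.append(float(match.group("request_time")))
    return mean(times) if times else None


# Made-up sample lines in the assumed format.
sample = [
    '34.24.14.22 - - [10/Nov/2019:06:54:51 +0000] "GET / HTTP/1.1" 200 159 "-" "Go-http-client/1.1" 0.120',
    '203.0.113.7 - - [10/Nov/2019:06:54:52 +0000] "GET /api/items?page=2 HTTP/1.1" 200 512 "-" "curl/7.58.0" 0.280',
    '203.0.113.7 - - [10/Nov/2019:06:54:53 +0000] "GET /static/logo.png HTTP/1.1" 200 2048 "-" "curl/7.58.0" 0.004',
]

avg = average_response_time(sample)
print(f"average response time: {avg:.3f} s")
```

Filtering by file extension is the crudest possible way to skip static resources; filtering on a path prefix or on the response content type would work just as well if your logs carry that information.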