I am new to the ELK stack and was hoping to use it to debug issues from log files. My requirement is to plot the time it takes for a thread to process a task and return to the pool. Basically, the logs look like the following:
2014-10-31 13:00:00,007 [catalina-exec-11] DEBUG [com.clairmail.jms.concurrent.poolimpl.JMSProducerPoolImpl] Providing a pooled producer to catalina-exec-11
..... and
2014-10-31 13:00:00,121 [catalina-exec-11] DEBUG [com.clairmail.jms.concurrent.poolimpl.JMSProducerPoolImpl] catalina-exec-11 has returned producer to pool.
From the above logs, I can figure out that this particular thread, catalina-exec-11, took about 114 ms to complete (13:00:00,121 minus 13:00:00,007). My questions are the following:
Anoop
Assuming you can change the format of the logs, these tips might be useful for you:
There is no way (as far as I know) to compute the latency of an operation from two different Elasticsearch documents in Kibana 3.1 (the current stable version); maybe it will be possible in Kibana 4. What you are trying to do would be trivial if your log entries contained the operation's elapsed time. For example:
2014-10-31 13:00:00,007 0 [catalina-exec-11] DEBUG [com.clairmail.jms.concurrent.poolimpl.JMSProducerPoolImpl] Providing a pooled producer to catalina-exec-11
2014-10-31 13:00:00,121 114 [catalina-exec-11] DEBUG [com.clairmail.jms.concurrent.poolimpl.JMSProducerPoolImpl] catalina-exec-11 has returned producer to pool.
Notice that I added a new field after the date: the elapsed time of the operation in milliseconds. You could then plot a Kibana histogram showing the mean of that latency field.
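If you control the application code, one way to produce that field is to time the checkout and put the result in the logging context rather than in the message text, so it always sits in a fixed position. This is a minimal sketch assuming a Java app logging through SLF4J with a pattern layout like %d{ISO8601} %X{elapsedMs} [%t] %p [%c] %m%n; the class and method names here are hypothetical, not from your code:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Hypothetical wrapper around the producer checkout: it measures how long a
// thread holds a pooled producer and exposes the elapsed milliseconds through
// the MDC, so the pattern layout prints it right after the date on each line.
public class TimedProducerCheckout {

    private static final Logger LOG = LoggerFactory.getLogger(TimedProducerCheckout.class);

    public void runWithProducer(Runnable task) {
        long start = System.nanoTime();
        MDC.put("elapsedMs", "0"); // the checkout line carries 0, like the example above
        LOG.debug("Providing a pooled producer to {}", Thread.currentThread().getName());
        try {
            task.run(); // the work performed while holding the producer
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            MDC.put("elapsedMs", Long.toString(elapsedMs));
            LOG.debug("{} has returned producer to pool.", Thread.currentThread().getName());
            MDC.remove("elapsedMs");
        }
    }
}

Keeping the value in the MDC instead of the message text also makes the grok pattern further down much simpler.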
So you have a distributed architecture with different machines, and threads that may well have identical names across them. What you need is a globally unique request context ID. There are many ways to achieve this: a GUID, or a unique 64-bit integer ID similar to Twitter's Snowflake IDs. With such an ID, your logs would look like this:
5892EAC3-EE95-4549-A787-CCE5A6639598 2014-10-31 13:00:00,007 0 [catalina-exec-11] DEBUG [com.clairmail.jms.concurrent.poolimpl.JMSProducerPoolImpl] Providing a pooled producer to catalina-exec-11
5892EAC3-EE95-4549-A787-CCE5A6639598 2014-10-31 13:00:00,121 114 [catalina-exec-11] DEBUG [com.clairmail.jms.concurrent.poolimpl.JMSProducerPoolImpl] catalina-exec-11 has returned producer to pool.
Notice that I added a new GUID field at the beginning of each log entry.
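To get that ID onto every log line, you can generate it once at the edge of the system and stash it in the MDC, so a %X{requestId} conversion at the front of the pattern layout stamps it onto each entry. A minimal sketch, assuming a Java servlet container such as Tomcat (the filter class name is hypothetical):

import java.io.IOException;
import java.util.UUID;
import javax.servlet.*;
import org.slf4j.MDC;

// Hypothetical servlet filter: assigns a globally unique ID when a request
// enters the system and clears it when the request leaves, so every log line
// written on this thread (via %X{requestId} in the layout) carries the same ID.
public class RequestIdFilter implements Filter {

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        MDC.put("requestId", UUID.randomUUID().toString());
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove("requestId");
        }
    }

    @Override public void init(FilterConfig config) { }
    @Override public void destroy() { }
}

Note that the MDC is per-thread and per-machine: if a request hops to another machine (e.g. over JMS), the ID also has to be propagated, for instance as a message property, which is beyond this sketch.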
That can easily be achieved with Logstash. If Logstash is configured to read log files from different machines, the file input adds a 'host' field to each Logstash event that you can send to Elasticsearch. Then, in a Kibana dashboard, you can easily show how requests are distributed among the different machines.
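For example, a minimal file input along these lines (the path is a placeholder for your actual log location):

input {
  file {
    # Placeholder path; point this at your actual Tomcat/application logs.
    path => "/var/log/tomcat/application.log"
    start_position => "beginning"
  }
}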
Your grok filter could look something like this: grok { match => [ "message", "%{UUID:requestid}%{SPACE}%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NUMBER:elapsedmilliseconds}%{SPACE}\[%{NOTSPACE:threadname}\]%{SPACE}%{WORD:level}%{SPACE}\[%{JAVACLASS:logger}\]%{SPACE}%{GREEDYDATA:logmessage}" ] }
Note that the square brackets have to be escaped (they are special characters in grok), %{JAVACLASS} matches the dotted logger name, %{NOTSPACE} matches a hyphenated thread name like catalina-exec-11, and %{GREEDYDATA} captures the rest of the message.
This is just an example, and you will have to fine-tune it for your needs.
Hope that helps.