Which StatsD client should I use for a java/grails project?


I'm looking at adding StatsD data collection to my Grails application, and looking around at existing libraries and code has left me a little confused as to what would be a good, scalable solution. To put the question into context, I'm working on an online gaming project where I will be monitoring user interactions with the game engine. These interactions cluster around particular moments in time: X users perform interactions within a window of a second or two, then repeat after a 10-20 second pause.

Here is my analysis of the options that are available today.

Etsy StatsD client example

https://github.com/etsy/statsd/blob/master/examples/StatsdClient.java

The "simplest thing that could possibly work" solution: I could pull this class into my project, instantiate a singleton instance as a Spring bean and use it directly. However, after noticing that the grails-statsd plugin creates a pool of client instances, I started wondering about the scalability of this approach.

It seems that the doSend method could become a bottleneck if many threads are trying to send events at the same time. However, as I understand it, sending UDP packets is fire-and-forget, so each send should complete quickly and avoid the heavy overhead that we usually associate with network connections.
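To make the concern concrete, here is a minimal sketch of a fire-and-forget sender that shares a single channel between threads. It is not the Etsy code; the class name `SimpleStatsdSender` and its methods are hypothetical, and it assumes plain `java.nio` with the target address resolved once up front. `DatagramChannel` is documented as safe for concurrent use, but only one thread may be writing at any given time, so sends are effectively serialized on the one channel.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.UnresolvedAddressException;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal client: one shared channel, fire-and-forget sends. */
class SimpleStatsdSender implements AutoCloseable {
    private final DatagramChannel channel;
    private final InetSocketAddress target;

    SimpleStatsdSender(String host, int port) throws IOException {
        // Resolve the address once here rather than on every send.
        this.target = new InetSocketAddress(host, port);
        this.channel = DatagramChannel.open();
    }

    /** Formats a counter in the StatsD line format, e.g. "game.moves:1|c". */
    static String counter(String key, long delta) {
        return key + ":" + delta + "|c";
    }

    /** Sends one datagram; returns false instead of throwing on failure. */
    boolean send(String stat) {
        try {
            ByteBuffer buf = ByteBuffer.wrap(stat.getBytes(StandardCharsets.UTF_8));
            // No reply is awaited: UDP gives no delivery confirmation.
            return channel.send(buf, target) > 0;
        } catch (IOException | UnresolvedAddressException e) {
            return false; // monitoring should never break the caller
        }
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Usage would look like `sender.send(SimpleStatsdSender.counter("game.moves", 1))` from any request thread. Whether the serialized write is a real bottleneck under bursty load is something only measurement can answer.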

grails-statsd plugin

https://github.com/charliek/grails-statsd/

Someone has already created a StatsD plugin for Grails that includes some nice features, such as the annotations and the withTimer method. However, I see that the implementation there is missing some bug fixes from the example implementation, such as specifying the locale on calls to String.format. I'm also not a huge fan of pulling in Apache commons-pool just for this, when a standard Executor could achieve a similar effect.
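The locale issue is easy to reproduce. The sketch below is illustrative only (the class `LocaleFormatDemo` and the method `sampled` are made up for this example, not taken from either library): formatting a sample rate with `%f` under a locale that uses a decimal comma produces a line that is not valid StatsD syntax.

```java
import java.util.Locale;

/** Hypothetical demo of why the locale must be passed to String.format. */
class LocaleFormatDemo {
    /** Builds a sampled counter line using the given locale for number formatting. */
    static String sampled(Locale locale, String key, double rate) {
        return String.format(locale, "%s:1|c|@%f", key, rate);
    }

    public static void main(String[] args) {
        // A German locale uses a decimal comma, which corrupts the sample rate.
        System.out.println(sampled(Locale.GERMANY, "hits", 0.5)); // hits:1|c|@0,500000
        // A fixed English locale always yields a decimal point.
        System.out.println(sampled(Locale.ENGLISH, "hits", 0.5)); // hits:1|c|@0.500000
    }
}
```

Calling `String.format` without a locale argument uses the JVM default, so the bug only shows up on servers configured with such a locale.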

java-statsd-client

https://github.com/tim-group/java-statsd-client/

This is an alternative pure Java library that operates asynchronously by maintaining its own ExecutorService. It supports the entire StatsD API, including sets and sampling, but doesn't provide any hooks for configuring the thread pool and queue size. If something goes wrong in a non-critical area such as monitoring, I think I would prefer a finite queue that loses events over an unbounded queue that fills up my heap.
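For illustration, the bounded behaviour I have in mind could be sketched with the standard `java.util.concurrent` classes roughly as follows. This is not how java-statsd-client is implemented; the class `BoundedStatsExecutor`, its `dropped` counter, and the queue capacity are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical single-threaded sender with a finite queue that drops on overflow. */
class BoundedStatsExecutor {
    /** Number of tasks rejected because the queue was full. */
    final AtomicLong dropped = new AtomicLong();
    final ThreadPoolExecutor executor;

    BoundedStatsExecutor(int queueCapacity) {
        executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            // Finite queue: at most queueCapacity events wait in memory.
            new ArrayBlockingQueue<>(queueCapacity),
            // Daemon thread so the sender never keeps the JVM alive.
            r -> {
                Thread t = new Thread(r, "statsd-sender");
                t.setDaemon(true);
                return t;
            },
            // On overflow, count the event and discard it instead of throwing.
            (task, pool) -> dropped.incrementAndGet());
    }

    /** Queues a send task, or silently drops it if the queue is full. */
    void submit(Runnable sendTask) {
        executor.execute(sendTask);
    }
}
```

The `dropped` counter gives some visibility into how many events were lost, which could itself be reported as a metric once the backlog clears.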

Play statsd plugin

https://github.com/vznet/play-statsd/

I can't use this code directly in my Grails project, but I thought it was worth a look to see how things were implemented. Generally I love the way the code in StatsdClient.scala is built up: very clean and readable. It also appears to have the locale bug, but is otherwise feature-complete compared with the Etsy sample. Interestingly, unless there is some Scala magic that I've not understood, it appears to create a new socket for each data point that is sent to StatsD. While this approach nicely avoids the need for an object pool or executor thread, I can't imagine it's terribly efficient, since it potentially performs DNS lookups within the request thread, which should be returning to the user as soon as possible.

The questions

  1. Judging by the fact that all the other implementations appear to have implemented another strategy for handling concurrency, can I assume that the Etsy example is a little too naïve for production use?
  2. Does my analysis here appear to be correct?
  3. What are other people using for statsd in java/groovy?

So far it looks like the best existing solution is the Grails plugin, as long as I can accept the commons-pool dependency, but right now I'm seriously considering spending Sunday writing my own version that combines the best parts of each implementation.


Solution

  • After sleeping on this for a week, I think I'm going to go ahead and use the existing Grails StatsD plugin. The rationale is that although I could achieve a similar effect using an Executor to handle concurrency, without an object pool this would still be bound to a single client/socket instance, which is in theory a rather obvious bottleneck in the application. Therefore, if I need a pool anyway, I may as well use one where someone else has done all the hard work :)