Tags: events, clojure, apache-storm, stream-processing, lamina-clojure

Lamina vs Storm


I am designing a prototype real-time monitor for processing fairly large volumes (>30G/day) of streaming numeric data. I would like to write it in Clojure, as the language seems well suited to the kind of "Observer + state machine" system that this will probably become.

The two main candidate frameworks I have found are Lamina and Storm. There are also Riemann and Pulse, but Riemann seems to be more of a complete solution than a framework, and I'd rather not commit to a final design yet; Pulse's repo looks somewhat unmaintained.

What I would like to know is: what kinds of data flow and workflow are these two projects optimised for? Storm seems more mature, but Lamina seems more composable and "Clojureic" (my background is Python, so I tend to rate this highly).

What I've found from reading online:

  • Storm seems to be focussed on Big Data streams; the core is straight Java with a Clojure DSL. It appears to have pre-built handlers for a number of existing data sources.

  • Lamina is more of a lightweight, reusable component that does the Clojure thing of coding to abstractions, so it can be reused as a base for other eventing systems. The data sources need to be handled in code.

  • Both have a useful set of aggregation/splitting/computation library functions out of the box. Lamina's graphviz integration is a nice touch.


Solution

  • Storm incorporates cluster management and handling of failed nodes in the flow because it was designed to be something "like Hadoop but for streaming". From what I understand of your requirements, that seems closer to your use case.