Search code examples
apache-stormtrident

Storm vs. Trident: When not to use Trident?


I'm working with Storm and it is fine for a lot of use cases. Recently I had a look at Trident, which is a high-level abstraction of Storm. It supports exactly-once processing and makes stateful processing easier.

But now I'm wondering.. Why can't I always use Trident instead of Storm?

What I read so far:

  • Trident processes messages in batches, so throughput time could be longer.
  • Trident is not yet able to process loops in topologies.

Are there any other disadvantages when using Trident instead of Storm? Because right now, I think the disadvantages I listed above are marginal.

What use cases cannot be implemented with Trident?


Aftermath:

Since I asked the question my company decided to go for Trident first. We will only use pure Storm when there are performance problems. Sadly this wasn't an active decision it just became the default behavior (I wasn't around at that time).

Their assumption was that in most use cases we need state or only-once-processing or we will need it in near future. I understand their reasoning because moving from Storm to Trident or back isn't an easy transformation, but in my personal opinion the concept of stream processing without state wasn't understood by all and that was the main reason to use Trident.


Solution

  • To answer your question: when shouldn't you use Trident? Whenever you can afford not to.

    Trident adds complexity to a Storm topology, lowers performance and generates state. Ask yourself the question: do you need the "exactly once" processing semantics of Trident or can you live with the "at least once" processing semantics of Storm. For exactly once, use Trident, otherwise don't.

    I would also just like to highlight the fact that Storm guarantees that all messages will be processed. Some messages might just be processed more than once.