We are trying to implement a web application with Apache Storm. The application receives a huge load of ad-requests (100k TPS, i.e. a hundred thousand transactions per second), makes some simple calculations on them, and then stores the result in a NoSQL database with a maximum latency of 10 ms. We are using Cassandra as a sink for its writing capabilities.

However, we have already exceeded the 10 ms requirement: we are at 100 ms.
We tried to minimize the size of the buffers (the Disruptor buffers) and to balance the topology well, using the parallelism of the bolts. But we are still at 20 ms.
With 4 workers (8 cores / 16 GB) we are at 20k TPS, which is still very low.
Are there any suggestions for optimization, or are we just reaching the limits of Apache Storm (i.e. the limits of Java)?
I don't know the platform you're using, but in C++ 10 ms is an eternity. I would think you are using the wrong tools for the job.

Using C++, serving a local query should take under a microsecond. Non-local queries that touch multiple memory locations and/or have to wait for disk or network I/O have no choice but to take more time. In that case parallelism is your best friend.
You have to find the bottleneck.
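For example, a crude way to locate the slow stage in a Storm pipeline is to time each bolt's `execute` by hand — a sketch where `TimedCalcBolt` and the 10 ms threshold are made up for illustration:

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

// Hypothetical bolt: wraps the real work in a timer to see which
// stage is eating the latency budget.
public class TimedCalcBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        long start = System.nanoTime();
        // ... the actual calculation and emit would go here ...
        long micros = (System.nanoTime() - start) / 1_000;
        if (micros > 10_000) {
            // Anything slower than 10 ms already blows the whole budget.
            System.err.println("slow execute: " + micros + " us");
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // No output fields declared in this bare sketch.
    }
}
```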
After you've found the bottleneck, you can either improve it, async it and/or multiply (=parallelize) it.
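To make "async it" concrete for the Cassandra write: the DataStax Java driver (3.x) can fire inserts without blocking via `executeAsync`, keeping many writes in flight per connection. A minimal sketch — keyspace, table, and values are hypothetical:

```java
import com.datastax.driver.core.BoundStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;

public class AsyncSinkExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("ads"); // hypothetical keyspace

        PreparedStatement insert = session.prepare(
                "INSERT INTO results (id, score) VALUES (?, ?)"); // hypothetical table

        // Fire the write without blocking the calling thread; the driver
        // pipelines many in-flight requests over the same connections.
        BoundStatement bound = insert.bind("req-42", 0.97);
        ResultSetFuture future = session.executeAsync(bound);

        // Block here only for the demo; in a bolt you'd register a callback
        // on the future and ack/fail the tuple there instead.
        future.getUninterruptibly();

        cluster.close();
    }
}
```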