
How to use DataTorrent in a kappa architecture?


I have read a lot about lambda and kappa architectures, where we need to use either Apache Spark or Apache Storm. I just discovered a new tool called DataTorrent, which can do both batch and real-time processing. I was wondering whether DataTorrent can serve, at the same time, as the batch layer and the speed layer of a lambda (or kappa) architecture?

Cheers,


Solution

  • Apache Apex, or DataTorrent RTS, allows your team to develop, test, debug and operate on a single processing framework.

    Although the Apache Apex documentation does not explicitly mention the kappa architecture, IMO it can be used to implement one.

    Apache Apex provides built-in support for fault tolerance, checkpointing and recovery. Thus, you can rely on a single dataflow DAG in Apex to get reliable results with low latency. There is no need for a separate batch layer and speed layer when you define your application as a DAG on Apex.

    Note, however, that Apache Apex is only a stream computation engine. A complete kappa architecture combines a log store, a stream computation engine and a serving-layer store.
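
To make the three pieces of a kappa architecture concrete, here is a minimal in-memory sketch in plain Python. It does not use the Apex API: `EventLog`, `RunningTotals` and `run_demo` are hypothetical names invented for illustration, the list-backed log only stands in for a real log store, and the dictionary stands in for a serving-layer store. The point it tries to show is that one processing path handles both historical replay and live events, so no separate batch layer is needed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for this sketch: (key, value).
Event = Tuple[str, int]


class EventLog:
    """Append-only list standing in for a durable log store (assumption)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read_from(self, offset: int) -> Iterable[Event]:
        # Return a copy so readers are not affected by later appends.
        return list(self._events[offset:])


class RunningTotals:
    """Stream job with a single code path for replayed and live events."""

    def __init__(self) -> None:
        # Dictionary standing in for the serving-layer store (assumption).
        self.serving_view: Dict[str, int] = defaultdict(int)
        self.next_offset = 0  # position in the log this job has reached

    def process(self, event: Event) -> None:
        key, value = event
        self.serving_view[key] += value
        self.next_offset += 1

    def catch_up(self, log: EventLog) -> None:
        # Consume everything in the log that this job has not seen yet.
        for event in log.read_from(self.next_offset):
            self.process(event)


def run_demo() -> Tuple[Dict[str, int], Dict[str, int]]:
    log = EventLog()
    for event in [("a", 1), ("b", 2), ("a", 3)]:
        log.append(event)

    job = RunningTotals()
    job.catch_up(log)        # replay of history (the "batch" case)
    log.append(("b", 5))
    job.catch_up(log)        # newly arrived event (the "speed" case)

    # Reprocessing in kappa style: start a fresh job and replay the full log.
    rebuilt = RunningTotals()
    rebuilt.catch_up(log)
    return dict(job.serving_view), dict(rebuilt.serving_view)
```

Calling `run_demo()` returns two views that should be identical, since the long-running job and the job rebuilt from offset 0 consume the same log through the same `process` method. In a real deployment the stream engine (Apex, in this answer) would take the place of `RunningTotals`, with its checkpointing covering what `next_offset` does here.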