java apache-spark cascading

Apache Spark or Cascading framework?


I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?

Any help is appreciated.


Solution

  • At heart, Cascading is a higher-level API on top of execution engines like MapReduce. It is analogous to Apache Crunch in this sense. Cascading has a few related projects, such as Scalding (a Scala API) and Pattern (PMML scoring).

    Apache Spark is similar in that it also exposes a high-level API for data pipelines, available in Java and Scala.

    Spark is more of an execution engine itself than a layer on top of one. It has a number of associated projects, such as MLlib, Spark Streaming, and GraphX, for machine learning, stream processing, and graph computation respectively.

    Overall I find Spark a lot more interesting today, but the two aren't exactly for the same thing: Cascading is an API layer over an existing engine, while Spark is an engine in its own right.
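
To make the "API layer versus execution engine" distinction a bit more concrete, here is a rough sketch in plain Java. It uses neither Cascading nor Spark: `Engine`, `LocalEngine`, and `Flow` are hypothetical names invented for illustration, with `Flow` playing the role of a pipeline API that only describes the work and `LocalEngine` standing in for the engine that actually runs it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LayerVsEngine {

    // Hypothetical stand-in for an execution engine
    // (the role MapReduce plays underneath a layer like Cascading).
    interface Engine {
        Map<String, Integer> mapReduce(List<String> input,
                                       Function<String, List<String>> mapper);
    }

    // Trivial in-memory engine: maps each record to keys, then counts per key.
    static class LocalEngine implements Engine {
        @Override
        public Map<String, Integer> mapReduce(List<String> input,
                                              Function<String, List<String>> mapper) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String record : input) {
                for (String key : mapper.apply(record)) {
                    counts.merge(key, 1, Integer::sum);
                }
            }
            return counts;
        }
    }

    // Hypothetical higher-level API: it composes pipeline steps but contains
    // no execution logic of its own, so the engine can be swapped out.
    static class Flow {
        private Function<String, List<String>> steps = Collections::singletonList;

        // Append a step that turns each element into zero or more elements.
        Flow each(Function<String, List<String>> step) {
            Function<String, List<String>> previous = steps;
            steps = record -> previous.apply(record).stream()
                    .flatMap(item -> step.apply(item).stream())
                    .collect(Collectors.toList());
            return this;
        }

        // Hand the described pipeline to whichever engine is supplied.
        Map<String, Integer> countOn(Engine engine, List<String> input) {
            return engine.mapReduce(input, steps);
        }
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        return new Flow()
                .each(line -> Arrays.asList(line.toLowerCase().split("\\s+")))
                .each(word -> word.isEmpty()
                        ? Collections.<String>emptyList()
                        : Collections.singletonList(word))
                .countOn(new LocalEngine(), lines);
    }

    public static void main(String[] args) {
        // prints {and=1, cascading=1, spark=2, streaming=1}
        System.out.println(wordCount(
                Arrays.asList("spark and cascading", "Spark streaming")));
    }
}
```

In this toy picture, a Cascading-style library corresponds to `Flow` alone and relies on something else to implement `Engine`, whereas a Spark-style system ships both the pipeline API and the engine together. The real projects are far richer than this, so treat it only as an analogy.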