Search code examples
apache-sparkakkanettyrpc

Which is more scalable with Spark 1.6 (RPC): Netty or AKKA?


Spark 1.6 can be configured to use AKKA or Netty for RPC. In case Netty is configured, does that mean that Spark runtime does not employ the actor model for messaging (e.g. between workers and driver blockmanagers) or even in case of netty configuration, a custom simplified actor model is used by relying on Netty.

I think AKKA itself relies on netty and Spark uses only a subset of AKKA. Still, is configuring AKKA is better for scalability (in terms of number of workers) as compared to netty? any suggestion on this particular spark configuration?


Solution

  • adding to @user6910411s pointer which was nicely explained about the design decision.

    as explained by link Flexibility and removing dependency on Akka was the design decision..

    Question :

    I think AKKA itself relies on netty and Spark uses only a subset of AKKA. Still, is configuring AKKA is better for scalability (in terms of number of workers) as compared to netty? any suggestion on this particular spark configuration?

    Yes Spark 1.6 can be configured to use AKKA or Netty for RPC.

    it can be configured through spark.rpc i.e val rpcEnvName = conf.get("spark.rpc", "netty") that means default: netty.

    Please see 1.6 code base

    Here are more insights , as when to go for what...


    Akka and Netty both deals asynchronous processing and message handling, but they work at different levels W.R.T scalablity.

    Akka is a higher level framework for building event-driven, scalable, fault-tolerant applications. It focuses on the Actor class for message processing. Actors have a hierarchical arrangement, parent actors are responsible for supervision of their child actors.

    Netty also works around messages, but it is a little lower level and deals more with networking. It has NIO at its core. Netty has a lot of features for using various protocols like HTTP, FTP, SSL, etc. Also, you have more fine-grained control over the threading model.

    Netty is actually used within Akka w.r.t. distributed actors.

    So even though they are both asynchronous & message-oriented, with Akka you are thinking more abstractly in your problem domain, and with Netty you are more focused on the networking implementation.

    Conclusion : Netty and Akka both are equally scalable. pls also note that Spark2 onwards default is Netty I cant see Akka as spark.rpc flag there I mean val rpcEnvName = conf.get("spark.rpc", "netty") is not available. in Spark2.0 code see RpcEnv.scala.