apache-flink flink-statefun

Apache Flink Stateful Functions: Python vs Java performance


What are the advantages and disadvantages of using Python or Java when developing Apache Flink Stateful Functions?

  • Is there any performance difference? Which one is more efficient for the same operation?
  • Can we develop the application completely in Python?
  • What are the features that one supports and the other does not?

Solution

  • StateFun supports embedded functions and remote functions.

    • Embedded functions are bundled and deployed within the JVM processes that run Flink. They must therefore be implemented in a JVM language (such as Java), and they are the most performant option. The downside is that any change to the function code requires a restart of the Flink cluster.

    • Remote functions run in a separate process and are invoked by the Flink cluster for every incoming message addressed to them. They are therefore expected to be less performant than embedded functions, but they provide great flexibility in:

      • Choosing an implementation language
      • Fast scaling up and down
      • Fast restart in case of a failure
      • Rolling upgrades
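    The remote model described above can be sketched in plain Python. This is only a toy model, not the StateFun SDK: `ToyRuntime`, `Result`, and `greeter` are hypothetical names, and the runtime class merely stands in for the Flink cluster. It assumes, as in the remote setup, that the runtime owns the state and hands it to the function on every call, so the function process itself stays stateless.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Address = Tuple[str, str]  # (function type, function id); naming is an assumption


@dataclass
class Result:
    """What a handler hands back: the updated state plus outgoing messages."""
    new_state: int
    outgoing: List[str] = field(default_factory=list)


def greeter(state: Optional[int], name: str) -> Result:
    """Stateless handler: everything it needs arrives with the call."""
    seen = (state or 0) + 1
    return Result(new_state=seen, outgoing=[f"Hello {name}, visit #{seen}"])


Handler = Callable[[Optional[int], str], Result]


class ToyRuntime:
    """Stands in for the Flink cluster: keeps state and dispatches by address."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self.handlers = handlers
        self.state: Dict[Address, int] = {}
        self.egress: List[str] = []

    def send(self, address: Address, message: str) -> None:
        fn_type, _fn_id = address
        # One invocation per incoming message, with the current state attached.
        result = self.handlers[fn_type](self.state.get(address), message)
        self.state[address] = result.new_state
        self.egress.extend(result.outgoing)


runtime = ToyRuntime({"example/greeter": greeter})
runtime.send(("example/greeter", "alice"), "Alice")
runtime.send(("example/greeter", "bob"), "Bob")

# Replacing the handler (think: restart or rolling upgrade) loses no state,
# because the state never lived in the function process.
runtime.handlers["example/greeter"] = greeter
runtime.send(("example/greeter", "alice"), "Alice")

for line in runtime.egress:
    print(line)
```

    The extra hop per message is where the performance cost comes from; the stateless handler is what makes fast restarts, scaling, and rolling upgrades cheap.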

    Can we develop the application completely in Python?

    Yes, it is possible to develop an application completely in Python; see the Python greeter example.

    What are the features that one supports and the other does not?

    The following features are currently supported only in the Java SDK:

    • Richer routing logic from an ingress to a function: any routing logic that you can describe in code.
    • A few more state types, such as a table and a buffer.
    • Exposing existing Flink sources and sinks as ingresses and egresses.
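    To make "routing logic that you can describe in code" more concrete, here is a small sketch of the idea: a router looks at each ingress record and decides which function instances should receive it. The sketch is plain Python kept in the same language as the other example; it only models the concept and is not the Java SDK's router interface. The function types, field names, and the threshold are all hypothetical.

```python
from typing import Iterator, Tuple

# (function type, function id, payload); this shape is an assumption
Target = Tuple[str, str, dict]


def route(record: dict) -> Iterator[Target]:
    """Map one ingress record to zero or more target function addresses."""
    user = record.get("user")
    if not user:
        return  # records without a key are dropped

    # Every keyed record goes to the per-user function instance.
    yield ("example/greeter", user, record)

    # Conditional fan-out: large amounts also go to a per-region audit function.
    if record.get("amount", 0) > 1000:
        yield ("example/audit", record.get("region", "unknown"), record)


targets = list(route({"user": "alice", "amount": 2500, "region": "eu"}))
for fn_type, fn_id, _payload in targets:
    print(fn_type, fn_id)
```

    Conditions, fan-out, and dropped records of this kind are easy to write as code but hard to express in a purely declarative mapping from an ingress to a single function type.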