hadoop-yarn, livy, airflow, apache-spark-2.3

Airflow: Use LivyBatchOperator for submitting pyspark applications in yarn


I have come across something called LivyBatchOperator, but I am unable to find a good example of using it to submit pyspark applications in Airflow. Any info on this would really be appreciated. Thanks in advance.


Solution

  • I came across this blog post, which walks you through the available options for Airflow + Spark.

    Here is an example of LivyBatchOperator, and here is a guide on how to install airflow-livy-operators.

    I would recommend the following options:

    1. AWS EMR: use EmrAddStepsOperator.
    2. Regular Spark cluster: use the mechanism above to set up the Livy operators in Airflow. This keeps the configuration on the Airflow servers lightweight, since Livy sits in front of the Spark cluster and handles job submission over HTTP.
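    As a minimal sketch of option 2, a DAG that submits a PySpark file to YARN through Livy might look like the following. This assumes the LivyBatchOperator from the airflow-livy-operators package; the import path and parameter names (`file`, `arguments`, `verify_in`, and the application path used here) follow that package's README at the time of writing and may differ between versions, so verify against the version you install.

    ```python
    # pip install airflow-livy-operators
    # Sketch only -- check the airflow-livy-operators README for the exact
    # import path and operator parameters in your installed version.
    from datetime import datetime

    from airflow import DAG
    from airflow_livy.batch import LivyBatchOperator  # assumed import path

    with DAG(
        dag_id="pyspark_via_livy",
        start_date=datetime(2020, 1, 1),
        schedule_interval=None,  # trigger manually
    ) as dag:
        submit_job = LivyBatchOperator(
            task_id="submit_pyspark_app",
            # Path must be reachable by the Spark cluster, e.g. on HDFS
            # (hypothetical application path for illustration)
            file="hdfs:///apps/my_app/main.py",
            arguments=["--date", "{{ ds }}"],
            name="my_pyspark_app",
            # Ask the operator to confirm the final status in YARN
            # (assumed option; see the package docs)
            verify_in="yarn",
        )
    ```

    Under the hood the operator POSTs the batch to Livy's /batches REST endpoint and polls until the job reaches a terminal state; the Livy (and YARN) endpoints are resolved from Airflow connections as described in the package documentation.
    
    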

    Let me know if this helps!