apache-spark, airflow-scheduler, spark-submit

How to run spark-submit from Apache Airflow


Can anyone help me schedule a Spark job in Apache Airflow?

I am looking for a script that does this. Please help.


Solution

  • Amogh, you need to perform the following steps:

    1. Download and install Apache Spark on your Airflow servers from here.
    2. Configure the freshly installed Spark the same way as the installations on your cluster.
    3. Add Spark's bin directory to your PATH environment variable so that spark-submit can be called without its full path.
    4. Create a DAG within Airflow with a BashOperator that runs either the spark-submit command or a custom shell script that wraps it. See here.
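
Step 4 could look roughly like the sketch below. It is only an illustration, not a definitive setup: it assumes Airflow 2.4 or newer (for the `schedule` argument and the `airflow.operators.bash` import path) and that spark-submit is already on the PATH as in step 3. The job path `/opt/jobs/my_job.py`, the `yarn` master, the DAG id, the task id and the helper `build_spark_submit_command` are all made-up placeholders, so replace them with your own values.

```python
import shlex
from datetime import datetime


def build_spark_submit_command(application, master="yarn",
                               deploy_mode="cluster", app_args=()):
    """Return a shell-quoted spark-submit command line (hypothetical helper)."""
    parts = [
        "spark-submit",
        "--master", master,
        "--deploy-mode", deploy_mode,
        application,
        *app_args,
    ]
    # shlex.join quotes every argument so spaces or brackets survive the shell.
    return shlex.join(parts)


# Placeholder application path; point this at your own Spark job.
SPARK_SUBMIT = build_spark_submit_command("/opt/jobs/my_job.py")

try:
    from airflow import DAG
    from airflow.operators.bash import BashOperator
except ImportError:
    # Airflow is not installed here; the helper above can still be tried out.
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="spark_submit_example",    # assumed name
        start_date=datetime(2024, 1, 1),  # assumed start date
        schedule="@daily",                # run once per day
        catchup=False,                    # do not backfill past runs
    ) as dag:
        # The BashOperator runs the command in a shell on the Airflow worker.
        submit_job = BashOperator(
            task_id="spark_submit_job",
            bash_command=SPARK_SUBMIT,
        )
```

Save the file in your Airflow DAGs folder and the scheduler should pick it up. If you call a custom shell script instead, keep in mind that `bash_command` is rendered as a Jinja template, so a value ending in `.sh` needs a trailing space or Airflow will try to load it as a template file.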