Tags: apache-spark, airflow

Airflow and Spark


I have an Airflow cluster and a Spark cluster. When I submit my job:

    from datetime import timedelta
    # Import path for Airflow 2.x with the apache-spark provider; on
    # Airflow 1.x the operator lives in
    # airflow.contrib.operators.spark_submit_operator instead.
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    flight_search_ingestion = SparkSubmitOperator(
        task_id='flight_search_ingestion',
        conn_id='spark_test_id',
        total_executor_cores=4,
        application='./plugins/plug_code.py',
        executor_cores=2,
        executor_memory='5g',
        driver_memory='5g',
        name='flight_search_ingestion',
        execution_timeout=timedelta(minutes=10),
        dag=dag
    )

it fails with the error `JAVA_HOME is not set`.

I don't have Java on the Airflow cluster. Should I have Java installed on the Airflow cluster as well?


Solution

  • Yes, you need Java on the Airflow machines too: `SparkSubmitOperator` runs the `spark-submit` script on the Airflow worker, and `spark-submit` is a JVM program that needs `JAVA_HOME` set. First check the instance OS; on a Linux-based system you can do that with `cat /etc/os-release`. Let's say it is Ubuntu. You can then check the available Java versions:

    sudo update-alternatives --config java
    

    The output should look like this:

      0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
      1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
    * 2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1081      manual mode
    

    You can set the default Java version like this:

    sudo update-java-alternatives -s java-1.8.0-openjdk-amd64
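
    Once a default is selected, you can derive a matching `JAVA_HOME` value from the active binary. A minimal sketch (the resolved path will differ per machine):

    ```shell
    # Find the active java binary on PATH, if any
    JAVA_BIN=$(command -v java || true)
    if [ -n "$JAVA_BIN" ]; then
        # Resolve the update-alternatives symlinks to the real path,
        # then strip the trailing /bin/java to get a JAVA_HOME value
        JAVA_PATH=$(readlink -f "$JAVA_BIN")
        export JAVA_HOME="${JAVA_PATH%/bin/java}"
        echo "JAVA_HOME=$JAVA_HOME"
    else
        echo "java not found on PATH"
    fi
    ```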
    

    If Java is not installed, you can install it via (for example):

    sudo apt install openjdk-11-jre-headless
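
    After installing Java, the Airflow worker process must be able to see `JAVA_HOME`. One common approach is to set it system-wide and restart the worker; this is a sketch, where the Java path and the `airflow-worker` service name are assumptions to adjust for your setup:

    ```shell
    # Make JAVA_HOME visible to all processes, including the Airflow
    # worker. Use the path reported by update-alternatives on your machine.
    echo 'JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"' | sudo tee -a /etc/environment

    # Restart the worker so it picks up the new environment.
    # "airflow-worker" is a hypothetical systemd unit name.
    sudo systemctl restart airflow-worker
    ```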