I have an Airflow cluster and a Spark cluster. When I submit my job with:
SparkSubmitOperator(
    task_id='flight_search_ingestion',
    conn_id='spark_test_id',
    total_executor_cores=4,
    application='./plugins/plug_code.py',
    executor_cores=2,
    executor_memory='5g',
    driver_memory='5g',
    name='flight_search_ingestion',
    execution_timeout=timedelta(minutes=10),
    dag=dag
)
it fails with the error: JAVA_HOME is not set.
The Airflow cluster doesn't have Java on it. Should I have Java installed on the Airflow cluster as well?
You need Java installed if you are using Spark: the operator launches spark-submit on the Airflow worker, and spark-submit is a JVM application. What is the instance OS?
You can check via:
cat /etc/os-release
if it is a Linux-based system. Let's say it is Ubuntu.
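If you only need the distribution ID from that file, a quick sketch (assuming a standard /etc/os-release that defines ID):

```shell
# /etc/os-release is a shell-sourceable key=value file; ID holds the
# distro name (on Ubuntu this prints "ubuntu").
. /etc/os-release
echo "$ID"
```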
You can check the available Java versions:
sudo update-alternatives --config java
The result should look like this:
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
1 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 manual mode
* 2 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
You can set the default Java version like this:
sudo update-java-alternatives -s java-1.8.0-openjdk-amd64
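After switching, you can verify the active binary and derive a JAVA_HOME candidate from it. A sketch, assuming the standard layout where the binary sits under a .../bin/java path (readlink resolves the /etc/alternatives symlink):

```shell
# If java is on PATH, resolve the alternatives symlink to the real binary
# and strip the trailing /bin/java to get a JAVA_HOME candidate.
if command -v java >/dev/null 2>&1; then
    JAVA_HOME_CANDIDATE="$(readlink -f "$(command -v java)" | sed 's:/bin/java$::')"
    echo "JAVA_HOME candidate: $JAVA_HOME_CANDIDATE"
else
    echo "java not found on PATH"
fi
```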
If Java is not installed, you can install it with, for example:
sudo apt install openjdk-11-jre-headless
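Installing Java alone may not clear the original error: the Airflow worker process that launches spark-submit must also see JAVA_HOME in its environment. A sketch for an Ubuntu box — the path below is an example, point it at the JVM directory update-alternatives actually reported — placed, for instance, in the airflow user's profile or the service's environment configuration:

```shell
# Example path — adjust to the JVM directory on your machine.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```

If Airflow runs under systemd, the same two variables can go into the unit's environment instead, so every worker-spawned task inherits them.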