I'm trying to submit a Spark job with these two packages:
com.amazonaws:aws-Java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1
My Spark version is 3.1.2, the Hadoop version is 2.7.4, and the Java version is 11.0.12. Airflow (2.2.2) is running on Kubernetes (k8s).
When I submit the job, I get the message below:
Here is the full debug output. Note that the package was found successfully!
[...]
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - :: loading settings :: url = jar:file:/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - Ivy Default Cache set to: /home/airflow/.ivy2/cache
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - The jars for the packages stored in: /home/airflow/.ivy2/jars
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - org.mongodb.spark#mongo-spark-connector_2.12 added as a dependency
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - com.amazonaws#aws-Java-sdk added as a dependency
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - org.apache.hadoop#hadoop-aws added as a dependency
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - :: resolving dependencies :: org.apache.spark#spark-submit-parent-427a6d1b-6847-4f25-8345-a134eb6d8e19;1.0
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - confs: [default]
[2021-12-20, 12:51:21 -03] {spark_submit.py:523} INFO - found org.mongodb.spark#mongo-spark-connector_2.12;3.0.0 in central
[2021-12-20, 12:51:24 -03] {spark_submit.py:523} INFO - found com.amazonaws#aws-java-sdk;1.7.4 in central <---------- FOUND!
[...]
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ---------------------------------------------------------------------
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - | | modules || artifacts |
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - | conf | number| search|dwnlded|evicted|| number|dwnlded|
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ---------------------------------------------------------------------
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - | default | 75 | 1 | 0 | 0 || 74 | 0 |
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ---------------------------------------------------------------------
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: problems summary ::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :::: WARNINGS
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - module not found: com.amazonaws#aws-Java-sdk;1.7.4
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== local-m2-cache: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - file:/home/airflow/.m2/repository/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.pom
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - file:/home/airflow/.m2/repository/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== local-ivy-cache: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - /home/airflow/.ivy2/local/com.amazonaws/aws-Java-sdk/1.7.4/ivys/ivy.xml
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - /home/airflow/.ivy2/local/com.amazonaws/aws-Java-sdk/1.7.4/jars/aws-Java-sdk.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== central: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repo1.maven.org/maven2/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.pom
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repo1.maven.org/maven2/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== spark-packages: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repos.spark-packages.org/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.pom
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repos.spark-packages.org/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::::::::::::::::::::::::::::::::::::::::::::::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: UNRESOLVED DEPENDENCIES ::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::::::::::::::::::::::::::::::::::::::::::::::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: com.amazonaws#aws-Java-sdk;1.7.4: not found
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::::::::::::::::::::::::::::::::::::::::::::::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO -
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.amazonaws#aws-Java-sdk;1.7.4: not found]
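To make the "tried" URLs in the log easier to read, here is a minimal sketch of how a group:artifact:version coordinate maps onto a Maven-layout repository path. The function name artifact_url and the constant MAVEN_CENTRAL are my own, for illustration only; this is not how Ivy or Spark implement it internally, just the path convention that the logged URLs follow.

```python
# Base URL taken from the "central: tried" lines in the log above.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def artifact_url(coordinate, extension="pom", base=MAVEN_CENTRAL):
    """Build the URL for a 'group:artifact:version' coordinate (hypothetical helper).

    Dots in the groupId become path separators; the artifactId and version
    are copied into the path exactly as written.
    """
    group_id, artifact_id, version = coordinate.split(":")
    group_path = group_id.replace(".", "/")
    return (
        f"{base}/{group_path}/{artifact_id}/{version}/"
        f"{artifact_id}-{version}.{extension}"
    )


if __name__ == "__main__":
    # Reproduces the .pom URL that the "central" resolver reports as tried.
    print(artifact_url("com.amazonaws:aws-Java-sdk:1.7.4"))
```

Because the artifactId is copied into the path as-is, every character of the coordinate ends up in the URL that the resolver requests.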
Some important notes:
Regarding the spark.jars.ivySettings and/or spark.jars.ivy properties: I'm not the "Java guy".
After some more tries, I realized that the uppercase "J" in the package string caused the error. When I changed the string to this one:
com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1
The build worked fine. I think I had bad luck when copying the reference for this dependency from the web, which came with the uppercase "J".
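A small pre-submit check could have caught this earlier. The sketch below is only an illustration: find_uppercase_coordinates is a hypothetical helper, not part of Spark or Airflow. It flags any coordinate whose groupId or artifactId contains an uppercase letter. Some real artifacts do legitimately use uppercase letters, so a hit should be treated as a hint to double-check the name, not as proof of an error.

```python
from typing import List


def find_uppercase_coordinates(packages: str) -> List[str]:
    """Return coordinates whose groupId or artifactId contains uppercase letters.

    `packages` is a comma-separated string of group:artifact:version
    coordinates, in the same form as the package string used in this post.
    """
    suspicious = []
    for coordinate in packages.split(","):
        coordinate = coordinate.strip()
        parts = coordinate.split(":")
        if len(parts) != 3:
            raise ValueError(
                f"expected group:artifact:version, got {coordinate!r}"
            )
        group_id, artifact_id, _version = parts
        # Compare against the lowercased form to detect any uppercase letter.
        if group_id != group_id.lower() or artifact_id != artifact_id.lower():
            suspicious.append(coordinate)
    return suspicious


if __name__ == "__main__":
    packages = "com.amazonaws:aws-Java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1"
    for coordinate in find_uppercase_coordinates(packages):
        print(f"check the spelling of: {coordinate}")
```

Run against the original string, this flags only the aws-Java-sdk coordinate; the corrected string produces no warnings.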