Tags: maven, pyspark, ivy, aws-java-sdk

Module not found: com.amazonaws#aws-Java-sdk


I'm trying to submit a Spark job with these two packages:

com.amazonaws:aws-Java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1

My Spark version is 3.1.2, Hadoop version is 2.7.4, and Java version is 11.0.12. Airflow (2.2.2) runs on Kubernetes (k8s).

When I submit the job, I get the message below: "unresolved dependencies"

Here is the relevant debug output. Note that the package was found successfully!

[...]
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - :: loading settings :: url = jar:file:/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - Ivy Default Cache set to: /home/airflow/.ivy2/cache
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - The jars for the packages stored in: /home/airflow/.ivy2/jars
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - org.mongodb.spark#mongo-spark-connector_2.12 added as a dependency
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - com.amazonaws#aws-Java-sdk added as a dependency
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - org.apache.hadoop#hadoop-aws added as a dependency
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - :: resolving dependencies :: org.apache.spark#spark-submit-parent-427a6d1b-6847-4f25-8345-a134eb6d8e19;1.0
[2021-12-20, 12:51:20 -03] {spark_submit.py:523} INFO - confs: [default]
[2021-12-20, 12:51:21 -03] {spark_submit.py:523} INFO - found org.mongodb.spark#mongo-spark-connector_2.12;3.0.0 in central
[2021-12-20, 12:51:24 -03] {spark_submit.py:523} INFO - found com.amazonaws#aws-java-sdk;1.7.4 in central <---------- FOUND!
[...]
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ---------------------------------------------------------------------
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - |                  |            modules            ||   artifacts   |
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ---------------------------------------------------------------------
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - |      default     |   75  |   1   |   0   |   0   ||   74  |   0   |
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ---------------------------------------------------------------------
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: problems summary ::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :::: WARNINGS
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - module not found: com.amazonaws#aws-Java-sdk;1.7.4
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== local-m2-cache: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - file:/home/airflow/.m2/repository/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.pom
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - file:/home/airflow/.m2/repository/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== local-ivy-cache: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - /home/airflow/.ivy2/local/com.amazonaws/aws-Java-sdk/1.7.4/ivys/ivy.xml
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - /home/airflow/.ivy2/local/com.amazonaws/aws-Java-sdk/1.7.4/jars/aws-Java-sdk.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== central: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repo1.maven.org/maven2/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.pom
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repo1.maven.org/maven2/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ==== spark-packages: tried
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repos.spark-packages.org/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.pom
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - -- artifact com.amazonaws#aws-Java-sdk;1.7.4!aws-Java-sdk.jar:
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - https://repos.spark-packages.org/com/amazonaws/aws-Java-sdk/1.7.4/aws-Java-sdk-1.7.4.jar
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::::::::::::::::::::::::::::::::::::::::::::::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::          UNRESOLVED DEPENDENCIES         ::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::::::::::::::::::::::::::::::::::::::::::::::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: com.amazonaws#aws-Java-sdk;1.7.4: not found
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - ::::::::::::::::::::::::::::::::::::::::::::::
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - 
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[2021-12-20, 12:51:26 -03] {spark_submit.py:523} INFO - Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.amazonaws#aws-Java-sdk;1.7.4: not found]
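Every URL under the "tried" sections of the log follows the standard Maven repository layout: the groupId's dots become directories, and the artifactId and version are copied verbatim into the path. The minimal Python sketch below reproduces those URLs from the --packages string; parse_packages and central_url are hypothetical helpers for illustration, not Spark or Ivy APIs.

```python
from typing import List, Tuple

# Maven Central base URL, as it appears in the "central: tried" lines above.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def parse_packages(packages: str) -> List[Tuple[str, str, str]]:
    """Split a --packages string into (groupId, artifactId, version) tuples."""
    coords = []
    for item in packages.split(","):
        group, artifact, version = item.strip().split(":")
        coords.append((group, artifact, version))
    return coords


def central_url(group: str, artifact: str, version: str, ext: str = "jar") -> str:
    """Build a Maven Central URL; every path segment keeps its original case."""
    group_path = group.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{version}/{artifact}-{version}.{ext}"


if __name__ == "__main__":
    pkgs = "com.amazonaws:aws-Java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1"
    for g, a, v in parse_packages(pkgs):
        print(central_url(g, a, v, "pom"))
```

The first line it prints is the same aws-Java-sdk .pom URL listed under "central: tried" in the log, which shows that the uppercase letter ends up unchanged in the path that is requested from the repository.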

Some important notes:

  • If I run it on my machine (same configuration), it works;
  • If I browse the filesystem looking for the jar, I can find it in "~/.ivy2/cache/com.amazonaws/aws-java-sdk/jars" (a different path from the "Ivy Default Cache");
  • I don't know how to set the spark.jars.ivySettings and/or spark.jars.ivy properties; I'm not the "Java guy";
  • If I click the link to the Maven repository, it downloads the jar as expected.
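For reference, spark.jars.ivy (the Ivy user directory that holds the cache and the downloaded package jars) and spark.jars.ivySettings (a path to a custom Ivy settings file) are ordinary Spark configuration keys, so they can be passed to spark-submit with --conf next to --packages. The sketch below only assembles such a command line; build_spark_submit, the /tmp/.ivy2 directory, and my_job.py are hypothetical placeholders.

```python
import shlex
from typing import List, Optional


def build_spark_submit(
    app: str,
    packages: List[str],
    ivy_dir: Optional[str] = None,
    ivy_settings: Optional[str] = None,
) -> List[str]:
    """Assemble a spark-submit argument list (hypothetical helper)."""
    cmd = ["spark-submit", "--packages", ",".join(packages)]
    if ivy_dir:
        # Ivy user directory: used for the local Ivy cache and the resolved package jars.
        cmd += ["--conf", f"spark.jars.ivy={ivy_dir}"]
    if ivy_settings:
        # Path to a custom ivysettings.xml, e.g. to add or mirror repositories.
        cmd += ["--conf", f"spark.jars.ivySettings={ivy_settings}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    cmd = build_spark_submit(
        "my_job.py",  # hypothetical application file
        ["com.amazonaws:aws-java-sdk:1.7.4", "org.apache.hadoop:hadoop-aws:2.7.1"],
        ivy_dir="/tmp/.ivy2",  # hypothetical writable directory
    )
    print(shlex.join(cmd))  # shell-quoted form, suitable for pasting into a terminal
```

In Airflow, SparkSubmitOperator exposes packages and conf arguments that should correspond to these --packages and --conf flags.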

Solution

  • After a few more attempts, I realized that the uppercase "J" in the package string caused the error. When I changed the string to this one:

    com.amazonaws:aws-java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1

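    The build went fine. Maven coordinates are case-sensitive: the artifactId is used verbatim in the repository path, so every lookup for aws-Java-sdk pointed to a location that does not exist. The "FOUND!" line in the log actually shows the lowercase com.amazonaws#aws-java-sdk;1.7.4, most likely the copy that hadoop-aws pulls in as its own dependency, not the uppercase coordinate I had typed. I must have been unlucky when copying the reference for this dependency from the web, which came with the uppercase "J".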
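Since a single wrong letter was enough to break resolution, a quick pre-flight check of the package string can catch this before spark-submit runs. A minimal sketch, where check_packages is a hypothetical helper and not a Spark API. It only warns rather than lowercasing automatically, because some published artifacts legitimately use uppercase letters, and it skips the version field since qualifiers like -SNAPSHOT are uppercase by convention.

```python
import re
from typing import List

# Exactly three non-empty, colon-separated fields: groupId:artifactId:version.
COORD = re.compile(r"^[^:\s]+:[^:\s]+:[^:\s]+$")


def check_packages(packages: str) -> List[str]:
    """Return human-readable warnings for suspicious --packages entries."""
    warnings = []
    for item in packages.split(","):
        item = item.strip()
        if not COORD.match(item):
            warnings.append(f"{item!r}: expected groupId:artifactId:version")
            continue
        group, artifact, _version = item.split(":")
        # Only groupId and artifactId are checked; versions often carry uppercase qualifiers.
        if any(c.isupper() for c in group + artifact):
            warnings.append(f"{item!r}: contains uppercase letters; Maven paths are case-sensitive")
    return warnings


if __name__ == "__main__":
    for warning in check_packages("com.amazonaws:aws-Java-sdk:1.7.4,org.apache.hadoop:hadoop-aws:2.7.1"):
        print(warning)
```

Run against the original string, it flags only the aws-Java-sdk entry; the corrected string produces no warnings.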