I'm trying to get Maven to download Delta Lake to a directory so I can mount it in Spark manually. The goal is for the Spark application not to make a web request to Maven at startup, for security reasons; ideally it shouldn't talk to the internet at all.
Can anybody help? My assumption is that I can't just download the Delta JAR from Maven, because I'd be missing all of its transitive dependencies.
I'm running:
```shell
mvn org.apache.maven.plugins:maven-dependency-plugin:3.2.0:get \
  -DremoteRepositories=https://repo.maven.apache.org/maven2/ \
  -Dartifact=io.delta:delta-spark_2.13:3.1.0 \
  -Ddest=~/_mvn_local
```
This reports BUILD SUCCESS. The plan is to then mount all of the JARs in that directory into Spark via an environment variable, but nothing is written to the target directory. I'd like to avoid defining a POM if it isn't needed. The exact output is:
```
data-lake % mvn org.apache.maven.plugins:maven-dependency-plugin:3.2.0:get \
-DremoteRepositories=https://repo.maven.apache.org/maven2/ \
-Dartifact=io.delta:delta-spark_2.13:3.1.0 \
-Ddest=~/_build
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------< org.apache.maven:standalone-pom >-------------------
[INFO] Building Maven Stub Project (No POM) 1
[INFO] --------------------------------[ pom ]---------------------------------
[INFO]
[INFO] --- dependency:3.2.0:get (default-cli) @ standalone-pom ---
[INFO] Resolving io.delta:delta-spark_2.13:jar:3.1.0 with transitive dependencies
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 0.548 s
[INFO] Finished at: 2024-03-01T14:37:59Z
[INFO] ------------------------------------------------------------------------
```
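The log line `Resolving io.delta:delta-spark_2.13:jar:3.1.0 with transitive dependencies` suggests the resolution itself worked: `dependency:get` resolves the artifact and its transitive dependencies into the local repository (by default `~/.m2/repository`), not into a destination folder. As far as I can tell, the old `dest` parameter was deprecated in the 2.x line of maven-dependency-plugin and is no longer supported in 3.x, and Maven does not warn about unused `-D` properties, so `-Ddest` is silently ignored. A minimal sketch for finding where a given coordinate lands, assuming the default repository layout; the `REPO` variable and the `local_jar_path` function are hypothetical names:

```shell
#!/bin/bash
# Map groupId:artifactId:version to the JAR path that dependency:get leaves in
# the local repository. REPO defaults to ~/.m2/repository (assumption: a
# <localRepository> override in settings.xml would change this).
REPO="${REPO:-$HOME/.m2/repository}"

local_jar_path() {
  local group artifact version
  # Split the coordinate on ':' into its three parts.
  IFS=':' read -r group artifact version <<< "$1"
  # groupId dots become directory separators: io.delta -> io/delta
  printf '%s/%s/%s/%s/%s-%s.jar\n' \
    "$REPO" "${group//.//}" "$artifact" "$version" "$artifact" "$version"
}
```

For example, `ls -l "$(local_jar_path io.delta:delta-spark_2.13:3.1.0)"` should show the Delta JAR if the `get` call succeeded. The transitive dependencies sit in their own directories elsewhere in that tree, which is why collecting them into one folder needs a different goal.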
Here's a script that does what I want:
```shell
#!/bin/bash
mvn -f <your pom> dependency:copy-dependencies -DoutputDirectory=../spark/_build/spark_jars -Dscope=provided
```
To download all the dependencies, you have to define a POM file, point Maven at it with the `-f` flag, and have it copy all the transitive dependencies with the `dependency:copy-dependencies` goal and an `-DoutputDirectory`. `-Dscope=provided` also seems to be required to get everything.
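The POM does not have to be kept around by hand. A minimal sketch, under the assumption that a generated throwaway POM is acceptable: write a POM declaring the one dependency into a temp directory, run `dependency:copy-dependencies` against it, then join the copied JARs into the comma-separated form that `spark-submit --jars` accepts. The POM's own coordinates (`local.tmp:jar-fetcher:1.0`) and the function names `write_pom`, `fetch_jars` and `jar_list` are hypothetical, and the dependency is declared with Maven's default scope rather than the `provided` setting used above.

```shell
#!/bin/bash
# Write a throwaway POM whose only job is to declare one dependency.
# Args: $1 = directory for pom.xml, $2 = groupId, $3 = artifactId, $4 = version.
# The local.tmp:jar-fetcher:1.0 coordinates are hypothetical placeholders.
write_pom() {
  cat > "$1/pom.xml" <<EOF
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>local.tmp</groupId>
  <artifactId>jar-fetcher</artifactId>
  <version>1.0</version>
  <packaging>pom</packaging>
  <dependencies>
    <dependency>
      <groupId>$2</groupId>
      <artifactId>$3</artifactId>
      <version>$4</version>
    </dependency>
  </dependencies>
</project>
EOF
}

# Copy an artifact plus its transitive dependencies into a directory.
# Args: $1 = groupId:artifactId:version, $2 = output directory.
fetch_jars() {
  local group artifact version tmp out status
  IFS=':' read -r group artifact version <<< "$1"
  mkdir -p "$2"
  # Absolute path, so the output location does not depend on the temp POM's directory.
  out="$(cd "$2" && pwd)"
  tmp="$(mktemp -d)"
  write_pom "$tmp" "$group" "$artifact" "$version"
  mvn -q -f "$tmp/pom.xml" dependency:copy-dependencies -DoutputDirectory="$out"
  status=$?
  rm -rf "$tmp"   # clean up the temp POM even if mvn failed
  return "$status"
}

# Print every JAR in $1 as a comma-separated list (the format --jars expects).
jar_list() {
  local jars=("$1"/*.jar)
  local IFS=','
  echo "${jars[*]}"
}
```

Usage would then be `fetch_jars io.delta:delta-spark_2.13:3.1.0 ../spark/_build/spark_jars` once on a machine with network access, followed by `spark-submit --jars "$(jar_list ../spark/_build/spark_jars)" my_app.py` at runtime, where `my_app.py` is a placeholder; storing the list in an environment variable first works the same way. Passing local paths to `--jars`, rather than using `--packages`, means Spark itself does not need to resolve anything from a remote repository when the job starts.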