I am running into the deduplicate error in sbt-assembly while trying to package my multi-project Spark job. I looked in the sbt-assembly documentation, and it says:
If you're trying to exclude JAR files that are already part of the container (like Spark), consider scoping the dependent library to "provided" configuration:
But what do they mean by "already part of the container"? I have copied the full link below.
https://github.com/sbt/sbt-assembly#excluding-jars-and-files
It means that the target environment where the job runs (the "container", here a Spark cluster) already has those jars on its classpath, so there is no need to include the same jars in the assembly.
That is what the provided scope means: the library is supplied by the runtime environment, so it is available on the classpath when you compile but is left out of the final jar.
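As a rough illustration, marking the Spark dependencies as provided in `build.sbt` could look like the sketch below. The version number and the choice of modules are assumptions for the example, not taken from the question; use whatever matches your cluster.

```scala
// build.sbt (sketch; version and module list are placeholders)
val sparkVersion = "3.5.0" // assumption: match the Spark version installed on your cluster

libraryDependencies ++= Seq(
  // Spark is supplied by the cluster at runtime, so keep it out of the assembly jar
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)
```

In a multi-project build, the same setting would go into the settings of each subproject that depends on Spark. Provided dependencies that are no longer bundled cannot cause deduplicate conflicts in the assembly, though conflicts between your remaining dependencies may still need a merge strategy.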