docker, apache-spark, copy-paste, avro, spark-avro

Copying avro jars into docker jars directory


I'm learning Spark and I'd like to use an Avro data file. Since Avro support is external to Spark, I've downloaded the jar. My problem is how to copy it into the right place, the jars directory, inside my container. I've read a related post, but I don't understand it.

I've also seen the command below on the main Spark website, but I think I need the jar file copied before running it.

./bin/spark-shell --packages org.apache.spark:spark-avro_2.XX:X.X.X ...

What I tried is

docker cp /Users/username/Downloads/spark-avro_2.11-2.4.5.jar docker-spark_master_1:/jars

but it isn't working. Thanks in advance.

NB: I'm running a Spark 2.4 container setup with a master and a worker.


Solution

  • Quoting the docker cp documentation:

    docker cp SRC_PATH CONTAINER:DEST_PATH

    If SRC_PATH specifies a file and DEST_PATH does not exist then the file is saved to a file created at DEST_PATH

    In the command you tried, the destination path /jars does not exist in the container; the actual Spark jars directory is /usr/spark-2.4.1/jars/. As a result, the jar was copied into the container as a file named jars under the root (/) directory.

    Try this command instead to add the jar to the Spark jars directory:

    docker cp /Users/username/Downloads/spark-avro_2.11-2.4.5.jar docker-spark_master_1:/usr/spark-2.4.1/jars/
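The difference between the two destinations can be reproduced without Docker. The sketch below uses plain cp as a stand-in, since the Docker documentation describes docker cp as behaving like Unix cp -a. The temporary directory named container and the empty placeholder jar are assumptions made for illustration, not part of the original setup.

```shell
#!/bin/sh
# Sketch only: plain cp stands in for docker cp, and a temp directory
# stands in for the container filesystem (both are assumptions).
set -eu

work=$(mktemp -d)
jar="$work/spark-avro_2.11-2.4.5.jar"
: > "$jar"    # empty placeholder standing in for the downloaded jar

# Fake container root in which only the real Spark jars directory exists.
mkdir -p "$work/container/usr/spark-2.4.1/jars"

# Case 1: the destination "jars" does not exist under the root,
# so cp creates a regular FILE named "jars".
cp "$jar" "$work/container/jars"

# Case 2: the destination is an existing directory,
# so the jar is placed inside it under its own file name.
cp "$jar" "$work/container/usr/spark-2.4.1/jars/"

# Show both results for comparison.
ls -l "$work/container" "$work/container/usr/spark-2.4.1/jars"
```

In the real container, listing the target directory (for example with docker exec docker-spark_master_1 ls /usr/spark-2.4.1/jars) should confirm whether the jar ended up in the right place. The stray /jars file left by the first attempt can then be removed.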