amazon-web-services apache-spark pyspark aws-glue aws-glue-spark

AWS Glue Spark job: how to use spark-avro


How do I pass the --packages parameter to an AWS Glue Spark job?

I am using Glue version 1.0, which supports Spark 2.4.3, and I want to use spark-avro to read some Avro files.


Solution

  • Glue jobs do not accept the --packages option. Instead, download the dependent jars from the Maven repository.

    Then upload those jars to S3 and pass them to your job as additional jars (the --extra-jars job parameter). The classes in those jars are then available inside the job.

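    For Spark 2.4.3, pass the jar below. The _2.12 in its name is the Scala version it was built for, which has to match the Scala version of the Glue runtime; if the Avro classes fail to load, try the spark-avro_2.11 build of the same release.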

    https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.12/2.4.3/spark-avro_2.12-2.4.3.jar
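
    The steps above could be sketched in Python roughly as follows. This is a minimal sketch, not the only way to do it: the helper names, the bucket name my-glue-assets, the jars/ prefix and the job name my-avro-job are all hypothetical. It assumes the jar has already been uploaded to S3 and that the Glue job already exists. --extra-jars is the Glue job parameter that takes a comma-separated list of S3 paths, and start_job_run is the boto3 Glue client call that starts a run.

```python
from typing import Dict, List


def build_extra_jars_argument(bucket: str, prefix: str, jar_names: List[str]) -> Dict[str, str]:
    """Build the Glue job argument that lists dependent jars stored in S3."""
    clean_prefix = prefix.strip("/")
    paths = []
    for name in jar_names:
        key = f"{clean_prefix}/{name}" if clean_prefix else name
        paths.append(f"s3://{bucket}/{key}")
    # Glue expects a comma-separated list of S3 paths with no spaces.
    return {"--extra-jars": ",".join(paths)}


def start_job_with_jars(job_name: str, arguments: Dict[str, str]) -> str:
    """Start an existing Glue job and pass the jars as run arguments."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]


if __name__ == "__main__":
    # Hypothetical bucket and prefix; replace with your own upload location.
    args = build_extra_jars_argument(
        "my-glue-assets", "jars", ["spark-avro_2.12-2.4.3.jar"]
    )
    print(args)
    # Uncomment to start a run; needs AWS credentials and an existing job.
    # print(start_job_with_jars("my-avro-job", args))
```

    Inside the job script, the Avro files can then be read with spark.read.format("avro").load(path), where path is the S3 location of the files.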