Tags: python, amazon-web-services, apache-spark, aws-glue, windows-subsystem-for-linux

AWS Glue locally - No module named 'awsglue'


I installed all the prerequisites according to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html#develop-local-python, but I still get the No module named 'awsglue' error.

  • AWS Glue version 3.0
  • Apache Maven from the following location: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-common/apache-maven-3.6.0-bin.tar.gz
  • Spark distribution for AWS Glue version 3.0: https://aws-glue-etl-artifacts.s3.amazonaws.com/glue-3.0/spark-3.1.1-amzn-0-bin-3.2.1-amzn-3.tgz
  • SPARK_HOME is set up
  • ran glue-setup.sh from \\wsl$\Ubuntu-20.04\home\my_user\aws_ds\glue_libs\aws-glue-libs\bin
  • spark-shell and pyspark both start and work fine
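A quick sanity check of the setup above can help narrow down where the import fails. The Python sketch below is a minimal, hypothetical helper (diagnose_glue_env is not part of AWS Glue). It assumes aws-glue-libs was cloned to ~/aws_ds/glue_libs/aws-glue-libs, matching the WSL path in the question, and that the checkout contains the awsglue Python package as a top-level awsglue/ folder. It only checks SPARK_HOME and Python's import lookup; a working import awsglue still needs pyspark and the Glue jars at runtime.

```python
import importlib.util
import os
from pathlib import Path


def diagnose_glue_env(glue_libs_dir, environ=None):
    """Hypothetical helper: return a list of likely problems in a local Glue setup."""
    environ = os.environ if environ is None else environ
    problems = []

    # The Glue helper scripts and pyspark rely on SPARK_HOME pointing at the Spark install.
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not Path(spark_home, "jars").is_dir():
        problems.append(f"SPARK_HOME={spark_home} has no jars/ directory")

    # Assumption: the aws-glue-libs checkout ships the awsglue package as a top-level folder.
    glue_dir = Path(glue_libs_dir).expanduser()
    if not (glue_dir / "awsglue").is_dir():
        problems.append(f"{glue_dir} does not contain an awsglue/ package")

    # 'import awsglue' only succeeds if some entry on sys.path contains that package.
    if importlib.util.find_spec("awsglue") is None:
        problems.append(
            f"awsglue is not importable from this interpreter; try adding {glue_dir} to PYTHONPATH"
        )
    return problems


if __name__ == "__main__":
    # Path taken from the question; adjust to wherever aws-glue-libs was cloned.
    for problem in diagnose_glue_env("~/aws_ds/glue_libs/aws-glue-libs"):
        print("-", problem)
```

If the last message appears while the awsglue/ folder exists, the interpreter simply cannot see that folder, which is consistent with the first step of the working solution below.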

Please help me debug this; I don't know where else to start.


Solution

  • Working solution:

    1. Make sure your Glue script is run from the aws-glue-libs folder
    2. Sync the jar files between jarsv1 in aws-glue-libs and jars in your_spark_folder (the guava jar may end up in two versions; keep only the latest one)

    Installation steps to consider

    1. Get Spark on WSL2: https://phoenixnap.com/kb/install-spark-on-ubuntu
    2. Remember to run glue-setup.sh from aws-glue-libs\bin as the last step of setting up Glue locally
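The jar sync from step 2 of the working solution could be sketched in Python as below. This is a hypothetical helper, not an AWS tool. The answer does not say which direction to copy, so the sketch assumes "sync" means copying every jar that is missing on either side into the other folder, then deleting all but the highest-versioned guava jar in each folder. Versions are compared by a simple numeric parse of the file name, which is also an assumption; back up both folders before trying it.

```python
import re
import shutil
from pathlib import Path


def _version_key(version_part):
    # "14.0.1.jar" -> (14, 0, 1); "27.0-jre.jar" -> (27, 0). Non-digit text is ignored.
    return tuple(int(n) for n in re.findall(r"\d+", version_part))


def sync_jars(glue_jars_dir, spark_jars_dir, dedupe_prefix="guava-"):
    """Hypothetical helper: mirror jars between two folders, then keep only the newest guava jar."""
    a, b = Path(glue_jars_dir), Path(spark_jars_dir)
    copied, removed = [], []

    # Copy every jar that exists on one side but not the other.
    for src, dst in ((a, b), (b, a)):
        for jar in sorted(src.glob("*.jar")):
            target = dst / jar.name
            if not target.exists():
                shutil.copy2(jar, target)
                copied.append(target)

    # In each folder, delete all but the highest-versioned jar matching the prefix.
    for folder in (a, b):
        candidates = sorted(
            folder.glob(f"{dedupe_prefix}*.jar"),
            key=lambda p: _version_key(p.name[len(dedupe_prefix):]),
        )
        for old in candidates[:-1]:
            old.unlink()
            removed.append(old)
    return copied, removed


# Example call for the layout in the question (paths are assumptions; needs "import os"):
# sync_jars(Path("~/aws_ds/glue_libs/aws-glue-libs/jarsv1").expanduser(),
#           Path(os.environ["SPARK_HOME"], "jars"))
```

Copying in both directions follows the answer's wording literally; if only one side turns out to need the extra jars, dropping the other pass is the smaller change.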