AWSGLUE python package - ls cannot access dir


I'm trying to install the awsglue package locally for development purposes on my Windows machine (using Git Bash).

https://github.com/awslabs/aws-glue-libs/tree/glue-1.0

https://support.wharton.upenn.edu/help/glue-debugging

The Spark directory and the py4j archive mentioned in the error below do exist, but I still get the error.

The directory from which I run the script, and the output, are shown below:

user@machine xxxx64~/Desktop/lm_aws_glue/aws-glue-libs-glue-1.0/bin
$ ./glue-setup.sh
ls: cannot access 'C:\Spark\spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip': No such file or directory
rm: cannot remove 'PyGlue.zip': No such file or directory
./glue-setup.sh: line 14: zip: command not found

ls result:

$ ls -l
total 7
-rwxr-xr-x 1 n1543781 1049089 135 May  5  2020 gluepyspark*
-rwxr-xr-x 1 n1543781 1049089 114 May  5  2020 gluepytest*
-rwxr-xr-x 1 n1543781 1049089 953 Mar  5 11:10 glue-setup.sh*
-rwxr-xr-x 1 n1543781 1049089 170 May  5  2020 gluesparksubmit*

Solution

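  • The original install script works after a few tweaks, shown below (the original lines are kept as comments). The zip step still needs a workaround, since zip is not available in this Git Bash install (see the "zip: command not found" error above).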

    #!/usr/bin/env bash
    
    #original code
    #ROOT_DIR="$(cd $(dirname "$0")/..; pwd)"
    #cd $ROOT_DIR
    
    #re-written
    ROOT_DIR="$(cd /c/aws-glue-libs && pwd)"
    cd "$ROOT_DIR"
    
    SPARK_CONF_DIR=$ROOT_DIR/conf
    GLUE_JARS_DIR=$ROOT_DIR/jarsv1
    
    #original code
    #PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
    #PYTHONPATH=`ls $SPARK_HOME/python/lib/py4j-*-src.zip`:"$PYTHONPATH"
    
    #re-written
    PYTHONPATH="/c/Spark/spark-3.1.1-bin-hadoop2.7/python/:$PYTHONPATH"
    PYTHONPATH="$(ls /c/Spark/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip):$PYTHONPATH"
    
    # Generate the zip archive for glue python modules
    rm -f PyGlue.zip  # -f: no error if the archive does not exist yet
    zip -r PyGlue.zip awsglue
    GLUE_PY_FILES="$ROOT_DIR/PyGlue.zip"
    export PYTHONPATH="$GLUE_PY_FILES:$PYTHONPATH"
    
    # Run mvn copy-dependencies target to get the Glue dependencies locally
    #mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jarsv1 dependency:copy-dependencies
    
    export SPARK_CONF_DIR="${ROOT_DIR}/conf"
    mkdir -p "$SPARK_CONF_DIR"  # -p: no error if the directory already exists
    rm -f "$SPARK_CONF_DIR/spark-defaults.conf"
    # Generate spark-defaults.conf
    echo "spark.driver.extraClassPath $GLUE_JARS_DIR/*" >> "$SPARK_CONF_DIR/spark-defaults.conf"
    echo "spark.executor.extraClassPath $GLUE_JARS_DIR/*" >> "$SPARK_CONF_DIR/spark-defaults.conf"
    
    # Restore present working directory
    cd -
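
Two rough edges remain in the script above: the Spark path is hard-coded, and the zip command is still missing. The following is only a sketch of how both might be handled, not something taken from aws-glue-libs. The helper names to_posix_path and make_archive and the PYTHON_BIN variable are made up for illustration. It assumes SPARK_HOME holds a Windows-style path (the ls error shows C:\Spark\...) and that a Python 3 interpreter is on PATH. Git Bash normally also includes cygpath, and cygpath -u should do the same path conversion.

```shell
#!/usr/bin/env bash

# Hypothetical helper: convert a Windows-style path such as C:\Spark\x
# into the /c/Spark/x form that Git Bash tools expect.
to_posix_path() {
    local path drive rest
    # Turn every backslash into a forward slash
    path=$(printf '%s' "$1" | tr '\\' '/')
    case "$path" in
        [A-Za-z]:*)
            # Lower-case the drive letter and drop the colon
            drive=$(printf '%s' "$path" | cut -c1 | tr 'A-Z' 'a-z')
            rest=${path#?:}
            path="/$drive$rest"
            ;;
    esac
    printf '%s\n' "$path"
}

# Hypothetical helper: build a zip archive with zip when it is installed,
# otherwise fall back to the command-line interface of Python's zipfile module.
make_archive() {
    local archive="$1" source_dir="$2"
    rm -f "$archive"
    if command -v zip >/dev/null 2>&1; then
        zip -qr "$archive" "$source_dir"
    else
        # Assumption: PYTHON_BIN (default: python) is a Python 3 interpreter
        "${PYTHON_BIN:-python}" -m zipfile -c "$archive" "$source_dir"
    fi
}
```

In the setup script this could be used as SPARK_HOME="$(to_posix_path "$SPARK_HOME")" before the PYTHONPATH lines, and make_archive PyGlue.zip awsglue in place of the rm and zip lines.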