I'm trying to install the awsglue package locally for development purposes on my machine (Windows + Git Bash), following these guides:
https://github.com/awslabs/aws-glue-libs/tree/glue-1.0
https://support.wharton.upenn.edu/help/glue-debugging
The Spark directory and the py4j archive mentioned in the error below both exist, but I still get the error.
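For example, the py4j archive resolves fine from Git Bash when I use the POSIX-style path:

$ ls /c/Spark/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip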
This is the directory from which I run the script:
user@machine xxxx64~/Desktop/lm_aws_glue/aws-glue-libs-glue-1.0/bin
$ ./glue-setup.sh
ls: cannot access 'C:\Spark\spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip': No such file or directory
rm: cannot remove 'PyGlue.zip': No such file or directory
./glue-setup.sh: line 14: zip: command not found
ls result:
$ ls -l
total 7
-rwxr-xr-x 1 n1543781 1049089 135 May 5 2020 gluepyspark*
-rwxr-xr-x 1 n1543781 1049089 114 May 5 2020 gluepytest*
-rwxr-xr-x 1 n1543781 1049089 953 Mar 5 11:10 glue-setup.sh*
-rwxr-xr-x 1 n1543781 1049089 170 May 5 2020 gluesparksubmit*
The original install script needs a few tweaks and then works OK. The first error comes from $SPARK_HOME being set to a Windows-style path (C:\Spark\...), which Git Bash cannot glob; rewriting the paths in /c/... form fixes it. I still need a workaround for the missing zip command (a possible alternative is sketched as a comment in the script below).
#!/usr/bin/env bash
#original code
#ROOT_DIR="$(cd $(dirname "$0")/..; pwd)"
#cd $ROOT_DIR
#re-written
ROOT_DIR="$(cd /c/aws-glue-libs; pwd)"
cd "$ROOT_DIR"
SPARK_CONF_DIR=$ROOT_DIR/conf
GLUE_JARS_DIR=$ROOT_DIR/jarsv1
#original code
#PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
#PYTHONPATH=`ls $SPARK_HOME/python/lib/py4j-*-src.zip`:"$PYTHONPATH"
#re-written
PYTHONPATH="/c/Spark/spark-3.1.1-bin-hadoop2.7/python/:$PYTHONPATH"
PYTHONPATH=`ls /c/Spark/spark-3.1.1-bin-hadoop2.7/python/lib/py4j-*-src.zip`:"$PYTHONPATH"
# Generate the zip archive for glue python modules
rm -f PyGlue.zip   # -f: don't fail when the archive doesn't exist yet
zip -r PyGlue.zip awsglue
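# NOTE: Git Bash ships without zip, hence "zip: command not found" above.
# A possible workaround (untested sketch, assumes python is on PATH) is the
# Python stdlib zipfile CLI, which also adds the awsglue directory recursively:
#   python -m zipfile -c PyGlue.zip awsglue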
GLUE_PY_FILES="$ROOT_DIR/PyGlue.zip"
export PYTHONPATH="$GLUE_PY_FILES:$PYTHONPATH"
# Run mvn copy-dependencies target to get the Glue dependencies locally
#mvn -f $ROOT_DIR/pom.xml -DoutputDirectory=$ROOT_DIR/jarsv1 dependency:copy-dependencies
export SPARK_CONF_DIR=${ROOT_DIR}/conf
mkdir -p $SPARK_CONF_DIR                      # -p: no error if the directory already exists
rm -f $SPARK_CONF_DIR/spark-defaults.conf     # -f: no error if the file is missing
# Generate spark-defaults.conf
echo "spark.driver.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf
echo "spark.executor.extraClassPath $GLUE_JARS_DIR/*" >> $SPARK_CONF_DIR/spark-defaults.conf
# Restore present working directory
cd -
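Once the script runs through, a minimal sanity check (assuming Spark and the Glue jars are in place) is to start the Glue-enabled PySpark shell from the same bin directory and build a GlueContext:

$ ./gluepyspark
>>> from awsglue.context import GlueContext
>>> from pyspark.context import SparkContext
>>> glueContext = GlueContext(SparkContext.getOrCreate())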