For various reasons, I want to use the Python package awswrangler inside a Python 3 Glue Job. I've considered two main ways of installing awswrangler:
1. Specify additional libraries for the Glue Job, by obtaining a .whl file and passing it to the job through --extra-py-files.
2. Install it inside the Python script with subprocess or os. For example, with os:
import os
os.system('python -m pip install --user awswrangler==0.0.b0')
Note that in the example above I've even gone back to the first pre-release version of awswrangler. The full list of versions can be found here. However, even with the first pre-release I'm unable to use awswrangler in a Glue script. Is there a way to achieve this?
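For reference, the subprocess variant of the in-script install mentioned in the question might look like the sketch below. It assumes pip is available to the job's interpreter, and nothing here confirms that it succeeds inside Glue. The helper names build_pip_command and install_package are hypothetical, chosen only for illustration.

```python
import subprocess
import sys


def build_pip_command(package, version):
    """Build a pip install command for the interpreter running this script.

    Hypothetical helper: using sys.executable avoids picking up a different
    'python' from PATH than the one executing the job.
    """
    return [sys.executable, "-m", "pip", "install", "--user",
            f"{package}=={version}"]


def install_package(package, version):
    """Run pip and raise CalledProcessError if the install fails."""
    subprocess.check_call(build_pip_command(package, version))


# Usage (not executed here): call this before importing the package.
# install_package("awswrangler", "1.9.3")
```

Unlike os.system, subprocess.check_call raises an exception on a non-zero exit code, so a failed install shows up immediately in the job logs instead of as a later ImportError.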
It turns out that the official awswrangler documentation provides a .whl file containing the desired version of the package, which you specify in the Python library path field of the Glue Job. According to the documentation, the steps to follow are:
1. Download the .whl file for the version of awswrangler that you want to install from here.
2. Upload the .whl file to an S3 bucket. Note that the role you assign to your Glue Job must have read access to this bucket.
3. In the Python library path field, specify the location of the wheel file. For example, for the current 1.9.3 version it is s3://your-bucket/glue_wheels/awswrangler-1.9.3-py3-none-any.whl
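The steps above can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials are configured. The bucket, prefix, and helper names are placeholders, not something the documentation prescribes. As far as I know, the Python library path field corresponds to the --extra-py-files job argument mentioned in the question.

```python
def wheel_s3_uri(bucket, prefix, version):
    """Build the S3 location of the awswrangler wheel (hypothetical helper)."""
    filename = f"awswrangler-{version}-py3-none-any.whl"
    return f"s3://{bucket}/{prefix.strip('/')}/{filename}"


def glue_default_arguments(wheel_uri):
    """Job arguments that point Glue at the wheel (assumed equivalent to
    filling in the Python library path field in the console)."""
    return {"--extra-py-files": wheel_uri}


def upload_wheel(local_path, bucket, key):
    """Upload the downloaded wheel to S3; boto3 is imported lazily so the
    helpers above stay usable without it."""
    import boto3

    boto3.client("s3").upload_file(local_path, bucket, key)


# Example values taken from the answer; the bucket name is a placeholder.
uri = wheel_s3_uri("your-bucket", "glue_wheels", "1.9.3")
print(uri)
print(glue_default_arguments(uri))
```

Once the job runs with the wheel attached, a plain import awswrangler at the top of the Glue script should be enough to confirm that the library was picked up.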