python pandas aws-glue

Using Pandas in AWS Glue Python Shell Jobs


The AWS documentation (https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html) mentions that:

The environment for running a Python shell job supports the following libraries:

...

pandas (required to be installed via the python setuptools configuration, setup.py)

But it does not explain how to perform the installation.

How can I use Pandas in an AWS Glue Python Shell job?


Solution

    1. Go to https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library and see the section "To create a Python .egg or .whl file", which explains how to create the setup file for a Python shell job.
    2. In the setup.py file, add the line install_requires=['pandas==0.25.1']:
    from setuptools import setup

    # install_requires declares pandas as a dependency to be installed along with this package
    setup(name="<module name>",
          version="0.1",
          packages=['<package name if any or ignore>'],
          install_requires=['pandas==0.25.1']
          )
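
Once the packaged library is attached to the job, it can help to confirm from inside the job script that pandas was actually installed. The following is a minimal sketch using only the standard library; the helper name `describe_dependency` is hypothetical and not part of Glue or pandas.

```python
import importlib
import importlib.util


def describe_dependency(module_name: str) -> str:
    """Return a one-line status for a top-level module, suitable for the job log.

    Hypothetical helper for illustration; it only checks what is importable
    in the current Python environment.
    """
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name}: NOT installed"
    module = importlib.import_module(module_name)
    # Not every module exposes __version__, so fall back to "unknown"
    version = getattr(module, "__version__", "unknown")
    return f"{module_name}: installed (version {version})"


if __name__ == "__main__":
    # In a Glue Python shell job this line would end up in the job's output log
    print(describe_dependency("pandas"))
```

If the log reports that pandas is not installed, the likely causes are that the .egg or .whl was not attached to the job or that setup.py did not list pandas in install_requires.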
    

    I also wrote a small shell script that deploys a Python shell job without the manual steps: it creates the .egg file, uploads it to S3, and deploys the job via CloudFormation automatically. You can find the code at https://github.com/fatangare/aws-python-shell-deploy
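
For readers who prefer to wire the job up by hand, the sketch below shows roughly what a Python shell job definition that references the uploaded library might look like. It is not what the linked script does (that one uses CloudFormation); the function name `build_job_definition` and all S3 paths, job names, and role ARNs are hypothetical placeholders. The assumption here is that the library is passed through the `--extra-py-files` job argument.

```python
def build_job_definition(job_name, role_arn, script_s3_path, library_s3_path):
    """Build keyword arguments for creating a Glue Python shell job.

    Hypothetical helper: it only assembles a dictionary and makes no AWS calls.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",             # marks the job as a Python shell job
            "ScriptLocation": script_s3_path,  # S3 path of the job script
            "PythonVersion": "3",
        },
        # Assumed: the .egg/.whl built from setup.py is attached via this argument
        "DefaultArguments": {"--extra-py-files": library_s3_path},
        "MaxCapacity": 0.0625,  # smallest capacity for Python shell jobs
    }


if __name__ == "__main__":
    # All values below are placeholders, not real resources
    definition = build_job_definition(
        job_name="my-pandas-job",
        role_arn="arn:aws:iam::123456789012:role/MyGlueRole",
        script_s3_path="s3://my-bucket/scripts/job.py",
        library_s3_path="s3://my-bucket/libs/my_module-0.1-py3-none-any.whl",
    )
    print(definition)
```

With boto3 installed and credentials configured, such a dictionary could be passed on as `boto3.client("glue").create_job(**definition)`.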