python aws-glue pymssql

AWS Glue Python Shell: How to provide your own library?


I'd like to have an AWS Glue Python shell job connect to a MS SQL Server. I understand that I should use the pymssql library. The script works on my computer, but as I understand it, on AWS I need to upload the pymssql library to S3 and reference it.

I'm following the AWS example of providing your own egg file, which connects to Redshift, but after creating the egg file and running the script I get this error:

Couldn't find index page for 'redshift-module' (maybe misspelled?)

Can anyone show me how to provide my own library, for either Redshift or MS SQL? I'm just looking for an example I can adapt and work from.

Full Job Log

Creating /glue/lib/installation/site.py
Processing redshift_module-0.1-py3.7.egg
Copying redshift_module-0.1-py3.7.egg to /glue/lib/installation
Adding redshift-module 0.1 to easy-install.pth file

Installed /glue/lib/installation/redshift_module-0.1-py3.7.egg
Processing dependencies for redshift-module==0.1
Searching for redshift-module==0.1
Reading https://pypi.org/simple/redshift-module/
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/

Full Error Output

Couldn't find index page for 'redshift-module' (maybe misspelled?)
No local packages or working download links found for redshift-module==0.1
error: Could not find suitable distribution for Requirement.parse('redshift-module==0.1')

Solution

  • The answer is mentioned here

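    In a nutshell, AWS Glue uses Python 3.6, while the egg 'redshift_module-0.1-py3.7.egg' was built with Python 3.7. Because the egg is tagged for a different Python version, the installer most likely does not accept it as satisfying redshift-module==0.1 and falls back to searching PyPI, where no such package exists. Rebuilding the egg with Python 3.6 should avoid the error.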

    You might also want to look at the documentation, which describes some useful packaging options such as install_requires=['package'].
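To make the version mismatch easier to spot before uploading to S3, here is a minimal sketch that reads the Python tag out of an egg filename and compares it with a target interpreter version. The helper names `egg_python_tag` and `egg_matches_python` are hypothetical and not part of AWS Glue or setuptools. The sketch assumes the egg filename carries a `-pyX.Y` tag, as it does in the job log above.

```python
import re
import sys


def egg_python_tag(egg_filename):
    """Return the 'X.Y' Python tag from an egg filename, or None if absent."""
    match = re.search(r"-py(\d+\.\d+)", egg_filename)
    return match.group(1) if match else None


def egg_matches_python(egg_filename, version_info=None):
    """Check whether the egg's Python tag equals the given (major, minor) version.

    Defaults to the running interpreter. An egg without a tag is treated as
    compatible, which is an assumption made for this sketch.
    """
    if version_info is None:
        version_info = sys.version_info
    tag = egg_python_tag(egg_filename)
    if tag is None:
        return True
    return tag == "{}.{}".format(version_info[0], version_info[1])


# The egg from the job log, checked against Python 3.6 (the version the
# answer says AWS Glue uses).
egg = "redshift_module-0.1-py3.7.egg"
print(egg_python_tag(egg))              # 3.7
print(egg_matches_python(egg, (3, 6)))  # False
```

If the check returns False, rebuild the egg with an interpreter whose version matches the one the Glue job runs.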