Tags: python, amazon-web-services, aes, aws-glue, pycrypto

How to use Crypto.Cipher AES in AWS Glue?


I am currently using the Crypto.Cipher AES module taken from https://github.com/Doerge/awslambda-pycrypto in my AWS Lambda function, and it works perfectly for my case.

from Crypto.Cipher import AES
from botocore.vendored import requests

url = 'my_url'
PARAMS =  {'param1':'val1', 'param2':'val2'}
CIPHER_KEY = 'cipher_key'

req = requests.get(url, params = PARAMS).json()
ciphered_value = req['ciphered_value']
decipher = AES.new(CIPHER_KEY, AES.MODE_ECB)
value =  decipher.decrypt(ciphered_value)

However, the Lambda function is failing because it exceeds the 15-minute execution limit, due to the number of values that need to be processed.

I am trying to run the exact same code as an AWS Glue Python Shell job, since Glue jobs can run for longer than 15 minutes and also give me access to other resources and the AWS Glue Data Catalog.

However, when I run my job I get the following error:

Traceback (most recent call last):
File "/tmp/runscript.py", line 115, in <module>
runpy.run_path(temp_file_path, run_name='__main__')
File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/glue-python-scripts-87edl8q9/playlist_ingestor_glue.py", line 10, in <module>
ModuleNotFoundError: No module named 'Crypto'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/tmp/runscript.py", line 134, in <module>
raise e_type(e_value).with_tracsback(new_stack)
AttributeError: 'ModuleNotFoundError' object has no attribute 'with_tracsback'

So clearly, the job cannot find the Crypto module.

I followed this:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-libraries.html

And this:

https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-egg-library

And added the setup.py file with the contents:

from setuptools import setup

setup(
    name="Crypto",
    version="0.1",
    packages=['Crypto']
)

I then compressed the contents of the Crypto directory together with the setup.py file into pycrypto.zip, uploaded it to S3, and referenced it in my Glue job's Python library path.

[Screenshot: modules compressed into pycrypto.zip]

After all of this, I still get the error.

I have tried the following import variants in my script, and none of them worked:

from Crypto.Crypto.Cipher import AES
from Crypto.Cipher import AES
from Cipher import AES
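
For what it's worth, which of these import forms can work depends on where the Crypto directory sits inside the archive: Python treats a zip on sys.path like a directory, so `from Crypto.Cipher import AES` can only resolve if `Crypto/` is a top-level entry in the zip. Below is a minimal sketch of that rule using only the standard library. The package `demo_pkg`, its `cipher` submodule, and both helper functions are made-up stand-ins, not part of pycrypto or Glue. Note also that Python's zip importer cannot load compiled extension modules (`.so`) from an archive, and pycrypto ships compiled code, so a correct layout alone may not be enough here.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def build_archive(zip_path: Path, package: str) -> None:
    """Write a tiny pure-Python package at the root of a zip archive."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        # The package directory must be a top-level entry in the archive.
        zf.writestr(f"{package}/__init__.py", "")
        zf.writestr(f"{package}/cipher.py", "NAME = 'demo'\n")


def import_from_archive(zip_path: Path, module: str):
    """Put the archive on sys.path and import a module from it."""
    sys.path.insert(0, str(zip_path))
    importlib.invalidate_caches()
    return importlib.import_module(module)


# demo_pkg is a hypothetical stand-in for a real package such as Crypto.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo_pkg.zip"
build_archive(archive, "demo_pkg")
cipher = import_from_archive(archive, "demo_pkg.cipher")
print(cipher.NAME)
```

If the zip had instead contained `something/demo_pkg/...`, the import name would have to start with `something`, which is roughly what the `Crypto.Crypto.Cipher` attempt was probing for.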

What is the correct way to use this compiled module in AWS Glue? I find it frustrating, as the set of compatible Python libraries is very limited and there are not yet many examples, documentation pages, or community posts explaining how to achieve this.


Solution

  • I found the question "AWS Glue Python", which suggested using this:

    import os
    import site
    from importlib import reload  # on Python 3, reload lives in importlib
    from setuptools.command import easy_install
    install_path = os.environ['GLUE_INSTALLATION']
    easy_install.main(["--install-dir", install_path, "<library-name>"])
    reload(site)
    
    
    import <installed library>
    

    So I took the latest version from https://pypi.org/project/pycrypto/#files

    import os
    import site
    from importlib import reload  # on Python 3, reload lives in importlib
    from setuptools.command import easy_install
    install_path = os.environ['GLUE_INSTALLATION']

    # Download and build pycrypto 2.6.1 from PyPI into the Glue install path.
    easy_install.main(["--install-dir", install_path, "https://files.pythonhosted.org/packages/60/db/645aa9af249f059cc3a368b118de33889219e0362141e75d4eaf6f80f163/pycrypto-2.6.1.tar.gz"])
    reload(site)
    

    And it's working!
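
As a rough sketch of what the `reload(site)` step is for: a directory that gets populated while the interpreter is already running is not importable until it is on sys.path. The snippet below demonstrates that with the standard library only, using `site.addsitedir` as an alternative to reloading `site`. The helper names and the `glue_demo_mod` module are hypothetical, and a temporary directory stands in for the Glue install path.

```python
import importlib
import importlib.util
import site
import tempfile
from pathlib import Path


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can currently be found."""
    return importlib.util.find_spec(module_name) is not None


def make_importable(install_dir: str) -> None:
    """Add a directory to sys.path and process any .pth files in it."""
    site.addsitedir(install_dir)
    importlib.invalidate_caches()


# A temporary directory stands in for the install path (assumption).
install_dir = tempfile.mkdtemp()
Path(install_dir, "glue_demo_mod.py").write_text("VALUE = 42\n")

before = is_available("glue_demo_mod")  # directory not on sys.path yet
make_importable(install_dir)
after = is_available("glue_demo_mod")   # directory now on sys.path
print(before, after)
```

A check like `is_available("Crypto")` before the install call could also avoid reinstalling the package when it is already present.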

    Still, I would like to know how to use it as a referenced Python library.