To speed up an ETL job, I've implemented a regression algorithm in Cython ("regression.pyx") rather than in Python.
Unfortunately, I couldn't find any documentation on how to integrate it properly into an AWS Glue job.
I would like to import the Cython regression module in the Python Glue job as follows:
from regression import reg
Usually, a Cython script has to be built with a setup.py script before it can be imported. What is the best way to integrate it properly into an AWS Glue job?
Any help would be appreciated.
You can specify an external library location when you are creating the job.
You just upload the .zip or .whl file to S3 and specify the path.
More information on that here.
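As a rough sketch of what "specify the path" can look like when the job is created from code instead of the console: the snippet below builds the arguments for boto3's Glue `create_job` call. The job name, role ARN, script location and wheel file name are all made-up placeholders, and I'm assuming the library path is passed through the `--extra-py-files` job argument. For a compiled wheel on newer Glue versions, `--additional-python-modules` may be the argument you need instead, so treat the default here as an assumption to check against your Glue version.

```python
def build_job_definition(job_name, role_arn, script_s3_uri, wheel_s3_uri,
                         library_argument="--extra-py-files"):
    """Return keyword arguments for glue.create_job that attach a wheel from S3.

    `library_argument` is the Glue job parameter carrying the library path;
    swap in "--additional-python-modules" if your Glue version needs it.
    """
    return {
        "Name": job_name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",              # use "pythonshell" for a Python shell job
            "ScriptLocation": script_s3_uri,
            "PythonVersion": "3",
        },
        "DefaultArguments": {library_argument: wheel_s3_uri},
    }


def create_job(definition):
    """Create the Glue job; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the definition can be built without it

    return boto3.client("glue").create_job(**definition)


# Placeholder values for illustration only.
definition = build_job_definition(
    job_name="regression-etl",
    role_arn="arn:aws:iam::123456789012:role/GlueJobRole",
    script_s3_uri="s3://my-bucket/scripts/etl_job.py",
    wheel_s3_uri="s3://my-bucket/glue_modules/regression-0.1.0-py3-none-any.whl",
)
```

Calling `create_job(definition)` would then register the job; the job script itself can keep the plain `from regression import reg` import.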
CodeBuild project (with inline buildspec) from my CodePipeline template:
BuildGlueModules:
  Type: AWS::CodeBuild::Project
  Properties:
    Artifacts:
      Type: CODEPIPELINE
    Environment:
      ComputeType: BUILD_GENERAL1_MEDIUM
      Image: aws/codebuild/standard:4.0
      Type: LINUX_CONTAINER
    Name: !Sub ${AWS::StackName}-BuildGlueModules
    ServiceRole: !Ref CodeBuildRole
    Source:
      Type: CODEPIPELINE
      BuildSpec: !Sub |
        version: 0.2
        phases:
          install:
            runtime-versions:
              python: 3.8
          pre_build:
            commands:
              - python3 setup.py bdist_wheel
          build:
            commands:
              - aws s3 sync ./dist/ s3://my-bucket/glue_modules
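For the `python3 setup.py bdist_wheel` step to produce a wheel, the repository needs a setup.py that compiles regression.pyx. Here is a minimal sketch of one. It assumes Cython, setuptools and wheel are already installed in the build image (the buildspec above does not show that step), and the distribution name and version are placeholders.

```python
# setup.py (sketch): builds regression.pyx into a wheel exposing `regression`.
import sys

# The extension name must match the import used in the job:
# `from regression import reg`.
EXTENSION = {"name": "regression", "sources": ["regression.pyx"]}


def build():
    # Imported here so the file can be loaded without the build tools present.
    from setuptools import Extension, setup
    from Cython.Build import cythonize

    setup(
        name="regression",   # placeholder distribution name
        version="0.1.0",     # placeholder version
        ext_modules=cythonize(
            [Extension(EXTENSION["name"], EXTENSION["sources"])],
            compiler_directives={"language_level": "3"},
        ),
    )


# Only run when a command such as `bdist_wheel` is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    build()
```

Keep in mind that a compiled wheel is tied to the Python version and platform it was built on, so the Python version in the build image has to match the one your Glue job runs.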