I am using GitHub Actions to build PySpark .zip files with the following YAML workflow:
name: Build Artifacts
on:
  push:
    branches:
      - main
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Make artifact directory
        run: mkdir -p ./dist
      - uses: actions/checkout@v2
      - name: Create Zip File
        uses: montudor/[email protected]
        with:
          args: sh -c "cd data_compaction && zip -r ../src.zip src/"
      - name: Push zip file to S3
        uses: qoqa/[email protected]
        env:
          AWS_S3_BUCKET: 'dev-bucket'
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
          AWS_S3_PATH: '/artifacts/src.zip'
          FILE: 'src.zip'
      - name: Push main file to S3
        uses: qoqa/[email protected]
        env:
          AWS_S3_BUCKET: 'dev-bucket'
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          AWS_REGION: 'us-east-1'
          AWS_S3_PATH: '/artifacts/main.py'
          FILE: './data_compaction/main.py'
The zip file is created and pushed to S3 successfully. But when I try to import the modules from the zip, I get a ModuleNotFoundError. I am running spark-submit --py-files src.zip main.py
However, when I build the zip on my local machine with a Makefile and run spark-submit, it works fine. The Makefile looks like this:
build:
	rm -f -r ./dist
	mkdir ./dist
	cp main.py ./dist
	cd ./src && zip -r ../dist/src.zip .
My project directory is as follows:
── data_compaction
   ├── Makefile
   ├── main.py
   └── src
       ├── jobs
       │   ├── __init__.py
       │   ├── xyz.py
       │   └── abc.py
       └── utilities
           ├── __init__.py
           └── spark_foundation.py
And my main.py has this snippet to import the modules:
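import os
import sys

# Prefer the packaged archive when it sits next to main.py; otherwise use the source tree.
if os.path.exists('src.zip'):
    sys.path.insert(0, 'src.zip')
else:
    sys.path.insert(0, './src')

from utilities.spark_foundation import spark_session
from jobs.xyz import func1
from jobs.abc import func2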
PS: I am new to GitHub Actions.
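Have you unzipped the src.zip from S3 and looked at its contents? In the Makefile you change into the src directory and zip everything underneath it, so jobs/ and utilities/ sit at the root of the archive. In the CI YAML you change into data_compaction and zip the src directory itself, so every entry is prefixed with src/ (for example src/jobs/xyz.py). With src.zip on sys.path, Python looks for jobs and utilities at the archive root, does not find them there, and raises ModuleNotFoundError. It should work again when you change the CI step to: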
      - name: Create Zip File
        uses: montudor/[email protected]
        with:
          args: sh -c "cd data_compaction/src && zip -r ../src.zip ."
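To see the difference between the two archive layouts without touching CI, here is a rough, self-contained Python sketch. It does not call the zip CLI; it imitates both layouts with the standard zipfile module on a throwaway copy of the directory tree from the question. The helper names (build_zip, top_level_entries, can_import) and the file names nested.zip and flat.zip are made up for this illustration, and the stub modules only contain a placeholder constant.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_zip(zip_path, root, arc_prefix=""):
    """Zip every file under root; names are relative to root, plus an optional prefix."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                arcname = os.path.join(arc_prefix, rel) if arc_prefix else rel
                zf.write(full, arcname)


def top_level_entries(zip_path):
    """Return the distinct first path components stored in the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({n.split("/")[0] for n in zf.namelist()})


def can_import(zip_path, module):
    """Try the import in a fresh interpreter with the zip first on sys.path."""
    code = f"import sys; sys.path.insert(0, {zip_path!r}); import {module}"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        cwd=os.path.dirname(zip_path),  # avoid picking up packages from the caller's cwd
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as tmp:
    # Recreate a stub of data_compaction/src from the question.
    src = os.path.join(tmp, "data_compaction", "src")
    for pkg, mod in [("jobs", "xyz"), ("utilities", "spark_foundation")]:
        os.makedirs(os.path.join(src, pkg))
        open(os.path.join(src, pkg, "__init__.py"), "w").close()
        with open(os.path.join(src, pkg, f"{mod}.py"), "w") as f:
            f.write("VALUE = 1\n")

    nested = os.path.join(tmp, "nested.zip")  # like: cd data_compaction && zip -r ../src.zip src/
    flat = os.path.join(tmp, "flat.zip")      # like: cd data_compaction/src && zip -r ../src.zip .
    build_zip(nested, src, arc_prefix="src")
    build_zip(flat, src)

    nested_top = top_level_entries(nested)
    flat_top = top_level_entries(flat)
    nested_ok = can_import(nested, "jobs.xyz")
    flat_ok = can_import(flat, "jobs.xyz")

print(nested_top, nested_ok)  # ['src'] False
print(flat_top, flat_ok)      # ['jobs', 'utilities'] True
```

Running top_level_entries on the src.zip downloaded from S3 should tell you quickly which of the two layouts your CI build produced.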