I'm trying to create a Dataflow template in Python from a pipeline whose code is split across multiple files.
This is the project structure:
root
|
----> project_dir
      |
      ----> __init__.py
      ----> main.py
      ----> setup.py
      ----> utils
            |
            ----> functions.py
            ----> __init__.py
In the file main.py there is the import line:
from project_dir.utils.functions import something
And my setup.py file contains (as suggested here):
package_dir={'.': ''},
packages=setuptools.find_packages()
The template file is generated with no problems, but every time I try to execute it on Dataflow I get the following error:
ImportError: No module named 'project_dir'
Can someone please help me? Thanks in advance!
To solve this problem I switched to the following structure:
root
|
----> project_dir
|     |
|     ----> __init__.py
|     ----> main.py
|     ----> utils
|           |
|           ----> functions.py
|           ----> __init__.py
|
----> setup.py
----> installment_requirements.txt
This is my setup.py file:
import setuptools

requires = [
    'google-cloud-storage==1.36.1',
    'pysftp==0.2.9'
]

setuptools.setup(
    name='name',
    version='0.0.1',
    install_requires=requires,
    packages=setuptools.find_packages()
)
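For the Dataflow workers to actually install the project_dir package, the pipeline also has to reference this root-level setup.py; the usual way in Beam is the SetupOptions.setup_file option. Below is a minimal sketch of what main.py can look like with this layout; the Create/Map steps and the run() wiring are illustrative assumptions, not my exact pipeline, but the setup_file line is the relevant part:
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

from project_dir.utils.functions import something


def run():
    # Template-creation parameters (runner, template_location, ...) come in
    # via sys.argv or are built from the environment in the real pipeline.
    options = PipelineOptions()
    # Point Dataflow at the root-level setup.py so workers install the
    # project_dir package before executing any pipeline code.
    options.view_as(SetupOptions).setup_file = './setup.py'

    with beam.Pipeline(options=options) as p:
        (p
         | 'Create' >> beam.Create(['example'])  # placeholder input
         | 'Apply' >> beam.Map(something))       # imported from project_dir.utils


if __name__ == '__main__':
    run()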
Then I create the template with a Cloud Build job that installs the requirements and runs the pipeline with the template-creation parameters:
steps:
  - name: 'python:3.8-slim'
    args: ['pip', 'install', '--upgrade', 'pip']
    waitFor: ['-']
    id: 'upgrade-pip'
  - name: 'python:3.8-slim'
    args: ['pip', 'install', '-r', './installment_requirements.txt', '--user']
    waitFor: ['upgrade-pip']
    id: 'install-requirements'
  - name: 'python:3.8-slim'
    args: ['python', './project_dir/main.py']
    env: ['PYTHONPATH=./', 'DEPLOYMENT_ENVIRONMENT=${_DEPLOYMENT_ENVIRONMENT}']
    waitFor: ['install-requirements']
    id: 'create-df-template'
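Two notes on this configuration: Cloud Build persists /builder/home (as well as /workspace) across steps, which is why the packages installed with --user in the second step are still visible when the third step runs main.py; and the build is submitted as usual, e.g. gcloud builds submit --config cloudbuild.yaml --substitutions=_DEPLOYMENT_ENVIRONMENT=dev (the substitution value here is just an example).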
The file installment_requirements.txt is an export made with pip freeze, so the exact dependency versions the pipeline was developed against are installed during template creation.
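It can be regenerated from the development virtualenv at any time with standard pip usage, nothing project-specific:
pip freeze > installment_requirements.txt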