python machine-learning amazon-sagemaker aws-step-functions

SageMaker Step Functions SDK not downloading any of the Python modules in requirements.txt


I define the estimator like this:

from sagemaker.tensorflow.estimator import TensorFlow
env = {
    'SAGEMAKER_REQUIREMENTS': 'requirements.txt', # path relative to `source_dir` below.
}
keras_estimator = TensorFlow(
    entry_point=sm_script,
    role=workflow_execution_role,
    instance_count=1,
    instance_type=training_instance,
    dependencies=[sm_script, 'requirements.txt'],
    env=env,
    requirements_file='requirements.txt',
    sagemaker_session=sm_sess,
    framework_version="1.15.2",
    base_job_name='{}-training'.format(base_name),
    py_version="py3",
    distribution={"parameter_server": {"enabled": True}},
    metric_definitions=[
        {'Name': 'validation_accuracy', 'Regex': "Belt Vision accuracy = ([0-9.]+)"},
        {'Name': 'validation_f1', 'Regex': "Belt Vision f1 = ([0-9.]+)"}]
)

Yes, I reference requirements.txt in several places, but none of them work. For an sklearn estimator, the dependencies argument alone is enough. Here, none of my Python modules are downloaded, and I don't even see them being collected in the CloudWatch logs. As a result, sm_script keeps failing at the line where I import cv2, with the following error:

ModuleNotFoundError: No module named 'cv2'

Any suggestions? As requested, here is my requirements.txt:

sagemaker==2.65.0
pandas==1.2.4
scikit-learn==0.23.1
awswrangler==2.12.1
boto3==1.19.1
numpy~=1.19.2
opencv-python
random
keras==2.13.1
tensorflow==2.13.0

Solution

  • I resolved this by adding the following to the .py file that runs the training, so that the opencv-python package is installed before it is imported:

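    import sys
    import subprocess

    # Run pip as a subprocess, using the same interpreter that runs this script.
    # The package name must be a string; a bare opencv-python raises a NameError.
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'opencv-python'])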
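
A slightly more defensive variant of the same workaround, as a minimal sketch: try the import first and fall back to pip only when the module is missing, so repeated runs don't reinstall anything. The helper names `pip_install` and `ensure_package` and the `installer` parameter are hypothetical, made up for illustration; they are not part of SageMaker or pip. The sketch also assumes the training container has network access to a package index.

```python
import importlib
import subprocess
import sys


def pip_install(pip_name):
    """Install a package with the interpreter that is running this script."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name])


def ensure_package(import_name, pip_name=None, installer=pip_install):
    """Import `import_name`, installing `pip_name` first if it is missing.

    The import name and the PyPI name can differ (cv2 vs. opencv-python),
    so both are accepted. `installer` is a hypothetical hook that makes the
    helper testable without touching the network.
    """
    try:
        return importlib.import_module(import_name)
    except ModuleNotFoundError:
        installer(pip_name or import_name)
        importlib.invalidate_caches()  # let the import system see the new install
        return importlib.import_module(import_name)


# Usage at the top of the training script (requires network access):
# cv2 = ensure_package("cv2", "opencv-python")
```

Passing the install step in as a parameter is only a convenience for testing; in the training script the default `pip_install` does the same thing as the snippet above.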