I'm getting this error when submitting a scheduled pipeline:

Failed to submit job due to Exception: Response status code does not indicate success: 400 (Exception(s) encountered while validating environment definition. See inner exception for details. (). Microsoft.RelInfra.Common.Exceptions.ErrorResponseException: Exception(s) encountered while validating environment definition. See inner exception for details. (BaseImage, BaseDockerfile, or BuildContext must be set for Docker-based environments.) (Conda dependencies were not specified. Please make sure that conda dependencies are specified in your run configuration.).

Here is my code:
from azureml.core import Experiment, ScriptRunConfig, Environment, Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline, ScheduleRecurrence, Schedule
import datetime
ws = Workspace.from_config()
cluster_name = "cpu-cluster"
try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print(f'Found existing cluster, use it: {cluster_name}')
except ComputeTargetException:
    print(f"{cluster_name} not found")
env = Environment.from_conda_specification(name='job-env', file_path='job-env.yml')
from azureml.pipeline.steps import PythonScriptStep
# Assuming `compute_target` and `env` are already defined
script_step = PythonScriptStep(
    name="Run Python Script",
    script_name="search.py",
    compute_target=compute_target,
    source_directory="./",
    runconfig=env,
    allow_reuse=True
)
experiment = Experiment(workspace=ws, name='search_experiment2')
pipeline = Pipeline(workspace=ws, steps=[script_step])
published_pipeline = pipeline.publish(
    name="search_pipeline_2",
    description="Pipeline Description",
    continue_on_step_failure=False)
recurrence = ScheduleRecurrence(frequency="Day", interval=1)  # Define your recurrence pattern
schedule = Schedule.create(
    workspace=ws,
    name="DailyScriptRun",
    description="Daily run of script",
    pipeline_id=published_pipeline.id,
    experiment_name="searchExperimentName",
    recurrence=recurrence)
And here is my environment YAML file, job-env.yml:
name: job-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip=21.2.4
  - pip:
    - aenum==3.1.15
    - aiohttp==3.9.3
    - aiosignal==1.3.1
    - annotated-types==0.6.0
    - anyio==4.3.0
    - appdirs==1.4.4
    - APScheduler==3.10.4
    - attrs==23.2.0
    - azure-ai-documentintelligence==1.0.0b2
    - azure-ai-formrecognizer==3.3.2
    - azure-ai-textanalytics==5.3.0
    - azure-common==1.1.28
    - azure-core==1.30.1
    - azure-identity==1.15.0
    - azure-keyvault-secrets==4.8.0
    - azure-search==1.0.0b2
    - azure-search-documents==11.6.0b2
    - azure-storage-blob==12.19.1
    - azure-storage-file-datalake==12.14.0
    - beautifulsoup4==4.12.3
    - bs4==0.0.2
    - case_conversion==2.1.0
    - certifi==2024.2.2
    - cffi==1.16.0
    - chardet==5.2.0
    - charset-normalizer==3.3.2
    - clipboard==0.0.4
    - colorama==0.4.6
    - colour==0.1.5
    - contourpy==1.2.0
    - cryptography==42.0.5
    - cursor==1.3.5
    - cycler==0.12.1
    - decorator==4.4.2
    - dill==0.3.8
    - distro==1.9.0
    - fonttools==4.49.0
    - frozenlist==1.4.1
    - fuzzywuzzy==0.18.0
    - gender-guesser==0.4.0
    - h11==0.14.0
    - html-text==0.5.2
    - httpcore==1.0.4
    - httpx==0.27.0
    - idna==3.6
    - imageio==2.34.0
    - imageio-ffmpeg==0.4.9
    - infi.systray==0.1.12
    - inflect==7.0.0
    - isodate==0.6.1
    - kiwisolver==1.4.5
    - lxml==5.1.0
    - matplotlib==3.8.3
    - maybe-else==0.2.1
    - mbstrdecoder==1.1.3
    - moviepy==1.0.3
    - msal==1.27.0
    - msal-extensions==1.1.0
    - msoffcrypto-tool==5.3.1
    - msrest==0.7.1
    - multidict==6.0.5
    - numpy==1.26.4
    - O365==2.0.34
    - oauthlib==3.2.2
    - office365==0.3.15
    - Office365-REST-Python-Client==2.5.6
    - olefile==0.47
    - openai==1.7.1
    - packaging==24.0
    - pandas==2.2.1
    - parsedatetime==2.6
    - pathmagic==0.3.14
    - pillow==10.2.0
    - portalocker==2.8.2
    - prettierfier==1.0.3
    - proglog==0.1.10
    - pycparser==2.21
    - pydantic==2.6.3
    - pydantic_core==2.16.3
    - pydub==0.25.1
    - pyinstrument==4.6.2
    - pyiotools==0.3.18
    - PyJWT==2.8.0
    - pymiscutils==0.3.14
    - PyMuPDF==1.23.25
    - PyMuPDFb==1.23.22
    - pyparsing==3.1.2
    - pypdf==4.1.0
    - PyPDF2==3.0.1
    - pyperclip==1.8.2
    - PyQt5==5.15.10
    - PyQt5-Qt5==5.15.2
    - PyQt5-sip==12.13.0
    - pysubtypes==0.3.18
    - python-dateutil==2.9.0.post0
    - python-docx==1.1.0
    - python-dotenv==1.0.1
    - pytz==2024.1
    - pywin32==306
    - readchar==4.0.5
    - regex==2023.12.25
    - requests==2.31.0
    - requests-oauthlib==1.4.0
    - Send2Trash==1.8.2
    - simplejson==3.19.2
    - six==1.16.0
    - sniffio==1.3.1
    - soupsieve==2.5
    - stringcase==1.2.0
    - tabulate==0.9.0
    - tenacity==8.2.3
    - tiktoken==0.6.0
    - tqdm==4.66.2
    - typepy==1.3.2
    - typing_extensions==4.10.0
    - tzdata==2024.1
    - tzlocal==5.2
    - urllib3==2.2.1
    - XlsxWriter==3.2.0
    - yarl==1.9.4
The error indicates that the problem is in the environment. Also, what I'm trying to do has nothing to do with ML; I want to use a compute cluster because my script takes at least an hour to run. Is this the best way to do it?
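According to the PythonScriptStep documentation, the runconfig parameter expects a RunConfiguration object, but your code passes an Environment (runconfig=env). That is most likely why validation reports that neither a Docker base image nor conda dependencies are set.
So, build a RunConfiguration from your conda file, as in the code below: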
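from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.steps import PythonScriptStep

# Load the conda/pip dependencies from job-env.yml into a RunConfiguration
script_step = PythonScriptStep(
    name="Run Python Script",
    script_name="search.py",
    compute_target=compute_target,
    source_directory="./",
    runconfig=RunConfiguration(conda_dependencies=CondaDependencies(conda_dependencies_file_path='job-env.yml')),
    allow_reuse=True
)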
Refer to the RunConfiguration documentation for more about its parameters.