I have two components in Azure Machine Learning. The first component (called prep) produces two dataframes that I want to pass to the next component (called middle) for further processing.
In the prep code, I have tried saving the dataframe to the component's outputs folder, to a datastore, and to the output path passed in as an argument, as shown below:
print((Path(args.Y_df) / "Y_df.csv"))
df1.to_csv("./outputs/Y_df.csv")
df1.to_csv(args.Y_df.path)
df1.to_csv("azureml://subscriptions/subscription_id/resourcegroups/rg_group/workspaces/workspace_name/datastores/datastore_name/paths/azureml/forecast/testing/y_df.csv")
Of these, only the first method works. Now I want to pass the file to the next component, so in the pipeline definition I have written this:
def data_pipeline(
    compute_train_node: str,
):
    prep_node = prep()
    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)
I am trying to run some basic code in the middle component, but it does not even start. It fails with the following error:
Below are the YAMLs for prep and middle:
middle:
name: middle4
display_name: middle4
inputs:
  Y_df:
    type: uri_file
  S_df:
    type: uri_file
code: ./middle
environment: azureml:environment_name:4
command: >-
  python middle_script.py
  --Y_df ${{inputs.Y_df}}
  --S_df ${{inputs.S_df}}
prep:
name: preprocessing24
display_name: preprocessing24
outputs:
  Y_df:
    type: uri_file
  S_df:
    type: uri_file
code: ./preprocessing
environment: azureml:environment_name:4
command: >-
  python preprocessing_script.py
  --Y_df ${{outputs.Y_df}}
  --S_df ${{outputs.S_df}}
What am I doing wrong? How do I pass a file from one component to the other?
Edit after trying out the method in the answer:
At the moment, args.Y_df points to some random (probably default) file path instead of the one I passed to Output() as described in the answer. It then fails with this error:
OSError: Cannot save file into a non-existent directory: '/mnt/azureml/cr/j/32h438dshj537dj284ndhs630e1/cap/data-capability/wd/Y_df/testing'
Below is the code I wrote to get the path in the prep code. This path is used to save the dataframes as CSV files.
import argparse

parser = argparse.ArgumentParser("prep")
parser.add_argument("--Y_df", type=str, help="Path of prepped data")
parser.add_argument("--S_df", type=str, help="Path of prepped data")
parser.add_argument("--clinical_actuals_path", type=str, help="Path of prepped data")
args = parser.parse_args()
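The OSError above is what pandas raises when the parent directory of the target file does not exist. One defensive step, independent of how the output is mounted, is to create that directory before writing. The following is only a minimal sketch under assumptions: `prepare_output_file` is a hypothetical helper name, the temporary directory stands in for the path that would arrive in `args.Y_df`, and whether the mounted location is actually writable still depends on the output configuration.

```python
import tempfile
from pathlib import Path


def prepare_output_file(output_dir: str, filename: str) -> Path:
    """Create output_dir (and any missing parents) and return the file path inside it.

    Hypothetical helper: output_dir stands for the value received in args.Y_df.
    """
    target_dir = Path(output_dir)
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename


if __name__ == "__main__":
    # Stand-in for the mounted output path; in the component this comes from argparse.
    with tempfile.TemporaryDirectory() as tmp:
        y_path = prepare_output_file(str(Path(tmp) / "Y_df" / "testing"), "Y_df.csv")
        print(y_path.parent.is_dir())
        # df1.to_csv(y_path) would go here, assuming df1 is the prepared dataframe.
```

With this in place, the dataframe is written to `y_path` rather than directly to the raw argument value.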
Answering my own question, based on the information JayashankarGS provided above. His method solved almost the entire issue; I only added one extra parameter to the code he provided.
from azure.ai.ml import MLClient, Input, Output

def data_pipeline(
    compute_train_node: str,
):
    prep_node = prep()
    prep_node.outputs.Y_df = Output(type="uri_folder", mode='rw_mount', path="azureml://datastores/<datastore_name>/paths/csvs/Y_df/")
    prep_node.outputs.S_df = Output(type="uri_folder", mode='rw_mount', path="azureml://datastores/<datastore_name>/paths/csvs/S_df/")
    transform_node = middle(Y_df=prep_node.outputs.Y_df,
                            S_df=prep_node.outputs.S_df)
This is the same code that JayashankarGS posted; I only added one more parameter to the Output() function:
mode = 'rw_mount'
This solved all the issues.
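For completeness, here is a rough sketch of how the middle script might locate the files once the outputs are folders rather than single files. It rests on assumptions: the file names `Y_df.csv` and `S_df.csv` are whatever prep wrote into each folder (only `Y_df.csv` appears in the question), and `parse_args` and `input_csv_paths` are hypothetical helper names, not part of the Azure ML SDK.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two folder paths that the pipeline passes to the middle component."""
    parser = argparse.ArgumentParser("middle")
    parser.add_argument("--Y_df", type=str, help="Folder containing Y_df.csv")
    parser.add_argument("--S_df", type=str, help="Folder containing S_df.csv")
    return parser.parse_args(argv)


def input_csv_paths(args):
    """Map each folder input to the CSV file expected inside it (assumed file names)."""
    return {
        "Y_df": Path(args.Y_df) / "Y_df.csv",
        "S_df": Path(args.S_df) / "S_df.csv",
    }


# Demo with explicit, made-up paths; inside the component call parse_args() with
# no arguments so that the values come from the command line.
demo_args = parse_args(["--Y_df", "inputs/Y_df", "--S_df", "inputs/S_df"])
for name, path in input_csv_paths(demo_args).items():
    print(name, path)
# The dataframes could then be loaded with e.g. pd.read_csv(paths["Y_df"]),
# assuming pandas is available in the environment.
```

Each input arrives as a folder path, so the script joins it with the file name instead of reading the argument directly.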