I have a data asset in Azure Machine Learning. It is of type folder, and the folder contains 4 files with different schemas. When I consume this data asset in an Azure ML notebook, it treats the files as partitions of a single table and messes up the schema. I want to select individual files when pulling the data into the notebook.
I tried to pass the file name as a parameter in the path variable as shown below:
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
data_asset = ml_client.data.get("data_asset_name", version="1")
path = {
    'folder': data_asset.path + "file_name.csv"
}
tbl = mltable.from_delimited_files(paths=[path])
df = tbl.to_pandas_dataframe()
But this gives the following error:
Error Code: ScriptExecution.StreamAccess.NotFound
Native Error: Dataflow visit error: ExecutionError(StreamError(NotFound))
=> Failed with execution error: error in streaming from input data sources
Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id= <some id>
How do I pull in individual files?
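According to the documentation, from_delimited_files supports paths that point to files or folders, using local or cloud paths. Each entry in paths is a dictionary whose key says which kind of path it is: use 'file' when the path points to a single file and 'folder' when it points to a folder. Your code passes a file path under the 'folder' key, which is why the stream is not found.
Alter your code as shown below.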
path = {
    'file': data_asset.path + "winequality-white.csv"
}
# This sample file is semicolon-delimited; pass the delimiter your own file uses.
tbl = mltable.from_delimited_files(paths=[path], delimiter=';')
df = tbl.to_pandas_dataframe()
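If the folder path of the data asset may or may not end with a slash, plain string concatenation can produce a wrong URI. The sketch below is one way to guard against that. The helper names file_entry and load_file are hypothetical (they are not part of mltable), and it assumes data_asset.path is a folder URI that you can append a file name to.

```python
def file_entry(folder_uri: str, file_name: str) -> dict:
    """Build an mltable path entry for one file inside a folder URI."""
    # Normalise the separator so exactly one "/" sits between folder and file.
    return {"file": folder_uri.rstrip("/") + "/" + file_name.lstrip("/")}


def load_file(folder_uri: str, file_name: str, delimiter: str = ","):
    """Read a single delimited file from the folder into a pandas DataFrame."""
    import mltable  # imported lazily so file_entry works without the SDK installed

    tbl = mltable.from_delimited_files(
        paths=[file_entry(folder_uri, file_name)],
        delimiter=delimiter,
    )
    return tbl.to_pandas_dataframe()
```

With this in place, each of the 4 files can be loaded separately, for example df = load_file(data_asset.path, "winequality-white.csv", delimiter=";"), so each DataFrame keeps its own schema.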