I have a data asset in Azure Machine Learning. It is of type folder, and the folder contains 4 files with different schemas. When I consume this data asset in an Azure ML notebook, it treats the different files as partitions and mangles the schema. I want to select individual files when pulling the data into the notebook.
I tried to pass the file name as a parameter in the path variable, as shown below:
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
ml_client = MLClient.from_config(credential=DefaultAzureCredential())
data_asset = ml_client.data.get("data_asset_name", version="1")
path = {
    'folder': data_asset.path + "file_name.csv"
}
tbl = mltable.from_delimited_files(paths=[path])
df = tbl.to_pandas_dataframe()
df
But this gives the following error:
UserErrorException:
Error Code: ScriptExecution.StreamAccess.NotFound
Native Error: Dataflow visit error: ExecutionError(StreamError(NotFound))
VisitError(ExecutionError(StreamError(NotFound)))
=> Failed with execution error: error in streaming from input data sources
ExecutionError(StreamError(NotFound))
Error Message: The requested stream was not found. Please make sure the request uri is correct.| session_id= <some id>
How do I pull in individual files?
According to the documentation for from_delimited_files, the paths argument supports files or folders, with either local or cloud paths. So when you want to read a single file, use the file key in the path dictionary; use the folder key only when the path points to a folder.
Alter your code as shown below.
# Use the 'file' key so only this one file is streamed, not the whole folder
path = {
    'file': data_asset.path + "winequality-white.csv"
}
tbl = mltable.from_delimited_files(paths=[path], delimiter=';')
df = tbl.to_pandas_dataframe()
df
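If you need several of the files, one sketch (the file names below are placeholders for the files in your folder asset) is to load each file into its own dataframe rather than combining them into a single table, so the differing schemas never get merged:
import mltable
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
data_asset = ml_client.data.get("data_asset_name", version="1")

# Placeholder file names - replace with the actual files in your folder asset
file_names = ["first.csv", "second.csv"]

# One dataframe per file, keyed by file name
frames = {
    name: mltable.from_delimited_files(
        paths=[{"file": data_asset.path + name}]
    ).to_pandas_dataframe()
    for name in file_names
}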
Output:
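As an alternative, assuming the azureml-fsspec package is available in the notebook environment, you can also read a single file from the folder asset straight into pandas via its azureml:// URI:
import pandas as pd

# data_asset.path is the azureml:// URI of the folder; append the file name
df = pd.read_csv(data_asset.path + "winequality-white.csv", sep=";")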