I am trying to register a dataset from a Python script step in the Azure Machine Learning Studio designer. Here is my code:
import pandas as pd
from azureml.core import Workspace, Run, Dataset

def azureml_main(dataframe1 = None, dataframe2 = None):
    run = Run.get_context()
    ws = run.experiment.workspace
    ds = Dataset.from_pandas_dataframe(dataframe1)
    ds.register(workspace = ws,
                name = "data set name",
                description = "example description",
                create_new_version = True)
    return dataframe1,
I get an error saying that "create_new_version" in the ds.register call is an unexpected keyword argument. However, this keyword appears in the documentation, and I need it to keep track of new versions of the file.
If I remove the argument, I get a different error, "Local data source path not supported for this operation", so it still does not work. Any help is appreciated. Thanks!
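An "unexpected keyword argument" error means the register method being called simply has no parameter with that name. A quick, SDK-independent way to check this is inspect.signature. The sketch below uses two hypothetical, simplified stand-in functions (not the real azureml classes): one without create_new_version, and one with it, as documented for TabularDataset.register.

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    for p in inspect.signature(func).parameters.values():
        # A **kwargs catch-all accepts any keyword, so the check is inconclusive there.
        if p.kind is inspect.Parameter.VAR_KEYWORD:
            return True
        if p.name == name and p.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Hypothetical, simplified stand-ins (not the real SDK methods).
def legacy_register(workspace, name, description=None, tags=None):
    """Stand-in for a register method that lacks create_new_version."""


def tabular_register(workspace, name, description=None, tags=None,
                     create_new_version=False):
    """Stand-in for a register method that documents create_new_version."""


print(accepts_kwarg(legacy_register, "create_new_version"))   # False
print(accepts_kwarg(tabular_register, "create_new_version"))  # True
```

Inside azureml_main you could pass the real bound method, e.g. accepts_kwarg(ds.register, "create_new_version"). If the SDK wraps the method in a decorator that exposes only *args/**kwargs, the check returns True and tells you nothing, so treat it as a hint rather than proof.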
Sharing the OP's solution here for easier discovery:
import pandas as pd
from azureml.core import Workspace, Run, Dataset

def azureml_main(dataframe1 = None, dataframe2 = None):
    run = Run.get_context()
    ws = run.experiment.workspace
    datastore = ws.get_default_datastore()
    ds = Dataset.Tabular.register_pandas_dataframe(
        dataframe1, datastore, 'data_set_name',
        description = 'data set description.')
    return dataframe1,
Sorry you're struggling. You're very close!
A few things may be the culprit here.
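First, Dataset.from_pandas_dataframe() comes from the old Dataset class, which has been deprecated. That class's register() method does not accept create_new_version, hence the unexpected keyword argument error. I recommend trying Dataset.Tabular.register_pandas_dataframe() (docs link) instead of Dataset.from_pandas_dataframe() (more about the Dataset API deprecation).
Second, I'm not sure whether register_pandas_dataframe works inside the EPS (Execute Python Script) module, but you might have better luck saving the dataframe to parquet first, then calling Dataset.Tabular.from_parquet_files.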
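The parquet route can be sketched roughly as follows. This is an untested outline under several assumptions: tabular is meant to be azureml.core.Dataset.Tabular (passed in only so the function can be exercised without the SDK), datastore is a blob datastore such as ws.get_default_datastore(), and the folder name designer_uploads is hypothetical. Each call uploads to a fresh folder so older registered versions keep pointing at their own file instead of being overwritten.

```python
import os
import tempfile
import uuid


def register_via_parquet(df, workspace, datastore, tabular, name, description=None):
    """Sketch: save `df` as parquet, upload it, then register a TabularDataset.

    `tabular` is assumed to be azureml.core.Dataset.Tabular.
    """
    # Write the dataframe locally (a real DataFrame needs pyarrow or fastparquet).
    local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, "data.parquet")
    df.to_parquet(local_path)

    # Hypothetical folder layout: a unique folder per upload keeps old versions intact.
    target_dir = f"designer_uploads/{uuid.uuid4().hex}"
    datastore.upload_files(files=[local_path], target_path=target_dir, overwrite=False)

    # Point the dataset at the datastore copy rather than a local path.
    ds = tabular.from_parquet_files(path=(datastore, f"{target_dir}/data.parquet"))

    # create_new_version=True adds a new version when `name` is already registered.
    return ds.register(workspace=workspace, name=name, description=description,
                       create_new_version=True)
```

Inside azureml_main it would be called along the lines of register_via_parquet(dataframe1, ws, ws.get_default_datastore(), Dataset.Tabular, 'data_set_name').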
Hopefully something works here!