I have a pandas DataFrame created by:
TB_HISTORICO_MODELO = pd.read_sql("""select DAT_INICIO_SEMANA_PLAN
,COD_NEGOCIO
,VENDA
,LUCRO
,MODULADO
,RUPTURA
,QTD_ESTOQUE_MEDIO
,PECAS from TB""", cursor)
TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"] = pd.to_datetime(TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"])
dataset = TB_HISTORICO_MODELO[TB_HISTORICO_MODELO['COD_NEGOCIO']=='A101'].drop(columns=['COD_NEGOCIO']).reset_index(drop=True)
Everything looks right:
>>> dataset.dtypes
DAT_INICIO_SEMANA_PLAN datetime64[ns]
VENDA float64
LUCRO float64
MODULADO int64
RUPTURA int64
QTD_ESTOQUE_MEDIO int64
PECAS float64
dtype: object
But when I run this:
#%% Create the AutoML Config file and run the experiment on Azure
import logging  # needed for verbosity=logging.INFO below
from azureml.train.automl import AutoMLConfig
time_series_settings = {
    'time_column_name': 'DAT_INICIO_SEMANA_PLAN',
    'max_horizon': 14,
    'country_or_region': 'BR',
    'target_lags': 'auto'
}
automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             blocked_models=['ExtremeRandomTrees'],
                             experiment_timeout_minutes=30,
                             training_data=dataset,
                             label_column_name='VENDA',
                             compute_target=compute_cluster,
                             enable_early_stopping=True,
                             n_cross_validations=3,
                             # max_concurrent_iterations=4,
                             # max_cores_per_iteration=-1,
                             verbosity=logging.INFO,
                             **time_series_settings)
remote_run = Experimento.submit(automl_config, show_output=True)
I get this error:
>>> remote_run = Experimento.submit(automl_config, show_output=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/core/experiment.py", line 219, in submit
run = submit_func(config, self.workspace, self.name, **kwargs)
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 92, in _automl_static_submit
automl_config_object._validate_config_settings(workspace)
File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 1775, in _validate_config_settings
supported_types=", ".join(SupportedInputDatatypes.REMOTE_RUN_SCENARIO)
azureml.train.automl.exceptions.ConfigException: ConfigException:
Message: Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]
InnerException: None
ErrorResponse
{
"error": {
"code": "UserError",
"message": "Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]",
"details_uri": "https://aka.ms/AutoMLConfig",
"target": "training_data",
"inner_error": {
"code": "BadArgument",
"inner_error": {
"code": "ArgumentInvalid",
"inner_error": {
"code": "InvalidInputDatatype"
}
}
}
}
}
What is wrong?
Documentation:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
https://learn.microsoft.com/pt-br/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig
The Configure AutoML documentation says:
For remote experiments, training data must be accessible from the remote compute. AutoML only accepts Azure Machine Learning TabularDatasets when working on a remote compute.
It looks as if your dataset object is a pandas DataFrame, when it should be an Azure ML Dataset (a TabularDataset, per the error message), because compute_target points at a remote cluster. Check out the Azure ML doc on creating Datasets.
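As a rough sketch of the conversion, assuming you are on the v1 SDK (azureml-core) in a release recent enough to include `Dataset.Tabular.register_pandas_dataframe` (it is marked experimental and may be missing from older installs): the helper name `dataframe_to_tabular_dataset` and the dataset name `historico_a101` are made up for illustration, and I have not run this against your workspace.

```python
def dataframe_to_tabular_dataset(df, workspace, name):
    """Upload a pandas DataFrame to the workspace's default datastore
    and register it as a TabularDataset that remote AutoML runs accept."""
    # Imported inside the function so the helper can be defined
    # even before the Azure ML SDK is loaded.
    from azureml.core import Dataset

    datastore = workspace.get_default_datastore()

    # Assumption: register_pandas_dataframe is available in your SDK version.
    # It uploads the frame to the (datastore, path) target and returns
    # the registered TabularDataset.
    return Dataset.Tabular.register_pandas_dataframe(
        dataframe=df,
        target=(datastore, name),
        name=name,
        show_progress=False,
    )


# Usage with the names from the question (left commented out because it
# needs a live workspace):
# ws = Experimento.workspace
# training_dataset = dataframe_to_tabular_dataset(dataset, ws, "historico_a101")
# automl_config = AutoMLConfig(..., training_data=training_dataset, ...)
```

If that method is not available in your install, the other route I know of is to write the DataFrame to a CSV file, upload it to the datastore, and build the dataset with `Dataset.Tabular.from_delimited_files`. Either way, pass the resulting TabularDataset as `training_data` instead of the DataFrame.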