
Error InvalidInputDatatype: Input of type 'Unknown' is not supported in Azure (azureml.train.automl)


I have a pandas DataFrame created by:

import pandas as pd

TB_HISTORICO_MODELO = pd.read_sql("""select DAT_INICIO_SEMANA_PLAN
,COD_NEGOCIO
,VENDA
,LUCRO
,MODULADO
,RUPTURA
,QTD_ESTOQUE_MEDIO
,PECAS from TB""", cursor)

TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"] = pd.to_datetime(TB_HISTORICO_MODELO["DAT_INICIO_SEMANA_PLAN"])

dataset = TB_HISTORICO_MODELO[TB_HISTORICO_MODELO['COD_NEGOCIO'] == 'A101'].drop(columns=['COD_NEGOCIO']).reset_index(drop=True)

Everything looks right:

>>> dataset.dtypes
DAT_INICIO_SEMANA_PLAN    datetime64[ns]
VENDA                            float64
LUCRO                            float64
MODULADO                           int64
RUPTURA                            int64
QTD_ESTOQUE_MEDIO                  int64
PECAS                            float64
dtype: object
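To reproduce this step without the database, here is a minimal sketch using made-up sample rows (the values are hypothetical, not taken from the original table). It runs the same to_datetime/filter/drop/reset_index chain and shows that, although the dtypes match the listing above, the object that ends up in training_data is still a plain pandas DataFrame.

```python
import pandas as pd

# Hypothetical rows standing in for the result of the SQL query
raw = pd.DataFrame({
    "DAT_INICIO_SEMANA_PLAN": ["2020-01-06", "2020-01-06", "2020-01-13"],
    "COD_NEGOCIO": ["A101", "B202", "A101"],
    "VENDA": [100.0, 80.0, 120.0],
    "LUCRO": [10.0, 8.0, 12.0],
    "MODULADO": [1, 0, 1],
    "RUPTURA": [0, 1, 0],
    "QTD_ESTOQUE_MEDIO": [50, 40, 55],
    "PECAS": [20.0, 15.0, 22.0],
})
raw["DAT_INICIO_SEMANA_PLAN"] = pd.to_datetime(raw["DAT_INICIO_SEMANA_PLAN"])

# Same filter/drop/reset chain as in the question
dataset = (
    raw[raw["COD_NEGOCIO"] == "A101"]
    .drop(columns=["COD_NEGOCIO"])
    .reset_index(drop=True)
)

print(dataset.dtypes)
# Correct column dtypes do not change the container type:
# this is still a pandas DataFrame, not an azureml TabularDataset
print(type(dataset).__name__)  # DataFrame
```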

But when I run this:

#%% Create the AutoML Config file and run the experiment on Azure

import logging

from azureml.train.automl import AutoMLConfig

time_series_settings = {
   'time_column_name': 'DAT_INICIO_SEMANA_PLAN',
   'max_horizon': 14,
   'country_or_region': 'BR',
   'target_lags': 'auto'
}

automl_config = AutoMLConfig(task='forecasting',
                            primary_metric='normalized_root_mean_squared_error',
                            blocked_models=['ExtremeRandomTrees'],
                            experiment_timeout_minutes=30,
                            training_data=dataset,
                            label_column_name='VENDA',
                            compute_target = compute_cluster,
                            enable_early_stopping=True,
                            n_cross_validations=3,
                            # max_concurrent_iterations=4,
                            # max_cores_per_iteration=-1,
                            verbosity=logging.INFO,
                            **time_series_settings)

remote_run = Experimento.submit(automl_config, show_output=True)

I get this error:

>>> remote_run = Experimento.submit(automl_config, show_output=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/core/experiment.py", line 219, in submit
    run = submit_func(config, self.workspace, self.name, **kwargs)
  File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 92, in _automl_static_submit
    automl_config_object._validate_config_settings(workspace)
  File "/home/fnord/venv/lib64/python3.6/site-packages/azureml/train/automl/automlconfig.py", line 1775, in _validate_config_settings
    supported_types=", ".join(SupportedInputDatatypes.REMOTE_RUN_SCENARIO)
azureml.train.automl.exceptions.ConfigException: ConfigException:
        Message: Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]
        InnerException: None
        ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "Input of type 'Unknown' is not supported. Supported types: [azureml.data.tabular_dataset.TabularDataset, azureml.pipeline.core.pipeline_output_dataset.PipelineOutputTabularDataset]",
        "details_uri": "https://aka.ms/AutoMLConfig",
        "target": "training_data",
        "inner_error": {
            "code": "BadArgument",
            "inner_error": {
                "code": "ArgumentInvalid",
                "inner_error": {
                    "code": "InvalidInputDatatype"
                }
            }
        }
    }
}

What am I doing wrong?

Documentation I consulted: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train https://learn.microsoft.com/pt-br/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig


Solution

  • The Configure AutoML documentation says:

    For remote experiments, training data must be accessible from the remote compute. AutoML only accepts Azure Machine Learning TabularDatasets when working on a remote compute.

    It looks as if your dataset object is a pandas DataFrame, when it should be an Azure ML TabularDataset. See the Azure ML documentation on creating Datasets.
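One way to do the conversion, sketched under these assumptions: the v1 Python SDK (azureml-core) is installed and you have an authenticated Workspace object (for example from Workspace.from_config()). The sketch writes the DataFrame to CSV, uploads it to the workspace's default datastore, and builds a TabularDataset from the uploaded file. The helper names dataframe_to_csv and to_tabular_dataset and the folder automl/A101 are hypothetical; only the local CSV step runs without Azure.

```python
import os
import tempfile

import pandas as pd


def dataframe_to_csv(df: pd.DataFrame, folder: str, filename: str = "dataset.csv") -> str:
    """Write df to a local CSV file so it can be uploaded to a datastore."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    # ISO-style dates keep the time column easy to parse after upload
    df.to_csv(path, index=False, date_format="%Y-%m-%d")
    return path


def to_tabular_dataset(df: pd.DataFrame, workspace, target_path: str = "automl/A101"):
    """Upload df to the default datastore and wrap it as a TabularDataset.

    Assumes azureml-core (SDK v1) and an authenticated Workspace.
    target_path is a hypothetical folder name; choose your own.
    """
    # Imported lazily so the local CSV helper works without Azure installed
    from azureml.core import Dataset

    datastore = workspace.get_default_datastore()
    local_dir = tempfile.mkdtemp()
    dataframe_to_csv(df, local_dir)
    # Copy the local folder's contents to the datastore under target_path
    datastore.upload(src_dir=local_dir, target_path=target_path, overwrite=True)
    # Build a TabularDataset that points at the uploaded CSV file
    return Dataset.Tabular.from_delimited_files(
        path=[(datastore, f"{target_path}/dataset.csv")]
    )
```

You would then pass the result instead of the DataFrame, e.g. training_data=to_tabular_dataset(dataset, ws) in AutoMLConfig. It is worth confirming that DAT_INICIO_SEMANA_PLAN was inferred as a datetime column (for example with take(5).to_pandas_dataframe()); if not, from_delimited_files accepts set_column_types. Newer azureml-core releases also offer Dataset.Tabular.register_pandas_dataframe, which uploads a DataFrame directly; check whether your SDK version includes it.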