automl, azure-machine-learning-service

Error in azureml "Non numeric value(s) were encountered in the target column."


I am using Automated ML to run a time series forecasting pipeline.

When the AutoMLStep gets triggered, I get this error: Non numeric value(s) were encountered in the target column.

The data is passed to this step through an OutputTabularDatasetConfig, obtained by calling read_delimited_files() on an OutputFileDatasetConfig. I've inspected the output of the prior step: the data consists of a 'Date' column and a numeric column called 'Place', with more than 80 monthly observations.

Nothing seems to be wrong with the column type or the data. I've also applied a number of techniques on the data prep side (e.g. pd.to_numeric() and astype(float)) to ensure the column is numeric.

I've also tried forcing the type through FeaturizationConfig().add_column_purpose('Place', 'Numeric'), but then I get a different error: Expected column(s) Place in featurization config's column purpose not found in X.

Any thoughts on how to solve this?


Solution

  • A few lessons learned from working through this with the stellar Azure Machine Learning engineering team.

    1. When calling read_delimited_files(), make sure the output folder contains only the files meant for that step. If all intermediate outputs are saved to a common folder, the method may read every file in that folder and, depending on the shape of the data, either borrow the schema from the first file or mix the files together, which leads to inconsistencies and errors. In my case, I was dumping many files to the same location, which confused the method. The fix is either to give each output folder a distinct name (e.g. with a UUID) or to use different paths.
    2. The data returned by read_delimited_files() may treat all columns as object type, which can cause the data type check to fail (i.e. the label column needs to be numeric). To mitigate this, explicitly state the type. For example:
    from azureml.data import DataType
    # Read the target column as a float instead of letting it default to object/string.
    prepped_data = prepped_data.read_delimited_files(set_column_types={"Place": DataType.to_float()})
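
For the first point, here is a minimal sketch of one way to give each step its own output folder using a UUID. The helper name unique_output_path, the "pipeline-outputs" root, and the path layout are illustrative assumptions, not anything Azure ML requires. The commented-out usage assumes the v1 SDK's OutputFileDatasetConfig with a destination tuple and a datastore object you already have.

```python
import uuid


def unique_output_path(step_name: str, root: str = "pipeline-outputs") -> str:
    """Return a folder path that no other step or run will write to.

    `root` and the path layout are illustrative choices only.
    """
    run_token = uuid.uuid4().hex  # 32 hex characters, new on every call
    return f"{root}/{step_name}/{run_token}"


prep_path = unique_output_path("prep")
print(prep_path)

# Assumed usage with the Azure ML SDK v1 (not executed here);
# `datastore` is a hypothetical, already-registered Datastore object:
#   from azureml.data import OutputFileDatasetConfig
#   prepped_data = OutputFileDatasetConfig(
#       name="prepped_data", destination=(datastore, prep_path)
#   )
```

With a path like this, read_delimited_files() should only see the files written by that one step, so the schema is not borrowed from unrelated files sitting in a shared folder.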