
How can I create an Azure dataset in Azure ML Studio (through the GUI) from a Parquet file created with Azure Spark?


I'm trying to load files as a dataset in the GUI of Azure ML Studio. These parquet files have been created through Spark.

In my folder, Spark creates files such as "_SUCCESS" or "_committed_8998000".

Azure ML Studio can neither read these files nor ignore them, and reports:

The provided file(s) have invalid byte(s) for the specified file encoding.
{
  "message": " "
}

I selected "Ignore unmatched files path", yet it still does not work.

If I remove the "_SUCCESS" and other Spark files, it works.


Solution

  • Thanks for the feedback. You can use globbing in the path, e.g. path = '**/*.parquet', to select only the Parquet files and skip Spark's marker files such as "_SUCCESS".
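
To see which files a pattern like '**/*.parquet' would pick up, here is a minimal local sketch using only the Python standard library. It does not call Azure ML; it assumes Azure ML's path globbing behaves like Python's pathlib globbing for this pattern, which may not hold in every detail. The "output" folder and the "part-0000N.snappy.parquet" file names are made up for illustration; "_SUCCESS" and "_committed_8998000" are the marker files from the question.

```python
import tempfile
from pathlib import Path


def select_parquet(root):
    """Return the paths under root (relative, POSIX style) matching '**/*.parquet'."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.parquet"))


def demo():
    """Build a fake Spark output folder and list what the glob selects."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        out = root / "output"  # hypothetical folder name
        out.mkdir()
        names = [
            "_SUCCESS",                   # Spark marker file (from the question)
            "_committed_8998000",         # Spark marker file (from the question)
            "part-00000.snappy.parquet",  # hypothetical data file
            "part-00001.snappy.parquet",  # hypothetical data file
        ]
        for name in names:
            (out / name).touch()
        return select_parquet(root)


if __name__ == "__main__":
    for path in demo():
        print(path)
```

Only the two ".parquet" files are returned; the marker files have no ".parquet" suffix, so the pattern never matches them and they do not need to be deleted from the folder.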