I'm trying to load files as a dataset in the GUI of Azure ML Studio. These parquet files have been created through Spark.
In my folder, Spark creates files such as "_SUCCESS" or "_committed_8998000".
Azure ML Studio can neither read these files nor ignore them, and reports:
The provided file(s) have invalid byte(s) for the specified file encoding.
{
"message": " "
}
I selected "Ignore unmatched files path", but it still does not work.
If I remove the "_SUCCESS" and other Spark files, it works.
Thanks for the feedback. You can use globbing in the path, e.g. path = '**/*.parquet', to select only the parquet files.
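To see what a pattern like '**/*.parquet' would pick up, here is a minimal local sketch using only Python's standard library. It is not Azure ML code: it assumes the service's globbing behaves like pathlib's for this pattern, which may not hold in every detail. The part-file names and the year=2020 subfolder are made up for illustration; only "_SUCCESS" and "_committed_8998000" come from the question.

```python
import tempfile
from pathlib import Path


def select_files(folder, pattern="**/*.parquet"):
    """Return the sorted relative paths under `folder` that match `pattern`."""
    root = Path(folder)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()
    )


def demo():
    """Build a Spark-like output folder in a temp dir and apply the glob."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Marker files written by Spark (names taken from the question).
        (root / "_SUCCESS").touch()
        (root / "_committed_8998000").touch()
        # Hypothetical data files; real Spark part-file names will differ.
        (root / "part-00000.parquet").touch()
        (root / "year=2020").mkdir()
        (root / "year=2020" / "part-00001.parquet").touch()
        return select_files(root)


if __name__ == "__main__":
    for name in demo():
        print(name)
```

In this sketch the marker files are never matched, because they do not end in .parquet, while parquet files in the top-level folder and in subfolders are both selected.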