I have a location in ADLS and need to ingest the data from it into Unity Catalog. The directory contains a mixture of .txt, .txt.parquet and .parquet files. I am using Auto Loader with the parquet format to ingest this data:
CREATE STREAMING LIVE TABLE Example_raw
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files("/mnt/Example", "parquet");
But the presence of .txt and .txt.parquet files is causing the ingestion to fail. Can Auto Loader handle multiple file types in a single ingestion?
Thanks
Answer to your specific question:
No. Auto Loader can only work with one file format at a time: each cloud_files source reads a single format.
Possible workaround:
To ingest multiple file formats into the same table using Auto Loader, you could try a UNION ALL of one cloud_files source per format, each with a corresponding glob filter (the pathGlobFilter option). I haven't tested the code below, but hopefully it conveys the concept:
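CREATE STREAMING LIVE TABLE Example_raw
TBLPROPERTIES ("quality" = "bronze")
AS
-- Parquet branch: the glob also matches names ending in .txt.parquet
SELECT *
FROM cloud_files(
  "/mnt/Example",
  "parquet",
  map("pathGlobFilter", "*.parquet"))
UNION ALL
-- Text branch: each line of a .txt file is read as a single string column.
-- NOTE: UNION ALL needs both branches to return the same columns, so SELECT *
-- will likely have to become an explicit, matching column list.
SELECT *
FROM cloud_files(
  "/mnt/Example",
  "text",
  map("pathGlobFilter", "*.txt"))
;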
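To see which files each filter in the query above would pick up, here is a rough sketch in plain Python. It uses the standard-library fnmatch module as a stand-in for the glob matching that pathGlobFilter performs on file names; for simple patterns like these I would expect the same result, but that equivalence is an assumption, and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names standing in for the contents of /mnt/Example.
FILES = ["a.parquet", "b.txt", "c.txt.parquet"]

# The two filters from the query above, keyed by the format each branch reads.
FILTERS = {"parquet": "*.parquet", "text": "*.txt"}


def files_per_format(files, filters):
    """Return {format: [file names whose name matches that format's glob]}."""
    return {
        fmt: [name for name in files if fnmatchcase(name, pattern)]
        for fmt, pattern in filters.items()
    }


if __name__ == "__main__":
    for fmt, matched in files_per_format(FILES, FILTERS).items():
        print(fmt, matched)
```

With these names, c.txt.parquet lands in the parquet branch only, because "*.txt" has to match the whole file name and not just part of it.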
Files matching a filter must actually be in the format declared for that branch. Note that *.parquet also matches names ending in .txt.parquet, so those files must really be Parquet files or the ingestion will fail.
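If you are unsure whether the .txt.parquet files really are Parquet, one quick check is to look for the Parquet magic number: an unencrypted Parquet file begins and ends with the four bytes PAR1. The sketch below is only a sanity check, not a full validation, and it assumes the files are readable through a local file path; the function name is my own, not part of any Databricks API.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """True if the file starts and ends with the 4-byte Parquet magic number."""
    # Anything shorter than a header magic plus a footer magic cannot qualify.
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Any file for which this returns False is better kept out of the parquet branch, for example by moving it or by tightening the glob.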
See AutoLoader syntax and pathGlobFilter under File Format Options for additional details.