I have an Azure Data Lake Gen2 Container in which I have several json files. I would like to write a query that returns a record for each file. I am not interested in parsing the files, I just want to know what files are there and have this returned in a view. Does anyone have any tips on how I might do this? Everything I have found is about how to parse/read the files...I am going to let Power BI do that since the json format is not standard. In this case I just need a listing of files. Thanks!
You can use the filepath()
and filename()
function in Azure Synapse Analytics serverless SQL pools to return those. You can even GROUP BY
them to return aggregated results. A simple example:
SELECT
[result].filepath() AS filepath,
[result].filename() AS filename,
COUNT(*) AS records
FROM
OPENROWSET(
BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=2019/puMonth=4/*.parquet',
FORMAT = 'PARQUET'
) AS [result]
GROUP BY [result].filepath(), [result].filename()
See the documentation for further examples.