Search code examples
azure-data-lakeazure-synapse

Azure Synapse Serverless - SQL query to return rows in directory for each file


I have an Azure Data Lake Gen2 Container in which I have several json files. I would like to write a query that returns a record for each file. I am not interested in parsing the files, I just want to know what files are there and have this returned in a view. Does anyone have any tips on how I might do this? Everything I have found is about how to parse/read the files...I am going to let Power BI do that since the json format is not standard. In this case I just need a listing of files. Thanks!


Solution

  • You can use the filepath() and filename() function in Azure Synapse Analytics serverless SQL pools to return those. You can even GROUP BY them to return aggregated results. A simple example:

    SELECT 
        [result].filepath() AS filepath,
        [result].filename() AS filename,
        COUNT(*) AS records
    FROM
        OPENROWSET(
            BULK 'https://azureopendatastorage.blob.core.windows.net/nyctlc/yellow/puYear=2019/puMonth=4/*.parquet',
            FORMAT = 'PARQUET'
        ) AS [result]
    GROUP BY [result].filepath(), [result].filename()
    

    See the documentation for further examples.