I have written parquet files in a path like this on the blob storage container: https://<my_blob_storage_accnt>.dfs.core.windows.net/current-snapshot/businesspartner=P1
, https://<my_blob_storage_accnt>.dfs.core.windows.net/current-snapshot/businesspartner=P2
, etc.
When I read the parquet files in Synapse SQL using OPENROWSET
, like below --
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://<my_blob_storage_accnt>.dfs.core.windows.net/current-snapshot/**',
FORMAT = 'PARQUET'
) AS [result]
I realized that the output table does NOT have businesspartner
column. When I was using Spark/PySpark, I was able to do something like below to (source) see businesspartner
as a column:
val df = sparkSession.read
.option("basePath", path)
.parquet(path + "/businesspartner=*/*.parquet")
I looked around and found these resources (1, 2 and 3). The closest approaches I can see, which are not quite natural for a SQL person like myself, are this approach, which suggests using WHERE filepath() = 'P1'
or as this approach suggests, to explicitly list individual blob paths for each partition to filter them like below:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK (
'https://<my_blob_storage_accnt>.dfs.core.windows.net/current-snapshot/businesspartner=P1',
'https://<my_blob_storage_accnt>.dfs.core.windows.net/current-snapshot/businesspartner=P2'
),
FORMAT = 'PARQUET'
) AS [result]
Is there a parameter in OPENROWSET
or some other approach that would concisely allow me to read the whole suite of partitions as a column in the result table, from which I can use WHERE
to filter the businesspartners
if I wish to?
Thank you in advance for your answers/suggestions!
Assuming you are using SQL Serverless then you can pass a parameter to filepath
as below (as documented here)
SELECT
TOP 100 *,
[result].filepath() AS WholePath,
[result].filepath(1) AS Wildcard1,
[result].filepath(2) AS Wildcard2
FROM
OPENROWSET(
BULK 'https://<my_blob_storage_accnt>.dfs.core.windows.net/current-snapshot/*=*/.parquet',
FORMAT = 'PARQUET'
) AS [result]
WHERE [result].filepath(2) IN ('P1', 'P2')