Search code examples
azureazure-synapseazure-data-lake-gen2

Azure Synapse severless SQL pool - query execution fails


After completing tutorial 1, I am working on this tutorial 2 from Microsoft Azure team to run the following query (shown in step 3). But the query execution gives the error shown below:

Question: What may be the cause of the error, and how can we resolve it?

Query:

SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet',
        FORMAT='PARQUET'
    ) AS [result]

Error:

Warning: No datasets were found that match the expression 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'. Schema cannot be determined since no files were found matching the name pattern(s) 'https://contosolake.dfs.core.windows.net/users/NYCTripSmall.parquet'. Please use WITH clause in the OPENROWSET function to define the schema.

NOTE: The path of the file in the container is correct, and actually I generated the following query just by right clicking the file inside container and generated the script as shown below: enter image description here

Remarks:

  1. Azure Data Lake Storage Gen2 account name: contosolake
  2. Container name: users
  3. Firewall settings used on the Azure Data lake account: enter image description here
  4. Azure Data Lake Storage Gen2 account is allowing public access (ref): Azure Data Lake Storage Gen2 account
  5. Container has required access level (ref) enter image description here

UPDATE:

The owner of the subscription is someone else, and I did not get the option Check the "Assign myself the Storage Blob Data Contributor role on the Data Lake Storage Gen2 account" box described in item 3 of Basics tab > Workspace details section of tutorial 1. I also do not have permissions to add roles - although I'm the owner of synapse workspace. So I am using workaround described in the Configure anonymous public read access for containers and blobs from Azure team.


Solution

  • --Workaround

    If you are unable to granting Storage Blob Data Contributor, use ACL to grant permissions.

    All users that need access to some data in this container also needs to have the EXECUTE permission on all parent folders up to the root (the container). Learn more about how to set ACLs in Azure Data Lake Storage Gen2.

    Note:

    Execute permission on the container level needs to be set within the Azure Data Lake Gen2. Permissions on the folder can be set within Azure Synapse.

    Go to the container holding NYCTripSmall.parquet.

    enter image description here

    enter image description here

    --Update

    As per your update in comments, it seems you would have to do as below.

    Contact the Owner of the storage account, and ask them to perform the following tasks:

    • Assign the workspace MSI to the Storage Blob Data Contributor role on the storage account
    • Assign you to the Storage Blob Data Contributor role on the storage account

    --

    I was able to get the query results following the tutorial doc you have mentioned for the same dataset.

    enter image description here

    Since you confirm that the file is present and in the right path, refresh linked ADLS source and publish query before running, just in case if a transient issue.

    Two things I suspect are

    1. Try setting Microsoft network routing in Network Routing settings in ADLS account.

    enter image description here

    1. Check if built-in pool is online and you have atleast contributer roles on both Synapse workspace and Storage account. (If the current credentials using to run the query has not created the resources)