Tags: databricks, azure-databricks

Databricks Generating Error: AnalysisException: [ErrorClass=INVALID_PARAMETER_VALUE] Missing cloud file system scheme


When I attempt to create or save a table to a location in my Azure Data Lake Gen2 using the example code:

%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/mnt/training/ecommerce/events/events.parquet");

I get the error:

[RequestId=xxxx-xxxx-6789-8u98-de33192c16e0 ErrorClass=INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /mnt/training/ecommerce/events is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.

I did some research and came across the following suggestion from Databricks:

In Databricks, when reading data from cloud storage such as AWS S3, Azure Data Lake Storage, or Google Cloud Storage, you must include the scheme corresponding to the cloud storage.

I appreciate that this is a correct suggestion; however, I have mounted the ADLS storage account in Databricks, and I'm able to read files from the ADLS account using:

df = spark.read.csv("/mnt/training/ecommerce/events/events.csv", inferSchema=True, header=True)

Any thoughts?

I attempted the solution suggested by @Bhavani, but unfortunately it didn't work. I added a new external location as suggested.

When I try to create a table using the mounted path, I get the same error.

Just to confirm: the drive is definitely mounted, and I can read from the mounted path without any problem. The connection to the ADLS Gen2 mounted drive is successful, so I'm not sure why I'm still getting the error.


Solution

  • INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /mnt/files/Iris.parquet is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.
    

    When working with Azure Data Lake Storage Gen2 in a Synapse or Databricks environment, you need to specify the scheme (abfss:// for ADLS Gen2). You provided a mounted path, which is likely why you get the error above. Instead of using the mounted path, you can follow the procedure below:
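    As an illustration of the URI form the error is asking for, a small helper (the function name is hypothetical, not part of any Databricks API) can assemble the fully qualified abfss:// URI from its parts:

    ```python
    # Hypothetical helper -- not a Databricks API. Builds the fully qualified
    # ADLS Gen2 URI that the path-credential check expects, as opposed to a
    # bare /mnt/... path, which carries no cloud file system scheme.
    def abfss_uri(container: str, account: str, path: str) -> str:
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

    # A /mnt/... path has no scheme, which is exactly what the error reports.
    uri = abfss_uri("training", "mystorageacct", "ecommerce/events/events.parquet")
    # uri == "abfss://training@mystorageacct.dfs.core.windows.net/ecommerce/events/events.parquet"
    ```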

    Grant the Storage Blob Data Contributor role to the Azure Databricks managed identity. Then go to the Catalog page in the Databricks workspace, click +, and select Add an external location.
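    The role assignment in the first step can also be done from the Azure CLI; every ID below is a placeholder for your own environment:

    ```shell
    # Assign "Storage Blob Data Contributor" on the storage account to the
    # Databricks managed identity (all IDs are placeholders).
    az role assignment create \
      --assignee "<managed-identity-principal-id>" \
      --role "Storage Blob Data Contributor" \
      --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
    ```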

    Configure the details for the new external location (its name, URL, and storage credential).

    Create the external location. After it has been created successfully, you will be able to create the table using the code below:

    %sql
    CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "abfss://<containerName>@<storageAccountName>.dfs.core.windows.net/<filepath>");
    


    You can then query the table successfully.

    Update:

    You can use the code below to create a table from the mounted storage account:

    %sql
    -- header is a CSV option; for a parquet source, specify the format
    -- explicitly (or let read_files infer it from the file extension).
    CREATE TABLE events1 AS
    SELECT
      *
    FROM
      read_files(
        '/mnt/<mountName>/<pathToParquetFile>/Iris.parquet',
        format => 'parquet'
      );
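    Once the CTAS statement above completes, the new table can be checked with an ordinary query (table name as in the example):

    ```sql
    %sql
    SELECT * FROM events1 LIMIT 5;
    ```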
    

    It will create the table successfully.