apache-spark, databricks, azure-databricks, delta-lake

Azure Databricks : Mount delta table used in another workspace


Currently I have an Azure Databricks instance where I run the following streaming write:

myDF.withColumn("created_on", current_timestamp()) \
    .writeStream \
    .format("delta") \
    .trigger(processingTime=triggerDuration) \
    .outputMode("append") \
    .option("checkpointLocation", "/mnt/datalake/_checkpoint_Position") \
    .option("path", "/mnt/datalake/DeltaData") \
    .partitionBy("col1", "col2", "col3", "col4", "col5") \
    .table("deltadata")

This is saving the data into a storage account as blobs.
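Listing the output path shows the layout the stream produces; a minimal sketch, assuming the /mnt/datalake mount and path from the writeStream options above:

# Sketch: the stream above writes partitioned Delta files plus a _delta_log folder.
for f in dbutils.fs.ls("/mnt/datalake/DeltaData"):
    print(f.name)  # expected: _delta_log/ plus partition folders such as col1=<value>/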

Now I'm trying to connect to this table from another Azure Databricks workspace, and my first step is to mount the Azure storage account:

dbutils.fs.mount(
    source = sourceString,
    mountPoint = "/mnt/data",
    extraConfigs = Map(confKey -> sasKey)
)

Note: sourceString, confKey and sasKey are not shown for obvious reasons; in any case, the mount command itself completes without errors.
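For reference, the same kind of mount in Python would look roughly like this; containerName, storageAccountName and sasToken are placeholders, and the fs.azure.sas.* config key assumes a SAS-based blob storage mount:

# Sketch only: Python equivalent of the Scala mount above, with placeholder names.
dbutils.fs.mount(
    source = "wasbs://{}@{}.blob.core.windows.net".format(containerName, storageAccountName),
    mount_point = "/mnt/data",
    extra_configs = {
        "fs.azure.sas.{}.{}.blob.core.windows.net".format(containerName, storageAccountName): sasToken
    }
)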

And then I try to create the table, but I get an error:

CREATE TABLE delta_data USING DELTA LOCATION '/mnt/data/DeltaData/'

Error in SQL statement: AnalysisException: 
You are trying to create an external table `default`.`delta_data`
from `/mnt/data/DeltaData` using Databricks Delta, but the schema is not specified when the
input path is empty.

According to the documentation, the schema should be picked up from the existing data, correct? Also, I'm trying to do this in a different workspace because the idea is to give people read-only access.
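A quick way to check whether schema inference can work at all is to read the mounted path directly before running the DDL; a minimal sketch, assuming the mount and path above:

# Sketch: if the mount is healthy, Delta reads the schema from _delta_log and no DDL schema is needed.
df = spark.read.format("delta").load("/mnt/data/DeltaData")
df.printSchema()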


Solution

  • It seems my issue was the mount. It did not give any error when it was created, but it was not actually working. I discovered this after trying:

    dbutils.fs.ls("/mnt/data/DeltaData")
    

    This listing did not show anything. I unmounted, reviewed all the configs, mounted it again, and after that it worked; a rough sketch of that check follows below.
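A minimal sketch of that unmount / remount / verify loop, with sourceString, confKey and sasKey kept as the placeholders from the question:

# Sketch: drop the broken mount, recreate it, and confirm the Delta files are visible.
if any(m.mountPoint == "/mnt/data" for m in dbutils.fs.mounts()):
    dbutils.fs.unmount("/mnt/data")

dbutils.fs.mount(
    source = sourceString,
    mount_point = "/mnt/data",
    extra_configs = {confKey: sasKey}
)

# An empty listing here means the mount configuration is still wrong.
display(dbutils.fs.ls("/mnt/data/DeltaData"))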