Tags: apache-spark, azure-storage, databricks, azure-databricks, delta-lake

Registering a cloud data source as a global table in Databricks without copying


Given that I have a Delta table in Azure storage:

wasbs://[email protected]/mydata

This path is accessible from my Databricks environment. I now want this data exposed as a global table, automatically available to all clusters and visible in the "Data" section.

I could easily do this through copying:

# read the Delta source explicitly and write it out as a managed table
spark.read.format("delta") \
  .load("wasbs://[email protected]/mydata") \
  .write.saveAsTable("my_new_table")

But this copies all the data, which is expensive, and I would need to rerun it periodically to pick up changes (Structured Streaming would at least make the refresh incremental; see the sketch below). Is it possible to register the source as a global table directly, without having to copy any files?
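For completeness, the incremental-refresh workaround I have in mind looks roughly like this; the checkpoint location is just a placeholder and the availableNow trigger needs a reasonably recent runtime (Spark 3.3+):

    # Incrementally mirror the Delta source into a managed table instead of
    # rewriting the full copy on every run (checkpoint path is a placeholder).
    (spark.readStream
        .format("delta")
        .load("wasbs://[email protected]/mydata")
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/my_new_table")
        .trigger(availableNow=True)  # process whatever is new, then stop
        .toTable("my_new_table"))

This still maintains a second physical copy of the data, though, which is exactly what I would like to avoid.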


Solution

  • You can use a CREATE TABLE ... USING statement in a Databricks notebook cell to register the existing Delta files as a table, without copying them:

    %sql
    
    CREATE TABLE IF NOT EXISTS default.my_new_table 
      USING DELTA 
      LOCATION "wasbs://[email protected]/mydata"
    

    The table my_new_table should then appear in your default database and show up in the Databricks Data tab.
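
  • If you would rather do the registration from Python (for example in a setup job), the same statement can be issued through spark.sql — a minimal sketch using the same table name and path as above:

    # Register the existing Delta files as a metastore table; nothing is copied.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS default.my_new_table
        USING DELTA
        LOCATION 'wasbs://[email protected]/mydata'
    """)

    # The table is now visible to every cluster attached to this metastore.
    spark.table("default.my_new_table").show(5)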