apache-spark · pyspark · databricks

Including a Period between Variables in Apache Spark with Databricks


It would appear Databricks now requires you to include the catalog name when creating a Delta table (well, at least in the way that I want to create one).

Without going through the whole code, the following code will create a database and Delta table called 'baseapm.securelogin' in traditional Databricks with the hive_metastore:

from delta.tables import DeltaTable

stageName = 'base'
regName = 'apm.securelogin'

deltadf = DeltaTable.forName(spark, f"{stageName}{regName}")

Combining those two variables will output: baseapm.securelogin


Databricks appears to have made some changes, and it seems you now have to create Delta tables inside a catalog, so I need to modify the code above to include the catalog name.

My Catalog Name is called liveend2end.

Therefore, I need to modify my code above such that it outputs the following: liveend2end.baseapm.securelogin

I tried the following:

catName = 'liveend2end'
deltadf = DeltaTable.forName(spark,f"{catName}{stageName}{regName}")

But all I got was liveend2endbaseapm.securelogin

However, the output I need is: liveend2end.baseapm.securelogin

Could someone please show me how to modify the code so that I get liveend2end.baseapm.securelogin?

I also tried the following to add a period after catName:

deltadf = DeltaTable.forName(spark,f"{catName}'.'{stageName}{regName}")

But I got the error:

ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near ''.''.

I have been shown how to add a period like this in the past, and I remember it was quite simple, but I just can't remember how to place the period at the moment.


Solution

  • One tip is to remember that catalogs, schemas, and tables should never have periods in their names; the period is the delimiter that informs the Spark catalog how to parse the fully qualified table name: <catalog>.<schema>.<table>. Naming your variables in a way that aligns 1-to-1 with each of those 3 layers may help reduce confusion.

    In the OP's question, these can be correctly identified as:

    • catalog = liveend2end
    • schema = baseapm
    • table = securelogin

    So the desired fully qualified table name, or expected output, should be liveend2end.baseapm.securelogin.

    Now.. solutions:

    (TL;DR) Option 1: String Concatenation

    Using the OP's existing approach, which was only slightly off.

    catName = 'liveend2end'
    stageName = 'base'
    regName = 'apm.securelogin'
    
    table = f"{catName}.{stageName}{regName}"
    deltadf = DeltaTable.forName(spark, table)
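
    As a quick aside on the earlier ParseException: in an f-string, anything outside the {} braces is emitted literally, so writing '.' between the braces inserts the quote characters into the table name along with the period. A small check (reusing the variables above) makes this visible:

    print(f"{catName}'.'{stageName}{regName}")   # liveend2end'.'baseapm.securelogin -> invalid identifier
    print(f"{catName}.{stageName}{regName}")     # liveend2end.baseapm.securelogin   -> what we want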
    

    Option 2: USE CATALOG ...

    Similar, but this sets the Spark Session to use the specified catalog for any subsequent queries. This may be simpler if you want to minimize code changes.

    catName = 'liveend2end'
    stageName = 'base'
    regName = 'apm.securelogin'
    
    # Use Catalog
    spark.sql(f"USE CATALOG {catName}")
    
    # Remaining code is not changed:
    table = f"{stageName}{regName}"
    deltadf = DeltaTable.forName(spark, table)
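
    If you want to confirm that the session is now pointed at the right catalog, a quick sanity check (assuming a recent runtime, Spark 3.4+ / current DBR) could look like this:

    # Check which catalog the session is currently using
    print(spark.catalog.currentCatalog())        # expected: liveend2end
    spark.sql("SELECT current_catalog()").show() # same check via SQL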
    

    Option 3: Spark Conf to set default catalog

    Similar to option #2 but this requires zero code change and instead sets the default catalog using a Spark configuration which you would add to your cluster.

    # Cluster Spark config (set on the cluster, not in the notebook):
    # spark.databricks.sql.initial.catalog.name=liveend2end
    
    # No code changes needed:
    stageName = 'base'
    regName = 'apm.securelogin'
    
    table = f"{stageName}{regName}"
    deltadf = DeltaTable.forName(spark, table)
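
    If editing the cluster config isn't convenient, a session-level variant (again assuming Spark 3.4+ / a recent DBR) is to set the current catalog from Python; the effect is essentially the same as Option 2's USE CATALOG:

    # Set the session's current catalog programmatically
    spark.catalog.setCurrentCatalog('liveend2end')
    
    # Remaining code is unchanged
    table = f"{stageName}{regName}"
    deltadf = DeltaTable.forName(spark, table)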
    

    Note: while it's highly recommended to upgrade to catalogs in Unity Catalog, hive_metastore is still a valid type of catalog. All the above options can be used with catName = 'hive_metastore' as well.
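
    For example, the same three-part naming also works against the legacy metastore (a minimal sketch, assuming the original schema and table still live in hive_metastore):

    catName = 'hive_metastore'
    stageName = 'base'
    regName = 'apm.securelogin'
    
    table = f"{catName}.{stageName}{regName}"    # hive_metastore.baseapm.securelogin
    deltadf = DeltaTable.forName(spark, table)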