Tags: python, pyspark, databricks, databricks-unity-catalog

Check if table exists in Unity Catalog


I am trying to build a weekly data import to Unity Catalog in Databricks, using Python.

Overwriting the table if it already exists works fine:

%sql
USE CATALOG some_catalog

Then, in a separate Python cell:

df_sparkTarget = spark.createDataFrame(dfTarget)   # dfTarget is some pandas DataFrame
df_sparkTarget.write.format("delta").mode("overwrite").saveAsTable(database + "." + table)

But before I overwrite anything, I would like to check whether the table exists:

if spark.catalog.tableExists(database + "." + table):
  print("Table exists")
else:
  print("Table does not exist")

This raises the following error:

py4j.security.Py4JSecurityException: Method public boolean org.apache.spark.sql.internal.CatalogImpl.tableExists(java.lang.String) is not whitelisted on class class org.apache.spark.sql.internal.CatalogImpl

Where can I whitelist this method? Ideally just for this one cluster, not for all of them.


Solution

  • I have run into much the same issue. For now, until there is a documented way to do this, I am using the approach below; the spark.catalog functionality only seems to work against the hive_metastore.

    def schema_exists(catalog: str, schema_name: str) -> bool:
        # Unity Catalog exposes metadata through information_schema,
        # which is queryable without any whitelisted JVM methods.
        query = spark.sql(f"""
            SELECT 1
            FROM {catalog}.information_schema.schemata
            WHERE schema_name = '{schema_name}'
            LIMIT 1""")
        return query.count() > 0

    def table_exists(catalog: str, schema: str, table_name: str) -> bool:
        # The identifiers are interpolated directly into the SQL string,
        # so only pass trusted catalog/schema/table names.
        query = spark.sql(f"""
            SELECT 1
            FROM {catalog}.information_schema.tables
            WHERE table_name = '{table_name}'
            AND table_schema = '{schema}'
            LIMIT 1""")
        return query.count() > 0
    
    schema_exists("catalog_name","schema_name")
    table_exists("catalog_name","schema_name", "table_name")
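
    For the weekly import described in the question, these helpers can be combined with the overwrite step. A minimal sketch, assuming catalog, database, table, dfTarget, and spark are already defined as in the question, with illustrative print messages:

    # Sketch: the names below come from the question; adjust to your setup.
    if table_exists(catalog, database, table):
        print(f"{catalog}.{database}.{table} exists and will be overwritten")
    else:
        print(f"{catalog}.{database}.{table} does not exist and will be created")

    df_sparkTarget = spark.createDataFrame(dfTarget)
    df_sparkTarget.write.format("delta").mode("overwrite").saveAsTable(
        f"{catalog}.{database}.{table}")

    Using the three-part name here avoids depending on a prior USE CATALOG statement, and saveAsTable creates the table if the existence check came back false.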