While working with Databricks I needed to remove an external location.
I ran SHOW EXTERNAL LOCATIONS to list the existing locations.
Then I ran DROP EXTERNAL LOCATION gold_prd and received this error:
Storage Location has dependent catalogs, schemas, managed or external tables/volumes; You may use force option to delete it but the managed storage data under this location cannot be purged by Unity Catalog anymore.
I have no catalogs, schemas, or other objects under this location, but before forcing the operation, is there a way to list all objects that depend on this location?
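For reference, the forced drop hinted at by the error message should look like the statement below (if I read the DROP EXTERNAL LOCATION syntax correctly); I'd rather understand the dependencies before running it.

DROP EXTERNAL LOCATION gold_prd FORCE;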
This is the ugly, slow and dirty solution I wrote to check for dependent objects. I think it could also be extended to check volumes; a sketch of that is after the script.
%python
import pandas as pd

# collect the URLs of all external locations to check against
check_locations = []
for row in spark.sql("SHOW EXTERNAL LOCATIONS").collect():
    check_locations.append(row["url"])

#######################
# catalogs            #
#######################
for catalog_row in spark.sql("SHOW CATALOGS").collect():
    catalog_name = catalog_row["catalog"]

    # DESCRIBE CATALOG EXTENDED returns name/value pairs: pivot them so the
    # names become columns, then look for a "Storage Location" entry
    df = pd.DataFrame(spark.sql(f"DESCRIBE CATALOG EXTENDED {catalog_name}").collect()).transpose()
    if not df.empty:
        df.columns = df.iloc[0]
        df = df[1:]
        if 'Storage Location' in df.columns:
            location = df.iloc[0]["Storage Location"]
            for check_location in check_locations:
                if check_location in location:
                    print(catalog_name)
                    print(f"\t{location}")

    #######################
    # schemas             #
    #######################
    for schema_row in spark.sql(f"SHOW SCHEMAS IN {catalog_name}").collect():
        schema_name = schema_row["databaseName"]

        # same name/value pivot trick for DESCRIBE SCHEMA
        df = pd.DataFrame(spark.sql(f"DESCRIBE SCHEMA {catalog_name}.{schema_name}").collect()).transpose()
        if not df.empty:
            df.columns = df.iloc[0]
            df = df[1:]
            if 'Location' in df.columns:
                location = df.iloc[0]["Location"]
                for check_location in check_locations:
                    if check_location in location:
                        print(catalog_name)
                        print(f'\t{schema_name}')
                        print(f"\t\t{location}")

        #######################
        # tables              #
        #######################
        if schema_name != 'information_schema':
            for table_row in spark.sql(f"SHOW TABLES IN {catalog_name}.{schema_name}").collect():
                table_name = table_row["tableName"]
                # DESCRIBE DETAIL exposes the physical location of each (Delta) table
                for detail_row in spark.sql(f"DESCRIBE DETAIL {catalog_name}.{schema_name}.{table_name}").collect():
                    location = detail_row["location"]
                    for check_location in check_locations:
                        if check_location in location:
                            print(catalog_name)
                            print(f'\t{schema_name}')
                            print(f'\t\t{table_name}')
                            print(f"\t\t\t{location}")