Azure Databricks creating random folders while writing and merging.
I run the query below in Databricks:
df.write.format('delta').mode('overwrite').save("abfss://[email protected]/some_path/events")
When I check the Azure storage UI, I see some folders with short random names alongside the table data. What are these folders, such as xJ, and why are they getting created?
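For context, the same folders show up when listing the target path from a notebook. A minimal sketch, assuming this runs in a Databricks notebook where dbutils is available, and reusing the redacted container path from above:

# List the contents of the Delta table path; the random folders
# and the _delta_log directory both appear here
for f in dbutils.fs.ls("abfss://[email protected]/some_path/events"):
    print(f.name)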
The commit details of the write, from the Delta table history:
engineInfo: Databricks-Runtime/13.2.x-scala2.12
isolationLevel: WriteSerializable
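These fields come from the Delta transaction log and can be read back with the table history API. A minimal sketch, assuming delta-spark is available and reusing the redacted path from the question:

from delta.tables import DeltaTable

# Read the commit history of the Delta table; engineInfo and
# isolationLevel are columns in the history DataFrame
history = DeltaTable.forPath(spark, "abfss://[email protected]/some_path/events").history()
history.select("version", "operation", "engineInfo", "isolationLevel").show(truncate=False)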
I have tried both local and global Spark sessions and performed the write and merge operations to ADLS. Below is the global Spark session approach:
from pyspark.sql import SparkSession

# Build (or reuse) a global Spark session
spark = SparkSession.builder \
    .appName("example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Create the initial table and overwrite the target path
data = [('Alice', 34), ('Bob', 55), ('Charlie', 45)]
columns = ['name', 'age']
df = spark.createDataFrame(data, columns)
df.write.format('delta').mode('overwrite').save("abfss://[email protected]/1_path/events")

# Append new rows to the same Delta table
new_data = [('Dave', 28), ('Eva', 38)]
new_df = spark.createDataFrame(new_data, columns)
new_df.write.format('delta').mode('append').save("abfss://[email protected]/1_path/events")
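The snippet above covers the overwrite and append writes; for the merge operation mentioned earlier, the DeltaTable API can be used. A minimal sketch, assuming the same table path, with the merge key on the name column chosen purely for illustration:

from delta.tables import DeltaTable

# Upsert new_df into the existing Delta table, matching rows on name
target = DeltaTable.forPath(spark, "abfss://[email protected]/1_path/events")
(target.alias("t")
    .merge(new_df.alias("s"), "t.name = s.name")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())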
By following this global Spark session approach, the write and merge operations behave consistently, and you can avoid unexpected folder creation or naming issues when writing to ADLS with Delta Lake in Azure Databricks.