I'm trying to write a Delta-formatted table to ADLS Gen2 from an Azure Synapse PySpark notebook with a serverless SQL pool.
The write to ADLS Gen2 fails with the error below.
Py4JJavaError: An error occurred while calling o3898.save.
: Operation failed: "An HTTP header that's mandatory for this request is not specified.", 400, PUT,
Below is the code which I am using to write to ADLS.
from delta.tables import DeltaTable

if DeltaTable.isDeltaTable(spark, source_path):
    print('Existing delta table')
    # Read the existing Delta table
    delta_table = DeltaTable.forPath(spark, source_path)
    # Merge new data into the existing table
    delta_table.alias("existing").merge(
        source=df_eventLog.alias("updates"),
        condition=" AND ".join(conditions_list)
    ).whenMatchedUpdateAll(
    ).whenNotMatchedInsertAll(
    ).execute()
else:
    print('New delta table')
    # Create a new Delta table with the new data
    df_eventLog.write.format('delta').save(source_path)
In my case the Delta table doesn't exist initially, so the else branch runs. df_eventLog loads fine without errors.
Can someone help me see where I am going wrong?
I reproduced the issue with a Delta table in my lake database. I loaded the table into a DataFrame with the code below, intending to write it to an ADLS storage account:
df_delta = spark.read.format("delta").table("<database>.<tableName>")
When I wrote the DataFrame to ADLS using the code below, I got the same error:
delta_table_path = "abfss://<containerName>@<ADLSName>.blob.core.windows.net/"
df_delta.write.format("delta").mode("overwrite").save(delta_table_path)
According to the Microsoft documentation, the URL used to address an ADLS Gen2 storage account should be in the format abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file_name>. Note that it uses the dfs endpoint, whereas my failing path used the blob endpoint. I modified delta_table_path accordingly:
delta_table_path = "abfss://<containerName>@<ADLSName>.dfs.core.windows.net/<filepath>"
I wrote the data again using this URL format, and it was written to the specified path without any error.
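
If paths come from configuration and you want to guard against this mistake, a small helper can rewrite the host before the write. This is only a minimal sketch: `to_dfs_endpoint` is a hypothetical name, not part of Synapse or Delta Lake, and the container and account names in the usage line are made up. It assumes the only problem is the `.blob.core.windows.net` suffix on an `abfs`/`abfss` URL and leaves every other path unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

BLOB_SUFFIX = ".blob.core.windows.net"
DFS_SUFFIX = ".dfs.core.windows.net"


def to_dfs_endpoint(path: str) -> str:
    """Return `path` with the blob host swapped for the dfs host.

    Hypothetical helper: only abfs/abfss URLs whose host ends in the
    blob suffix are rewritten; anything else is returned untouched.
    """
    parts = urlsplit(path)
    if parts.scheme not in ("abfs", "abfss"):
        return path
    if not parts.netloc.endswith(BLOB_SUFFIX):
        return path
    # Keep "<container>@<account>" and replace only the endpoint suffix.
    netloc = parts.netloc[: -len(BLOB_SUFFIX)] + DFS_SUFFIX
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# Example with made-up container and account names.
fixed_path = to_dfs_endpoint(
    "abfss://data@myaccount.blob.core.windows.net/delta/events"
)
```

You could then pass `fixed_path` to `df_delta.write.format("delta").save(...)` in place of the original string.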