Tags: azure, azure-databricks, delta-lake

Writing to UserMetadata Field During Table Creation


When I run the following code in my Databricks environment, the initial save does not write "Initial Commit" to the userMetadata field. However, the two "append" operations that follow write "Added James Brown" and "Added Joe Red, Jim Blue and Joe True" to userMetadata, respectively, without issue:

# https://stackoverflow.com/questions/47674311/how-to-create-a-sample-single-column-spark-dataframe-in-python
df1 = sc.parallelize([["Brown", "John"], ["Green", "John"]]).toDF(("LastName", "FirstName"))
df2 = sc.parallelize([["Brown", "James"]]).toDF(("LastName", "FirstName"))
df3 = sc.parallelize([["Red", "Joe"], ["Blue", "Jim"], ["True", "Joe"]]).toDF(("LastName", "FirstName"))


# https://bigdataprogrammers.com/write-dataframe-to-delta-table-in-databricks-with-append-mode/
# https://docs.databricks.com/en/delta/custom-metadata.html#language-python
tableName = "myCatalog.mySchema.metaDataTest"
df1.write.format("delta").option("userMetadata", "Initial Commit").saveAsTable(tableName)
df2.write.mode("append").format("delta").option("userMetadata", "Added James Brown").saveAsTable(tableName)
df3.write.mode("append").format("delta").option("userMetadata", "Added Joe Red, Jim Blue and Joe True").saveAsTable(tableName)

Why is my initial table creation via df1 not writing "Initial Commit" to the userMetadata field?

[Screenshots: all rows selected from the metaDataTest table; output of DESCRIBE HISTORY metaDataTest]
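
For reference, the userMetadata recorded for each table version can also be listed programmatically instead of read off the history output. This is only a minimal sketch: it assumes a Databricks notebook where a SparkSession named spark is available, and the helper name user_metadata_by_version is hypothetical, not part of any library.

```python
def user_metadata_by_version(spark, table_name):
    """Return {version: userMetadata} read from the table's Delta history.

    `spark` is assumed to be an active SparkSession (as in a Databricks
    notebook); `table_name` is the table name used with saveAsTable.
    """
    rows = (
        spark.sql(f"DESCRIBE HISTORY {table_name}")
        .select("version", "userMetadata")
        .collect()
    )
    # Versions whose commit carried no user metadata map to None.
    return {row["version"]: row["userMetadata"] for row in rows}


# Example call in a notebook (not executed here):
# print(user_metadata_by_version(spark, "myCatalog.mySchema.metaDataTest"))
```

With the behavior described above, version 0 would be expected to map to None while versions 1 and 2 map to the two "append" messages.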


Solution

  • If you want to set "Initial Commit" as the metadata for version 0, you should explicitly create the Delta table with that metadata before appending data.

    You can check the userMetadata recorded for each write to a Delta table by running the history command. For details, see "Retrieve Delta table history".

    Try the below approach:

    # tablePath is assumed to hold the storage path of the Delta table
    df1.write.format("delta").mode("overwrite").option("userMetadata", "Initial Commit").save(tablePath)
    

    In the above code I have used .mode("overwrite").

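
As a possible alternative to the per-write option, the custom metadata page linked in the question also describes a session-level setting, spark.databricks.delta.commitInfo.userMetadata. The sketch below wraps it in a context manager so the value is restored afterwards. The helper name commit_user_metadata is hypothetical, spark is assumed to be the notebook's SparkSession, and whether this changes what version 0 records on a given runtime is an assumption to verify with the history command.

```python
from contextlib import contextmanager

# Session-level key for Delta commit metadata.
USER_METADATA_KEY = "spark.databricks.delta.commitInfo.userMetadata"


@contextmanager
def commit_user_metadata(spark, message):
    """Temporarily set the session-level Delta commit message."""
    previous = spark.conf.get(USER_METADATA_KEY, None)
    spark.conf.set(USER_METADATA_KEY, message)
    try:
        yield
    finally:
        # Restore the earlier value so later commits are not mislabeled.
        if previous is None:
            spark.conf.unset(USER_METADATA_KEY)
        else:
            spark.conf.set(USER_METADATA_KEY, previous)


# Example use in a notebook (not executed here):
# with commit_user_metadata(spark, "Initial Commit"):
#     df1.write.format("delta").saveAsTable(tableName)
```

Restoring the previous value matters because a session-level setting would otherwise apply to every later commit in the same session that does not pass its own userMetadata option.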