azure-databricks

Adding new column field with value to existing delta table


  1. A new column with values was added on the source side. On the landing (sink) side I have a stage table that receives the new data.
  2. On the processing side I merge the stage table into the processed data (the final table) with a merge condition.
  3. How can I add the new column to the existing final Delta table from the stage data?
  4. The whole process currently runs as a job.

I know schema merging is needed for the new field, but I'm not sure about the exact steps. Can anyone guide me through them in detail?


Solution

  • I tried to reproduce the same scenario in my environment and got the results below.

    For the demo, I created a Delta table at the location /mnt/defaultDatalake/KK1.


    Then I added a new column with a constant value to the existing Delta table at that location by overwriting it with the code below.

    from pyspark.sql.functions import lit

    # Read the existing Delta table, add the new column with a constant
    # value, and overwrite the table in place, letting the schema change.
    spark.read.format("delta").load('/mnt/defaultDatalake/KK1')\
      .withColumn("Recovered", lit('USA'))\
      .write\
      .format("delta")\
      .mode("overwrite")\
      .option("overwriteSchema", "true")\
      .save('/mnt/defaultDatalake/KK1')
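    Note that overwriteSchema rewrites the whole table. If you only need to append new stage data that already contains the extra column, Delta Lake's mergeSchema write option evolves the target schema without a full rewrite. A minimal sketch, assuming a running SparkSession and a hypothetical DataFrame stage_df holding the stage data:

    ```python
    # Append stage data that carries the new column; mergeSchema evolves
    # the target table's schema to include it. stage_df is a hypothetical
    # DataFrame read from the stage table.
    stage_df.write \
        .format("delta") \
        .mode("append") \
        .option("mergeSchema", "true") \
        .save('/mnt/defaultDatalake/KK1')
    ```

    Existing rows keep a null in the new column; only the appended rows carry values.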
    


    Now you can check that the table has the new column, Recovered.
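    Since your pipeline merges the stage table into the final table, you can also let the MERGE itself evolve the schema by enabling Delta's automatic schema merging before the merge runs. A minimal sketch, assuming the final table lives at /mnt/defaultDatalake/KK1 and using a hypothetical stage DataFrame stage_df with a hypothetical merge key id:

    ```python
    from delta.tables import DeltaTable

    # Allow MERGE to add columns that exist in the source but not yet
    # in the target table.
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

    final = DeltaTable.forPath(spark, '/mnt/defaultDatalake/KK1')

    # "id" is a hypothetical merge key; updateAll/insertAll propagate
    # the new column from the stage data into the final table.
    (final.alias("t")
          .merge(stage_df.alias("s"), "t.id = s.id")
          .whenMatchedUpdateAll()
          .whenNotMatchedInsertAll()
          .execute())
    ```

    This keeps your existing job structure: the stage-to-final merge stays a single MERGE, and the new column is picked up automatically.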
