merge partitioning delta-lake

What happens if you update the column of a Delta table by which it is partitioned?


What happens if you update the column of a Delta table by which it is partitioned? Does it degrade write performance substantially? I haven't been able to find this in the docs so far: given that the underlying storage is Parquet, does Delta write new files without the updated rows for the existing partitions, or is the change handled virtually through transaction log entries?


Solution

  • You can always get this information from the table history. For example, here is the content of the operationMetrics column after running an update on the partition column. As you can see, it rewrites files:

    {
      "numRemovedFiles": "5", 
      "numCopiedRows": "0", 
      "numAddedChangeFiles": "0", 
      "executionTimeMs": "478", 
      "scanTimeMs": "34", 
      "numAddedFiles": "5", 
      "numUpdatedRows": "5", 
      "rewriteTimeMs": "444"
    }
    

    If you check the file names, you will see that they are different.
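
To repeat this check on your own table, something like the following sketch may help. It assumes the delta-spark Python package and an existing SparkSession for the `last_operation_metrics` helper. The helper names, the sample file paths, and the trimmed-down metrics dict are illustrative and not part of the original answer. The two pure-Python helpers only interpret what the history and a `_delta_log` commit file already contain: a commit that both removes and adds files has rewritten data rather than patched it in place.

```python
import json

# Subset of the metrics shown above; Delta reports the values as strings.
SAMPLE_METRICS = {
    "numRemovedFiles": "5",
    "numCopiedRows": "0",
    "numAddedFiles": "5",
    "numUpdatedRows": "5",
}


def files_were_rewritten(operation_metrics):
    """Return True when a commit both removed and added data files."""
    removed = int(operation_metrics.get("numRemovedFiles", 0))
    added = int(operation_metrics.get("numAddedFiles", 0))
    return removed > 0 and added > 0


def changed_files(commit_json_lines):
    """Split one _delta_log commit (one JSON action per line) into
    the sets of removed and added file paths."""
    removed, added = set(), set()
    for line in commit_json_lines:
        if not line.strip():
            continue
        action = json.loads(line)
        if "remove" in action:
            removed.add(action["remove"]["path"])
        elif "add" in action:
            added.add(action["add"]["path"])
    return removed, added


def last_operation_metrics(spark, table_path):
    """Fetch the operation name and metrics of the latest commit.

    Assumes delta-spark is installed; imported lazily so the helpers
    above work without Spark.
    """
    from delta.tables import DeltaTable

    row = (
        DeltaTable.forPath(spark, table_path)
        .history(1)  # most recent commit only
        .select("operation", "operationMetrics")
        .first()
    )
    return row["operation"], dict(row["operationMetrics"])


print(files_were_rewritten(SAMPLE_METRICS))
```

With a live session you would call `last_operation_metrics(spark, "/path/to/table")` (the path is a placeholder) and pass the returned metrics to `files_were_rewritten`. Feeding the lines of the matching commit file under `_delta_log` to `changed_files` lets you compare the removed and added file names directly.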