Tags: pyspark, apache-spark-sql, databricks, azure-databricks, azure-data-lake-gen2

How can we repair a Delta table location in ADLS Gen2?


I am doing a truncate and load of a Delta file in ADLS Gen2 using Data Flows in ADF. After the pipeline runs successfully, when I try to read the file in Azure Databricks I get the error below.

A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table DELETE statement. For more information,

One way I found to eliminate this is to restart the cluster in ADB, but is there a better way to overcome this issue?


Solution

  • Sometimes changes to table partitions/columns are not picked up by the Hive metastore, so refreshing the table before running queries is always good practice. This exception can occur if the metadata picked up by the current job is altered by another job while this job is still running.

    Refresh Table: Invalidates the cached entries, which include data and metadata of the given table or view. The invalidated cache is populated in a lazy manner when the cached table or the query associated with it is executed again.

    %sql
    REFRESH [TABLE] table_identifier
    
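    In a PySpark notebook cell, the same refresh can be issued before re-reading the data. This is a minimal sketch; the table name sales_db.orders_delta is only a placeholder:

    # Refresh the cached entries for a hypothetical table so the next read
    # picks up the latest transaction log state written by the ADF load.
    spark.sql("REFRESH TABLE sales_db.orders_delta")

    df = spark.table("sales_db.orders_delta")
    df.show(5)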

    OR

    Here are some recommendations to resolve this issue:

    • Add the configuration either at the cluster level (spark.databricks.io.cache.enabled false) or in the first command of the master notebook using spark.conf.set("spark.databricks.io.cache.enabled", "false").
    • Add sqlContext.clearCache() after the delete operation.
    • Add FSCK REPAIR TABLE [db_name.]table_name after the delete operation (the steps are combined in the sketch below).
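    Taken together, those recommendations might look like the following PySpark sketch in a master notebook, again using the placeholder table sales_db.orders_delta:

    # Minimal sketch combining the recommendations above; the table name
    # sales_db.orders_delta is only a placeholder.

    # 1. Disable the Databricks IO cache for this session (the cluster-level
    #    equivalent is the Spark config spark.databricks.io.cache.enabled false).
    spark.conf.set("spark.databricks.io.cache.enabled", "false")

    # ... the truncate-and-load / delete operation runs here ...

    # 2. Clear cached data so stale file references are dropped.
    spark.catalog.clearCache()  # sqlContext.clearCache() on older runtimes

    # 3. Repair the table's file metadata after the delete operation.
    spark.sql("FSCK REPAIR TABLE sales_db.orders_delta")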