I am doing a truncate and load of a Delta file in ADLS Gen2 using Data Flows in ADF. After a successful run of the pipeline, when I try to read the file in Azure Databricks I get the error below.
A file referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table DELETE statement. For more information, ...
One way I found to eliminate this is to restart the cluster in ADB, but is there a better way to overcome this issue?
Sometimes changes to table partitions/columns are not picked up by the Hive metastore, so refreshing the table is always good practice before running queries against it. This exception can also occur if the metadata picked up by the current job is altered by another job while this job is still running.
Refresh Table: Invalidates the cached entries, which include data and metadata of the given table or view. The invalidated cache is populated in a lazy manner when the cached table or the query associated with it is executed again.
%sql
REFRESH [TABLE] table_identifier
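For example, assuming the Delta file is registered as a table named mydb.events (a hypothetical name used only for illustration) and that spark is the SparkSession provided by the Databricks notebook, the refresh can be run from a Python cell before querying:

# Invalidate the cached data and metadata for the table; the cache is repopulated lazily on the next access.
spark.catalog.refreshTable("mydb.events")

# Equivalent SQL form, matching the syntax above.
spark.sql("REFRESH TABLE mydb.events")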
Alternatively, here are some recommendations to resolve this issue (a minimal sketch of these steps follows the list):
- Disable the Delta cache by setting spark.databricks.io.cache.enabled to false in the cluster's Spark configuration (or in the first command of the master notebook using spark.conf.set("spark.databricks.io.cache.enabled", "false")).
- Run sqlContext.clearCache() after the delete operation.
- Run FSCK REPAIR TABLE [db_name.]table_name after the delete operation; it removes from the Delta transaction log the entries for files that can no longer be found in the underlying file system.