Tags: azure, databricks, azure-databricks, delta-lake

I receive the error "Cannot time travel Delta table to version X" even though I can see version X in the history on Azure Databricks


I have a table in Delta Lake with these tblproperties: (screenshot of the table properties)

I'm trying to access a version from last month, version 322.

When I look at the history, I can see it: (screenshot of the table history, where version 322 appears)
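
For reference, the same history can also be inspected programmatically. This is a minimal sketch, assuming path holds the table location and the delta Python package is available:

from delta.tables import DeltaTable

# Equivalent to DESCRIBE HISTORY: lists the versions recorded in the transaction log
DeltaTable.forPath(spark, path).history() \
    .select("version", "timestamp", "operation") \
    .show(truncate=False)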

But when I try to access it with this command:

spark.read.format("delta").option("versionAsOf", 322).load(path)

I receive this error:

AnalysisException: Cannot time travel Delta table to version 322. Available versions: [330, 341].;

I can't understand the problem. I'm using Azure Databricks.
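
To see why only those versions are still available, it can help to list what remains in the table's transaction log. This is a minimal sketch, assuming path is the table location and that it runs in a Databricks notebook where dbutils is available:

# Time travel can only reconstruct versions from the JSON commits and
# checkpoints that still exist under _delta_log
for f in dbutils.fs.ls(f"{path}/_delta_log"):
    print(f.name)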


Solution

  • I'm not sure I fully understand this bug. There's an open pull request in Delta Lake that might fix it: https://github.com/delta-io/delta/pull/627.

    Until then, a person from Databricks gave me a workaround: set delta.checkpointRetentionDuration to X days. That keeps your checkpoints long enough to retain access to the older versions.

    Then run something like this on your Delta table:

    # Keep the transaction log, deleted data files, and checkpoints for X days
    spark.sql(
        f"""
        ALTER TABLE delta.`{path}`
        SET TBLPROPERTIES (
            delta.logRetentionDuration = 'interval X days',
            delta.deletedFileRetentionDuration = 'interval X days',
            delta.checkpointRetentionDuration = 'X days'
        )
        """
    )
    

    It will keep your versions for X days.
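
    After changing the properties, a quick sanity check is to confirm they were applied and then retry the time travel read. This is a minimal sketch, assuming path is the same table location as above:

    # Confirm the retention-related properties now appear on the table
    spark.sql(f"DESCRIBE DETAIL delta.`{path}`").select("properties").show(truncate=False)

    # Retry the older version; note that the longer retention only applies going
    # forward, so versions whose log files were already cleaned up stay unavailable
    df = spark.read.format("delta").option("versionAsOf", 322).load(path)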