I have a Delta Lake table with these TBLPROPERTIES:
I'm trying to access version 322, which existed last month.
When I look at the history, I can see it:
But when I try to read it with this command:
spark.read.format("delta").option("versionAsOf", 322).load(path)
I receive this error:
AnalysisException: Cannot time travel Delta table to version 322. Available versions: [330, 341].;
I don't understand what the problem is. I'm using Azure Databricks.
I'm not sure I fully understand this bug, but there's an open pull request in Delta Lake that might solve the problem: https://github.com/delta-io/delta/pull/627.
Until then, someone from Databricks gave me a workaround: set delta.checkpointRetentionDuration to X days. That keeps your checkpoints long enough to give you access to older versions.
To do that, run something like this on your Delta table:
# Replace X with the number of days to retain; `path` is the table location.
spark.sql(f"""
    ALTER TABLE delta.`{path}`
    SET TBLPROPERTIES (
        delta.logRetentionDuration = 'interval X days',
        delta.deletedFileRetentionDuration = 'interval X days',
        delta.checkpointRetentionDuration = 'X days'
    )
""")
This keeps your versions available for X days.
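If you apply this to several tables, it may help to turn the X placeholder into a parameter. Below is a minimal sketch that only builds the statement text. The helper name build_retention_sql and the example path are hypothetical, not part of Delta Lake or Databricks, and the property values simply mirror the formats used in the workaround above.

```python
def build_retention_sql(path: str, days: int) -> str:
    """Build the ALTER TABLE statement from the workaround for a given table path.

    Hypothetical helper: it only formats a string and never touches Spark.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    return (
        f"ALTER TABLE delta.`{path}`\n"
        "SET TBLPROPERTIES (\n"
        f"    delta.logRetentionDuration = 'interval {days} days',\n"
        f"    delta.deletedFileRetentionDuration = 'interval {days} days',\n"
        f"    delta.checkpointRetentionDuration = '{days} days'\n"
        ")"
    )


# Example with a made-up path; inspect the statement before running it.
statement = build_retention_sql("/mnt/example/my_table", 60)
print(statement)
```

In a notebook you would then pass the result to spark.sql(statement). Keeping all three durations equal follows the workaround as given; whether your table needs different values for each is something to check against your own retention requirements.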