I noticed that I have only 2 checkpoint files in a Delta Lake folder. Every 10 commits, a new checkpoint is created and the oldest one is removed.
For instance, this morning I had 2 checkpoints: 340 and 350. I was able to time travel from 340 to 359.
Now, after a "write" action, I have 2 checkpoints: 350 and 360. I'm now able to time travel only from 350 to 360. What removes the old checkpoints, and how can I prevent that?
I'm using Azure Databricks 7.3 LTS ML.
If you want to keep your checkpoints for X days, you can set the table property delta.checkpointRetentionDuration to X days like this:
# Replace `path` with the table location and X with the number of days to keep.
spark.sql("""
    ALTER TABLE delta.`path`
    SET TBLPROPERTIES (
        delta.checkpointRetentionDuration = 'X days'
    )
""")
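If the path and the number of days come from variables, the statement can be built first and then passed to spark.sql. The following is only a sketch: the helper name build_checkpoint_retention_sql, its parameters, and the example path are hypothetical and not part of any Delta Lake API. Only the ALTER TABLE statement itself comes from the answer above.

```python
def build_checkpoint_retention_sql(table_path: str, days: int) -> str:
    """Build an ALTER TABLE statement that sets delta.checkpointRetentionDuration.

    Hypothetical helper for illustration; it only formats a SQL string.
    """
    if days <= 0:
        raise ValueError("days must be a positive integer")
    if "`" in table_path:
        # The path is wrapped in backticks, so a backtick inside it would break the statement.
        raise ValueError("table_path must not contain backticks")
    return (
        f"ALTER TABLE delta.`{table_path}` "
        f"SET TBLPROPERTIES (delta.checkpointRetentionDuration = '{days} days')"
    )


# Made-up path, for illustration only.
statement = build_checkpoint_retention_sql("/mnt/datalake/events", 30)
print(statement)

# On a cluster where `spark` is already defined (for example a Databricks notebook):
# spark.sql(statement)
```

Building the string separately makes it easy to print and check the statement before running it against the table.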