pyspark, databricks, delta-lake, delta-live-tables

How can I differentiate between a full refresh and an incremental update in a Delta Live Tables pipeline?


I have a table that flows through Bronze -> Silver -> Gold layers.

I want to implement a function like 'is_full_refresh()' so that the pipeline filters the DataFrame depending on the result: on a full refresh, don't filter; on an incremental update, filter by a, b, c.

I checked the Databricks documentation at https://docs.databricks.com/delta-live-tables/settings.html#cluster-config but can't find a direct way to differentiate between a full refresh and an incremental update.

How can I do that?


Solution

  • One option would be to call the REST API endpoint that returns the details of a pipeline update: https://docs.databricks.com/api/workspace/pipelines/getupdate.

    Another option I would try (it requires some investigation) is to query the DLT event log: https://docs.databricks.com/delta-live-tables/observability.html. I have not verified this, but some log events may carry this information.
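
A minimal sketch of the REST API option, under a few assumptions: the "get update" endpoint is GET /api/2.0/pipelines/{pipeline_id}/updates/{update_id}, and its response wraps the update in an `update` object with a boolean `full_refresh` field and an optional `full_refresh_selection` list of table names. Check these field names against the API reference before relying on them. The helper names `fetch_update` and `is_full_refresh`, and the way you obtain the host, token, pipeline ID and update ID, are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def fetch_update(host: str, token: str, pipeline_id: str, update_id: str) -> dict:
    """Call the pipelines 'get update' endpoint and return the parsed JSON body.

    Assumption: the endpoint path is /api/2.0/pipelines/{id}/updates/{update_id}
    and a personal access token is passed as a Bearer token.
    """
    url = f"{host.rstrip('/')}/api/2.0/pipelines/{pipeline_id}/updates/{update_id}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def is_full_refresh(update_response: dict, table_name: Optional[str] = None) -> bool:
    """Return True if the update was a full refresh.

    Assumption: the response has the shape
    {"update": {"full_refresh": bool, "full_refresh_selection": [names]}}.
    If table_name is given, a selective full refresh of that table also counts.
    """
    update = update_response.get("update") or {}
    if update.get("full_refresh"):
        return True
    selection = update.get("full_refresh_selection") or []
    return table_name is not None and table_name in selection
```

Inside a table definition, the result could then decide whether to apply the a, b, c filter: skip the filter when `is_full_refresh(...)` returns True and apply it otherwise.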