Tags: databricks, azure-databricks, spark-structured-streaming, delta-live-tables, databricks-unity-catalog

DLT: views vs. materialized views. What is the syntax, and how do I declare one?


I'm creating a DLT pipeline using the medallion architecture. In the Silver layer, I use CDC with SCD Type 1 to keep the latest record per id by date. That works fine, but I have a question about the @dlt.view decorator.

My current pipeline looks like this:

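BRONZE

```python
import dlt

@dlt.table(xxx)  # table options elided in the original
def bronze_table():
    # Source format, options and path were elided in the original question;
    # transformation_function is applied to the streaming DataFrame
    return spark.readStream.format(...).load(...).transform(transformation_function)
```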

SILVER

Here, as per the CDC documentation, I need to create a view, because CDC is not supported for streaming tables: https://docs.databricks.com/en/delta-live-tables/cdc.html

```python
@dlt.view
def view():
    # Streaming read of the bronze table, used as the source for apply_changes
    return dlt.read_stream("bronze_table")

dlt.create_streaming_table("target")

dlt.apply_changes(
    xyz  # arguments elided in the original
)
```
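The SCD Type 1 behavior described above (keep only the latest row per id, ordered by date) can be illustrated in plain Python. This is a hypothetical model of the upsert semantics, not the DLT API: in the pipeline itself, dlt.apply_changes does this work, typically configured with the key column(s) (keys), a sequencing column (sequence_by) and stored_as_scd_type=1. The column names id and date, the function scd1_merge and the sample records are all assumptions for illustration.

```python
from datetime import date
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def scd1_merge(target: Dict[Any, Row], changes: Iterable[Row],
               key: str = "id", sequence_by: str = "date") -> Dict[Any, Row]:
    """Upsert each change into target, keeping only the latest row per key.

    Hypothetical model of SCD Type 1: a change overwrites the current row for
    its key only if its sequence value is strictly newer; older (out-of-order)
    changes are ignored, and no history is kept.
    """
    result = dict(target)
    for row in changes:
        k = row[key]
        current = result.get(k)
        if current is None or row[sequence_by] > current[sequence_by]:
            result[k] = row
    return result


# Hypothetical bronze records, including a late, older arrival for id 1.
changes: List[Row] = [
    {"id": 1, "date": date(2024, 1, 1), "status": "new"},
    {"id": 2, "date": date(2024, 1, 2), "status": "new"},
    {"id": 1, "date": date(2024, 1, 3), "status": "shipped"},
    {"id": 1, "date": date(2024, 1, 2), "status": "packed"},
]

silver = scd1_merge({}, changes)
for k in sorted(silver):
    print(k, silver[k]["date"], silver[k]["status"])
```

Running it prints id 1 with status shipped (the late 2024-01-02 row does not overwrite the 2024-01-03 row) and id 2 with status new, which is the "latest id by date" result the Silver table is meant to hold.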

My question: is the view I'm creating a regular (non-materialized) view or a materialized view? The DLT pipeline UI labels it simply as a view. I want it to be a materialized view, because I want latency to be as low as possible and to let Delta Live Tables optimize wherever it can.

If I am only creating a plain view, what syntax do I need to create a materialized view instead? I tried @dlt.table, but that just creates a streaming table. Many thanks.


Solution

  • In Python, Delta Live Tables determines whether to update a dataset as a materialized view or streaming table based on the defining query.

    The @table decorator is used to define both materialized views and streaming tables.

    To define a materialized view in Python, apply @table to a query that performs a static read against a data source.

    To define a streaming table, apply @table to a query that performs a streaming read against a data source.

    Read more here: https://docs.databricks.com/en/delta-live-tables/python-ref.html#import-the-dlt-python-module
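The rule above can be sketched as a pipeline source file, assuming it runs as a DLT Python notebook attached to a pipeline (the dlt module and the spark session are only available there). The dataset names target_latest_mv and bronze_copy_st are hypothetical; target and bronze_table come from the question. Both functions use the same @dlt.table decorator: the one returning dlt.read(...) performs a complete (batch) read, so per the answer it should be managed as a materialized view, while the one returning dlt.read_stream(...) performs a streaming read and should be managed as a streaming table. By contrast, a function decorated with @dlt.view defines a temporary view that is computed inside the pipeline and not published as a table, which is consistent with the UI showing it simply as a view.

```python
import dlt


# Materialized view: @dlt.table over a static (batch) read of another dataset.
@dlt.table(name="target_latest_mv")  # hypothetical dataset name
def target_latest_mv():
    # dlt.read reads the full current contents of "target" (not a stream)
    return dlt.read("target")


# Streaming table: the same decorator over a streaming read.
@dlt.table(name="bronze_copy_st")  # hypothetical dataset name
def bronze_copy_st():
    # dlt.read_stream reads "bronze_table" incrementally as a stream
    return dlt.read_stream("bronze_table")
```

Only the read call differs between the two definitions; the decorator and its arguments are identical, which is why switching @dlt.view to @dlt.table while keeping dlt.read_stream produces a streaming table rather than a materialized view.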