Search code examples
azuredatabricksazure-synapsedelta-live-tables

Dynamic execution of a DLT pipeline for SCD2 creation in Databricks


I am trying to load my datalake SCD2 tables via Databricks DLT pipeline. My goal is to dynamically load the tables in a notebook by passing the required information in the "Configuration" setting in the DLT pipeline.

The "Configuration" setting could look like this:

{
    "object1": {
        "object_name": "table",
        "business_key_column": "table_bk",
        "except_column_list": "change_date",
        "sequence_column": "change_date"
    },
    "object2"...
}

enter image description here

Later, the target notebook should obtain the parameters in the pipeline configuration and load the tables accordingly. Can I obtain the pipeline configuration setting in the target notebook? If yes: How?


Solution

  • Yes. You can get the pipeline configuration setting in the notebook using spark configuration.

    As mentioned in this documentation the configuration setting are available in pipeline queries through spark configuration.

    enter image description here

    You can access using below statement spark.conf.get("key").

    Below is the configuration given. enter image description here

    And accessed in notebook.

    import dlt
    
    @dlt.table(
    comment="The raw wikipedia clickstream dataset, ingested from /databricks-datasets."
    )
    def clickstream_raw():
        json_path = spark.conf.get("json_path")
        return (spark.read.format("json").load(json_path))
    

    enter image description here

    Based on given path the dataframe is returned.

    In your case you can get it like below.

    import dlt
    
    @dlt.table()
    def object_list():
        import json
        
        json_obj = json.loads(spark.conf.get("object_list"))
        data = [json.dumps(json_obj[i]) for i in json_obj]
        json_df = spark.read.json(spark.sparkContext.parallelize(data))
        return json_df
    

    After getting json object you can use it accordingly.

    Output: enter image description here

    and

    enter image description here

    and

    enter image description here