Search code examples
sqljinja2snowflake-cloud-data-platformdbt

Run select query inside another CTE in dbt + jinja


I have a query in dbt (+ Snowflake) that looks like this:

with bank_str_table as (
    select *
    from {{ ref('bank_str_dbt') }}
),

first_table as (
    select
        bk.id,
        bk.metadata$row_id,
        bk.label,
        ts_del.merge_key,
    from bank_str_table bk
    inner join bank_str_table ts_del on bk.METADATA$ROW_ID = ts_del.METADATA$ROW_ID
    where
        ts_del.METADATA$ACTION = 'DELETE' AND
        ts_del.METADATA$ISUPDATE = 'TRUE' AND
        bk.METADATA$ACTION = 'INSERT' AND
        bk.METADATA$ISUPDATE = 'TRUE'
),

bank_str_table_first as (

    select *
    from bank_str_table
    where label not in (**select listagg(label, ', ') from first_table**)

    union

    select *
    from first_table
)

select *
from first_table

My problem is with this statement inside the CTE bank_str_table_first: select listagg(label, ', ') from first_table. I need to execute it instead of adding it as a raw SQL inside my WHERE statement.

I faced a similar problem once but could resolve it with a jinja at the beginning of my code. However, It's a massive query, and I need to take the output from CTEs and run it dynamically (it will happen three more times).

Just as a reference, this was the jinja code I used:

{%- set get_ids_from_update -%}

    INSERT_YOU_SQL_STATEMENT_HERE

{% endset %}
{% set results = run_query(get_ids_from_update) %}

{% if execute %}
{# Return the first column #}
{% set bank_list = results.columns[0].values()[0] %}
{% endif %}

However, I could not find out how to 1. Reference a table from a CTE inside this jinja code and 2. Execute this code in the middle of the script instead of in the beginning. Any ideas?


Solution

  • The query:

     select *
     from bank_str_table
     where label not in (**select listagg(label, ', ') from first_table**)
    

    Could be rewritten as:

    select *
    from bank_str_table AS b
    left join first_table AS f
      on b.label = f.label
    where f.label is null
    

    to effectively remove rows from bank_str_table that have label from first_table.


    Seconnd NOT IN could work with subquery itself, so there is no need to generate list of string:

     select *
     from bank_str_table AS b
     where b.label not in (select f.label 
                           from first_table AS f 
                           where f.label is not null)