Search code examples
jinja2dbt

run_query using dbt and jinja


I need to create a macro on dbt.

I have systematically 2 distinct values (in date format) in the date_export column, the date of the old export and the date of the new export, I have to compare the number of rows in the two date_export and take only the one with several rows, if the number of rows are equal I take the data with date_export = Min(date_export)

Here is my code, but I have an error "'nb_rows_previous' is undefined" when I try to run the model, anyone have an idea where I screwed up with my code?

{% set nb_rows_previous_data_query %}
    select COUNT(*) from {{source(s, table)}}
    WHERE date_export = (SELECT MIN(date_export)) FROM {{source(s, table)}} )
{% endset %}

{% set nb_rows_new_data_query %}
    select COUNT(*) from {{source(s, table)}}
    WHERE date_export = (SELECT MAX(date_export) FROM {{source(s, table)}} )
{% endset %}

{% set results_previous_data = run_query(nb_rows_previous_data_query) %}

{% set results_new_data = run_query(nb_rows_new_data_query) %}

{% if execute %}
    {% set nb_rows_previous = results_previous_data.columns[0][0] %}
    {% set nb_rows_new = results_new_data.columns[0][0] %} 
{% endif %}

{% if nb_rows_previous >= nb_rows_new %}

    SELECT
        {{schema}}
    FROM {{source(s, table)}}
    WHERE date_export = (SELECT MIN(date_export) FROM {{source(s, table)}} )

{% else %}

    SELECT
        {{schema}}
    FROM {{source(s, table)}}
    WHERE date_export = (SELECT MAX(date_export) FROM {{source(s, table)}} )

Solution

  • In order to persist the value in both nb_rows_previous and nb_rows_new, you will need to wrap the whole last part of the macro with the {% if execute %}, so that it looks like this:

    (...)
    
    {% if execute %}
    
      {% set nb_rows_previous = run_query(nb_rows_previous_data_query).columns[0][0] %}
      {% set nb_rows_new = run_query(nb_rows_previous_data_query).columns[0][0] %} 
    
      {% if nb_rows_previous >= nb_rows_new %}
    
        SELECT
          {{schema}}
        FROM {{source(s, table)}}
        WHERE date_export = (SELECT MIN(date_export) FROM {{source(s, table)}} )
    
      {% else %}
    
        SELECT
          {{schema}}
        FROM {{source(s, table)}}
        WHERE date_export = (SELECT MAX(date_export) FROM {{source(s, table)}} )
    
      {% endif %}
    
    {% endif %}
    

    By the way, I also changed the code a little bit by doing the run_query command and the value selection in the same line of code