Search code examples
google-bigquerymacrosjinja2dbt

dbt macro using Jinja in BigQuery


I am trying to compare some columns in an automated way, the column types that i am comparing can be: float, integer, string or date / timestamps but I am having issues with the syntax and how to best cover the last condition i.e when I have dates/timestamps

This is what I have so far but somehow i am getting type errors. Plus I am not sure if what i have done would work or not since i am stuck at the very initial stage.

{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
{%if type(s1_col) is string %}
  if(coalesce({{s1_col}},"") != coalesce({{s2_col}},""),true,false ) as is_{{field}}_mismatch
{%- elif type(s1_col) is float or type(s1_col) is integer -%}
  if(coalesce({{s1_col}},0) != coalesce({{s2_col}},0),true,false) as is_{{field}}_mismatch
{% else %}
  if(coalesce({{s1_col}},{{another_macro_that_gets_a_date}}) != coalesce({{s2_col}},{{another_macro_that_gets_a_date}}),true,false) as is_{{field}}_mismatch
{% endif %}
{% endmacro %}
  • First I am checking if the column type is string if yes, then compare the values of two columns, if values are null then use blank to compare
  • Second I am checking if the column types are float or integer, if yes compare two columns and if there is null then compare with 0
  • lastly, I am covering the case of dates/timestamps, I couldn't find the type() for the date / timestamp so I added it to the else statement, which might not be the best thing, would be great if someone can also recommend how to best cover these cases.
  • The reason I am adding the coalesce statements is because if any of the value in one of the columns is null, the is_{{field}}_mismatch would return false, even though it should be true
  • if I enter empty strings as a part of the coalesce for the second and third cases, it will run into error due to column type, hence I have to add these in seperate if statements.

Solution

  • The data in your database does not flow through the code you write in Jinja -- jinja templates a sql query before that query is executed.

    When you write type(s1_col), you might expect to get the data type of the data in the database column named s1_col, but actually you will always get a string type, since the variable s1_col is a string that holds the name of a database column.

    You can write this logic in a type-agnostic way by returning true if either column is null (in sql, not jinja):

    {% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
    case
        when {{ s1_col }} is null and {{ s2_col }} is not null
        then true
        when {{ s2_col }} is null and {{ s1_col }} is not null
        then true
        when {{ s1_col }} is null and {{ s2_col }} is null
        then false
        else {{ s1_col }} != {{ s2_col }}
    end as is_{{field}}_mismatch
    {% endmacro %}
    

    Another option, if you really need the data type, is to use dbt's adapter class, specifically the get_columns_in_relation method. The returned List of Columns have a data_type property. See this answer for more info.