I am trying to compare some columns in an automated way, the column types that i am comparing can be: float, integer, string or date / timestamps but I am having issues with the syntax and how to best cover the last condition i.e when I have dates/timestamps
This is what I have so far but somehow i am getting type errors. Plus I am not sure if what i have done would work or not since i am stuck at the very initial stage.
{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
{%if type(s1_col) is string %}
if(coalesce({{s1_col}},"") != coalesce({{s2_col}},""),true,false ) as is_{{field}}_mismatch
{%- elif type(s1_col) is float or type(s1_col) is integer -%}
if(coalesce({{s1_col}},0) != coalesce({{s2_col}},0),true,false) as is_{{field}}_mismatch
{% else %}
if(coalesce({{s1_col}},{{another_macro_that_gets_a_date}}) != coalesce({{s2_col}},{{another_macro_that_gets_a_date}}),true,false) as is_{{field}}_mismatch
{% endif %}
{% endmacro %}
coalesce
statements is because if any of the value in one of the columns is null, the is_{{field}}_mismatch
would return false, even though it should be truecoalesce
for the second and third cases, it will run into error due to column type, hence I have to add these in seperate if statements.The data in your database does not flow through the code you write in Jinja -- jinja templates a sql query before that query is executed.
When you write type(s1_col)
, you might expect to get the data type of the data in the database column named s1_col
, but actually you will always get a string
type, since the variable s1_col
is a string that holds the name of a database column.
You can write this logic in a type-agnostic way by returning true if either column is null (in sql, not jinja):
{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
case
when {{ s1_col }} is null and {{ s2_col }} is not null
then true
when {{ s2_col }} is null and {{ s1_col }} is not null
then true
when {{ s1_col }} is null and {{ s2_col }} is null
then false
else {{ s1_col }} != {{ s2_col }}
end as is_{{field}}_mismatch
{% endmacro %}
Another option, if you really need the data type, is to use dbt's adapter
class, specifically the get_columns_in_relation
method. The returned List of Columns have a data_type
property. See this answer for more info.