I'm writing a dbt model to join two tables on multiple columns, compiled to something like this:
SELECT
A.col1,
A.col2,
A.col3,
FROM
A
LEFT JOIN
B
ON
(A.col1 = B.col1 OR (IS_NAN(A.col1) AND IS_NAN(B.col1))
AND (A.col2 = B.col2 OR (IS_NAN(A.col2) AND IS_NAN(B.col2))
AND (A.col3 = B.col3 OR (IS_NAN(A.col3) AND IS_NAN(B.col3))
and this logic will be applied to many table pairs, so I need a macro. The joining logic is the same on all columns, so a loop over columns in the ON clause would be perfect, like this
SELECT
{% for col in all_cols %}
A.{{ col }},
{% endfor %}
FROM
A
LEFT JOIN
B
ON
{% for col in all_cols %}
(A.{{col}} = B.{{col}} OR (IS_NAN(A.{{col}}) AND IS_NAN(B.{{col}})),
<-- What to put here for AND the next condition???
{% endfor %}
How can I concatenate the conditions in ON clause with AND
when iterating over columns?
The cute way (add a predicate that is always true, so you can start every statement with AND
):
SELECT
{% for col in all_cols %}
A.{{ col }},
{% endfor %}
FROM
A
LEFT JOIN
B
ON
1=1
{% for col in all_cols %}
AND (A.{{col}} = B.{{col}} OR (IS_NAN(A.{{col}}) AND IS_NAN(B.{{col}})))
{% endfor %}
The less-cute way, using loop.first
(loop
is a variable set by jinja inside a for loop that has some handy properties. loop.first
and loop.last
are especially useful):
SELECT
{% for col in all_cols %}
A.{{ col }},
{% endfor %}
FROM
A
LEFT JOIN
B
ON
{% for col in all_cols %}
{% if not loop.first %}AND{% endif %} (A.{{col}} = B.{{col}} OR (IS_NAN(A.{{col}}) AND IS_NAN(B.{{col}})))
{% endfor %}