Search code examples
sqlgoogle-bigquerymacrosdbt

dbt macro: How to join tables on multiple columns in a loop


I'm writing a dbt model to join two tables on multiple columns, compiled to something like this:

SELECT 
  A.col1,
  A.col2,
  A.col3,
FROM
   A
LEFT JOIN
   B
ON
   (A.col1 = B.col1 OR (IS_NAN(A.col1) AND IS_NAN(B.col1))
   AND (A.col2 = B.col2 OR (IS_NAN(A.col2) AND IS_NAN(B.col2))
   AND (A.col3 = B.col3 OR (IS_NAN(A.col3) AND IS_NAN(B.col3))

and this logic will be applied to many table pairs, so I need a macro. The joining logic is the same on all columns, so a loop over columns in the ON clause would be perfect, like this

SELECT 
  {% for col in all_cols %}
    A.{{ col }},
  {% endfor %}
FROM
   A
LEFT JOIN
   B
ON
    {% for col in all_cols %}
        (A.{{col}} = B.{{col}} OR (IS_NAN(A.{{col}}) AND IS_NAN(B.{{col}})),
       <-- What to put here for AND the next condition???
    {% endfor %}

How can I concatenate the conditions in ON clause with AND when iterating over columns?


Solution

  • The cute way (add a predicate that is always true, so you can start every statement with AND):

    SELECT 
      {% for col in all_cols %}
        A.{{ col }},
      {% endfor %}
    FROM
       A
    LEFT JOIN
       B
    ON
        1=1
        {% for col in all_cols %}
            AND (A.{{col}} = B.{{col}} OR (IS_NAN(A.{{col}}) AND IS_NAN(B.{{col}})))
        {% endfor %}
    

    The less-cute way, using loop.first (loop is a variable set by jinja inside a for loop that has some handy properties. loop.first and loop.last are especially useful):

    SELECT 
      {% for col in all_cols %}
        A.{{ col }},
      {% endfor %}
    FROM
       A
    LEFT JOIN
       B
    ON
        {% for col in all_cols %}
            {% if not loop.first %}AND{% endif %} (A.{{col}} = B.{{col}} OR (IS_NAN(A.{{col}}) AND IS_NAN(B.{{col}})))
        {% endfor %}