dbt & Jinja: How can I keep the raw string when setting a variable?


How can I keep the raw text when setting a variable within a macro in dbt?

I have tried |e but when I log to check, it doesn't seem to work.

My macro code:

{% macro process(column_name) %}
    {% set my_dict = {"\\T"|e:" ","\\\\T"|e:" "} %}
    {% for key, value in my_dict.items() %}
            {{ log(key, True) }}
    {% endfor %}
    {{ return('') }}
{% endmacro %}

The log output is '\T' and '\\T' and I'm expecting '\\T' and '\\\\T'. I'm looking for something similar to r'' when setting the variable.


Solution

  • In Python, raw string literals impact how the interpreter parses the characters you wrap in quotes. Once it's in memory, there is no difference between a string that began its life as a raw string literal and one that began as an ordinary string literal. Try this in a Python interpreter:

    >>> raw = r"\T"
    >>> print(raw)
    \T
    >>> raw
    '\\T'
    >>> ordinary = "\\T"
    >>> print(ordinary)
    \T
    >>> ordinary
    '\\T'
    

    As you probably found out, Jinja has no raw string literals. It does have a {% raw %} tag, but that is for escaping Jinja syntax. The escape (or e) filter performs HTML escaping (it replaces characters such as <, > and & with entities like &lt;), so it leaves backslashes untouched and has no effect on how string literals are parsed.
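
To see why |e changes nothing here, the sketch below uses Python's standard-library html.escape as a stand-in for Jinja's escape filter. That substitution is an assumption: Jinja delegates to MarkupSafe, which emits slightly different entities for quote characters, but neither function touches backslashes.

```python
import html

# One backslash followed by T: the value the question's log shows for the literal "\\T".
key = "\\T"

# HTML escaping only rewrites &, <, > and quote characters.
escaped = html.escape(key)
print(escaped)                    # the backslash is still a single backslash

# For contrast, a string that does contain HTML-special characters.
print(html.escape("a < b & c"))   # a &lt; b &amp; c
```

The backslash was already collapsed when the string literal was parsed, so no filter applied afterwards can bring it back.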

    A {% set %} block does what you want: its body is captured as plain text, so the backslashes are not processed as escape sequences:

    {% macro raw_string() %}
    {% set key -%}
    \\T
    {%- endset %}
    {{ log(key, True) }}
    {% endmacro %}
    -- dbt run-operation raw_string
    -- \\T
    

    But that's going to be tedious if you have a lot of keys to deal with: you'll need a set block for each key, and then you'll have to define the dict separately, using the variables you defined in each set block.

    Ultimately, this is a data entry challenge for you. Through that lens, you may just want to create a string that is valid JSON or YAML; recent versions of dbt include fromjson and fromyaml methods in the Jinja context. Thankfully, YAML is fine with backslashes in unquoted dictionary keys, and doesn't process them as escapes:

    {% macro process() %}
    
    {% set data -%}
    \\T: val
    \\\\T: val
    {%- endset %}
    
    {% set my_dict = fromyaml(data) %}
    
    {% for key, value in my_dict.items() %}
        {{ log(key, True) }}
    {% endfor %}
    
    {% endmacro %}
    -- dbt run-operation process
    -- \\T
    -- \\\\T
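
For comparison, the JSON route mentioned above needs more care, because JSON treats a backslash inside a string as an escape character, so every literal backslash has to be doubled. The sketch below uses Python's standard json module to illustrate this; it assumes dbt's fromjson follows the same standard JSON escaping rules, and json_text is a hypothetical stand-in for the text you would put in a {% set %} block.

```python
import json

# A raw Python string, so Python does no escape processing of its own.
# This is the exact text the JSON parser sees.
json_text = r'{"\\\\T": " ", "\\\\\\\\T": " "}'

# JSON halves each run of backslashes: 4 become 2, and 8 become 4.
my_dict = json.loads(json_text)

for key in my_dict:
    print(key)
# \\T
# \\\\T
```

With YAML the unquoted keys are written exactly as they should end up, which is why it is the more convenient choice for this data.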