Hello everyone, I have 28 tables(models) to create. I need to do some transformations on these tables. So I decided to use macro not to repeat myself and also will have more transformations in the future. I want to remove '%', '$' signs from the column and cast it to float. When is '%' divide it by 100.
I've created this model:
SELECT
{{ clean_values('"Data"') }} AS "Data"
FROM
{{ source('mml_staging_eastor', 'DATAXIS_Development_indicators') }}
I also created macro:
{% macro clean_values(value_column) -%}
{% do log(node, info=true) %}
{# {% set column_value = 'wme%mt' %} #}
{% set column_value = value_column %}
{% set col_val_list = value_column | list %}
{% if '%' in col_val_list %}
'{{ column_value | replace('%', '') }}'
{% elif '$' in col_val_list %}
'{{ column_value | replace('$', '') }}'
{% else %}
{{ col_val_list }}
{% endif %}
{%- endmacro %}
Macros are compiled (templated) before the query is run. That means that the data doesn't run through the jinja templater. When you {% set column_value = value_column %} you're just passing a string with the value value_column into jinja, not the data from the field with that name.
Which is true and I'll get result in my model like: [ """, "D", "a", "t", "a", """ ].
It's possible to use the run_query macro to pull data into the jinja context, but this is slow and error-prone.
*If I'm using commented line, so dedicating string 'wme%mt' to variable it's working fine.
How can I handle this to clean this column in macro?
Thank you in advance!*
I see you found my other answer that attempts to explain the difference between the compilation step and the query execution step. But I think you're making the same mistake as that other person, and you're using jinja to operate on the string that represents the name of the column, instead of using SQL to operate on the data in the column.
Your macro needs to write SQL that can be executed in the database. I can't quite tell from your question if the $
and %
are in the data or the column names, but assuming the former (and that you're on Snowflake or similar dialect), your macro would look like this (note that %
is a SQL wildcard character meaning "match any number of any char" and must be escaped):
{% macro clean_values(column_name) -%}
case
when {{ column_name }} like '%^%' escape '^' -- number ending with %
then replace({{ column_name }}, '%', '')::float / 100.0
when {{ column_name }} like '$%' -- number starting with $
then replace({{ column_name }}, '$', '')::float
else {{ column_name }}::float
end
{% endmacro %}
I'm using the simple cast operator, ::
here, which will throw an exception if it can't cast the expression to a float. You may prefer to use TRY_TO_DOUBLE
, which will return NULL
instead of throwing an exception, if the column contains any values Snowflake can't cast.