Search code examples
regexpostgresqlmacrosamazon-redshiftdbt

change column names using a dbt macro


Can I use a dbt macro to modify the column names in AWS redshift?

Example changes to be made:

"checkedAt" --> checked_at
"helloWorld" --> hello_world
"testKitchenNew" --> test_kitchen_new

I tried to write a sample macro like this:

{% macro modify_col_names(column_name) %}
  {% set column_name = column_name | replace('A', '_a') %}
  {% set column_name = column_name | replace('K', '_k') %}
  {% set column_name = column_name | replace('N', '_n') %}
  {% set column_name = column_name | replace('W', '_w') %}
  {{ column_name | lower }}
{% endmacro %}

Now I am trying to use it like this:

with raw_data as (
    select
        *
    from
        {{ source('src', 'TableName') }}
),
transformed_data as (
    select
        {{ modify_col_names(['checkedAt', 'helloWorld', 'testKitchenNew']) }}
    from
        raw_data
)
select
    *
from
    transformed_data

But this won't actually modify the column names and would throw an error like this:

Database Error in model xx (models/test.sql)
10:32:24    syntax error at or near "'checked_at'"

How can I actually modify the column names?

P.S: I would need to do it for all columns of the table so don't necessarily need to specify the column names manually.


Solution

  • To convert column names from CamelCase to snake_case you can use:

    1. get_columns_in_relation from dbt adapter
    2. regex python module available in dbt

    convert_column_names.sql macro:

    {% macro convert_column_names(relation) %}
    
        {% set columns = adapter.get_columns_in_relation(relation) %}
        {% set re = modules.re %}
    
        {% set query  %}
        SELECT
        {% for column in columns %}
            "{{ column.column }}" AS {{ re.sub("(?<!^)([A-Z][a-z]|(?<=[a-z])[^a-z]|(?<=[A-Z])[0-9_])", "_\g<1>", column.column).lower() }}
        {%- if not loop.last %},{% endif -%}
        {% endfor %}
        FROM {{ relation }}
        {% endset %}
    
        {{ return(query) }}
    
    {% endmacro %}
    

    Usage in model (you should either use source() or ref():

    {{ convert_column_names(source('src', 'TableName')) }}
    

    Note, that some databases (for example, Snowflake) automatically convert column names to upper case. In that case this approach would not work.