Tags: amazon-web-services, amazon-redshift, dbt

redshift column encoding with dbt


I am new to dbt and have been trying it out on AWS Redshift.

Currently I can set the encoding of a column using a CREATE statement outside of dbt:

create table fact_sales (
  id integer,
  date date NOT NULL encode az64...
)

Via dbt I am able to control the data type of each column in the projection:

select
  id::integer,
  date::date
FROM stg.sales

Is there a way to set `encode az64` via dbt?


Solution

  • I was able to solve this with the following strategy:

    1. define the data type of each column of the model by casting explicitly in the model's CTAS sql query
      -- model.sql
       WITH
        /* transform steps */
        final AS (
          /* cast your projections explicitly */
          SELECT
            id::integer,
            date::date
          FROM _intermediate_step_table
        )

       SELECT * FROM final
    
    
    2. define the encoding of the column with an ALTER TABLE ... ALTER COLUMN statement run as part of the post-hook query

    Reference : https://docs.getdbt.com/reference/resource-configs/pre-hook-post-hook
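    A minimal sketch of the two steps combined in one model file. The table and column names are taken from the question for illustration; the exact config shape is one common way to declare a post-hook, not the only one:

    ```sql
    -- model.sql (illustrative sketch; names assumed from the question)
    -- The post-hook runs after dbt materializes the table, so the
    -- column already exists when the ALTER statement fires.
    {{ config(
        materialized='table',
        post_hook=[
          "ALTER TABLE {{ this }} ALTER COLUMN date ENCODE az64"
        ]
    ) }}

    SELECT
      id::integer,
      date::date
    FROM stg.sales
    ```

    `{{ this }}` resolves to the fully qualified name of the model's own table, so the hook does not need to hard-code the schema.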

    PROS:

    • since this is not directly supported via dbt's schema.yml (as explained in the discussion in the GitHub issue thread), this is the only sane way I could do it without jumping through hoops

    CONS:

    • if this is a large table you would want to avoid this, as ALTERing a large table can cause performance issues

    ALTERNATE:

    • if your model table IS large (billions of rows), you would be using an incremental approach to load the table anyway (I did not run into this due to the size of data I was handling)

    In this case:

    • create the table (with the desired encodings) outside the dbt life cycle
    • use incremental mode to load the table (you would do this anyway, since the table is huge and you want performance), which won't recreate the table
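    The alternate flow above might look like this as a dbt incremental model. The column names come from the question; the `WHERE` filter is one common incremental pattern and is an assumption, not something prescribed by the original answer:

    ```sql
    -- model.sql (sketch; assumes the table was pre-created with encodings)
    -- Incremental runs INSERT into the existing table rather than
    -- re-running CTAS, so the pre-set column encodings are preserved.
    {{ config(
        materialized='incremental',
        unique_key='id'
    ) }}

    SELECT
      id::integer,
      date::date
    FROM stg.sales
    {% if is_incremental() %}
      -- only pick up rows newer than what is already loaded
      WHERE date > (SELECT MAX(date) FROM {{ this }})
    {% endif %}
    ```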