We have dbt models that run on AWS Athena tables; behind the scenes they create Hive external tables. Now we have a situation where the data type of a column may change in the future. Athena tables backed by Hive do not allow changing a column's data type, but Apache Iceberg tables do.
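For example, once the data is in an Iceberg table, a type change such as widening id from int to bigint can be applied directly in Athena with ALTER TABLE ... CHANGE COLUMN (the table referenced here is the one defined further down; the bigint target type is only an illustration):

-- Iceberg allows widening the column type in place; a Hive external table does not
ALTER TABLE iceberg_table CHANGE COLUMN id id bigint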
We copied the data from the old Hive table into the Iceberg table, but when we run the dbt model it throws the error below:
[error] [MainThread]: An error occurred (InvalidInputException) when calling the GetPartitions operation:
The dbt configuration for the model is as follows. It used to work with Hive external tables but does not work with Apache Iceberg tables.
{{
  config(
    materialized='incremental',
    external_location="s3://" + env_var('BUCKET-NAME') + "/" + env_var('SCHEMA-NAME') + "/" + this.identifier,
    partitioned_by=['event_date'],
    incremental_strategy='insert_overwrite',
    on_schema_change='ignore'
  )
}}
The Apache Iceberg table is created as follows:
CREATE TABLE iceberg_table (
  id int,
  data string,
  event_date string)
PARTITIONED BY (event_date)
LOCATION 's3://DOC-EXAMPLE-BUCKET/iceberg-folder'
TBLPROPERTIES (
  'table_type'='ICEBERG',
  'format'='parquet',
  'write_target_data_file_size_bytes'='536870912',
  'optimize_rewrite_delete_file_threshold'='10'
)
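For reference, the one-off copy from the old Hive table into this Iceberg table can be done with a plain INSERT INTO ... SELECT in Athena; the Hive table name below is only a placeholder:

-- One-time backfill from the old Hive external table into the Iceberg table
INSERT INTO iceberg_table
SELECT id, data, event_date
FROM old_hive_table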
dbt with Athena does support Apache Iceberg tables. Your config block would look something like this:
{{
  config(
    schema=env_var('DATABASE'),
    s3_data_dir='s3://' ~ env_var('BUCKET') ~ '/',
    s3_data_naming='table_unique',
    format='parquet',
    write_compression='GZIP',
    materialized='incremental',
    table_type='iceberg',
    incremental_strategy='merge',
    unique_key=['key1', 'key2'],
    tags=["insert_tags"]
  )
}}
Running your dbt model should then create the Iceberg table. With incremental_strategy='merge' it follows an upsert strategy: rows matching the unique_key are updated, and new rows are inserted.
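Putting it together, a minimal incremental model file could look like the sketch below. The source relation source('raw', 'events'), the id key, and the event_date filter are assumptions; adapt them to your schema (partitioned_by on Iceberg tables is supported by the dbt-athena adapter, but check the version you are on):

-- models/my_iceberg_model.sql (hypothetical model name and source)
{{
  config(
    materialized='incremental',
    table_type='iceberg',
    format='parquet',
    incremental_strategy='merge',
    unique_key=['id'],
    partitioned_by=['event_date']
  )
}}

select
    id,
    data,
    event_date
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- only pick up rows newer than what is already in the target table
  where event_date > (select max(event_date) from {{ this }})
{% endif %}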