I wanted to create a Lakehouse using `dbt` with `spark` as its engine. As a first step I want to read some raw files, e.g. JSON files, and write them as a `delta` or `iceberg` table. But it seems like `dbt-spark` does not support that. Did I miss something, or is this really not possible? If not, how can one ingest raw files and write them back out as a table?

I saw that `dbt-duckdb` supports this behaviour and it works, but sadly it does not support these external table formats. I just want to avoid creating separate Spark jobs to ingest the data first. I would like to do everything with `dbt`.
You're correct that `dbt-spark` currently doesn't directly support reading raw files and writing them to Delta or Iceberg tables within dbt models. However, there are a couple of workarounds. Spark SQL can query files directly, so a plain dbt SQL model can select from the raw JSON and let dbt materialize the result as a table.
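A minimal sketch of such a model, assuming your Spark runtime allows the "run SQL on files directly" syntax and using a hypothetical path `/data/raw/events/` for the JSON files:

```sql
-- models/staging/stg_events.sql
-- Read raw JSON straight from storage and materialize it as a Delta table.
-- Use file_format='iceberg' instead if your cluster has an Iceberg catalog configured.
{{ config(
    materialized='table',
    file_format='delta'
) }}

select *
from json.`/data/raw/events/`  -- Spark SQL direct file query
```

Because the model is just a `select`, dbt handles the `create table ... as` for you. Where Python models are supported by your setup, the same idea works with `def model(dbt, session)` by returning `session.read.json(...)`, which is useful if you need explicit schemas or reader options.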
You can also check the dbt-external-tables package (https://github.com/dbt-labs/dbt-external-tables). It lets you declare raw file locations as dbt sources and register them as external tables, typically via `dbt run-operation stage_external_sources`, so downstream models can select from them.
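A sketch of what such a source definition might look like, again using the hypothetical `/data/raw/events/` path (the exact properties supported under `external:` depend on the adapter and package version):

```yaml
# models/staging/sources.yml
version: 2

sources:
  - name: raw
    tables:
      - name: events
        external:
          location: '/data/raw/events/'  # hypothetical path to the raw JSON files
          using: json                    # file format for CREATE TABLE ... USING
```

After staging the external source, a downstream model can read from `{{ source('raw', 'events') }}` and materialize it as a Delta or Iceberg table, as in the model above.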