apache-spark, dbt

Does dbt support repartitionByRange, partitionBy, bucketBy, and sortBy if I plan to write data with Spark over dbt?


I need to write transformed data in a way that is optimized for later reads. I planned to do this with PySpark along these lines:

df.repartitionByRange(max_partitions, ..., rand()) \
  .write \
  .bucketBy(numBuckets, ...) \
  .sortBy(...) \
  .option("maxRecordsPerFile", 1000000) \
  .saveAsTable(...)

As this is just a transformation, I thought it could be a good use case for me to try dbt.

I have never used dbt. My question: would I be able to achieve the same with dbt over Spark if I'm not the admin of the dbt instance and can only write queries on top of the Spark connector?

Thanks


Solution

  • The dbt-spark adapter currently supports partition_by, clustered_by, and buckets in the model config, which map to the same options offered in Spark SQL's CREATE TABLE statement (PARTITIONED BY and CLUSTERED BY ... INTO n BUCKETS). Since these are set per model, you can use them without admin access to the dbt instance. There is, however, no model-level equivalent of repartitionByRange, sortBy, or maxRecordsPerFile; those remain Spark writer settings outside dbt's config.
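
A minimal sketch of what such a model file could look like (the model, source, table, and column names here are hypothetical; the config keys follow the dbt-spark adapter's documented options):

```sql
-- models/events_bucketed.sql  (hypothetical model)
{{ config(
    materialized='table',
    file_format='parquet',
    partition_by=['event_date'],   -- maps to PARTITIONED BY (event_date)
    clustered_by=['user_id'],      -- maps to CLUSTERED BY (user_id)
    buckets=16                     -- maps to INTO 16 BUCKETS
) }}

select
    event_date,
    user_id,
    payload
from {{ source('raw', 'events') }}
```

dbt compiles this into a CREATE TABLE ... AS SELECT statement against Spark, so the partitioning and bucketing are applied by Spark SQL at write time rather than through the DataFrameWriter API.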