I have one dbt project deployed to a single GCP project. Within that GCP project I have multiple BigQuery datasets that I want to query across.
So my compiled SQL should look something like:
```sql
with
first_table as (
    select * from projectA.dataset_X.table_thishasinfo
),
second_table as (
    select * from projectA.dataset_Y.table_thishasmoreinfo
)
select *
from first_table ft
left join second_table st
    on ft.id = st.id
```
Within dbt, these are of course references to models defined in different files. The query needs to write to a table `projectA.dataset_Z.table_thisistheoutput`.
I tried looking at `target` within dbt, and at defining things in `profiles.yml`, but I can't figure out how to have three different datasets involved in one project.
How could I write my dbt code to create a query that can run on multiple datasets?
It seems that the issue is resolved by editing the `dbt_project.yml` file. If you include the following:
```yaml
models:
  <projectname>:
    <modelname>:
      database: projectA
      schema: dataset_X
    <modelname2>:
      database: projectA
      schema: dataset_Y
    <modelname3>:
      database: projectA
      schema: dataset_Z
```
If you run all the models in the same production environment, say with a target dataset prefix `dbt`, then in BigQuery you will see the following three datasets: `dbt_dataset_X`, `dbt_dataset_Y`, and `dbt_dataset_Z`. This is because dbt's default schema-name resolution concatenates the target schema with the custom schema. The queries will pick up the correct database and dataset to source from.
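If you want dbt to write to `dataset_X`, `dataset_Y`, and `dataset_Z` exactly as named, without the target-schema prefix, you can override the `generate_schema_name` macro in your project's `macros/` folder. A sketch following the pattern documented in dbt's custom-schema docs:

```sql
-- macros/generate_schema_name.sql
-- Use the custom schema verbatim when one is configured,
-- instead of dbt's default <target_schema>_<custom_schema>.
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```

Note that this changes schema naming for every environment, so dev and prod runs configured with the same custom schema will write to the same datasets.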