Search code examples
google-cloud-platformdbt

Can we have a dbt query that sources from two different databases (datasets) in the same project?


I have one dbt project that is deployed on one single GCP project. Within the GCP project, I have multiple datasets in BigQuery, which I want to run my query on.

So my compiled SQL code should look something like:

with 
  first_table as (
    select * from projectA.dataset_X.table_thishasinfo
  )
  second_table as (
    select * from projectA.dataset_Y.table_thishasmoreinfo
 select * 
from first_table ft
left join second_table st
on ft.id = st.id

Within dbt, these are of course references to the files in different models where they are created.

The query needs to write to a table proejctA.dataset_Z.table_thisistheoutput

I tried looking at target within dbt, or defining it in profiles, but I can't figure out how to have three different datasets involved in the project.

How could I write my dbt code to create a query that can run on multiple datasets?


Solution

  • It seems that the issue is resolved by editing the dbt_project.yml file.

    If you include the following:

    models: 
      <projectname>:
        <modelname>:
           database: projectA
           schema: dataset_X
        <modelname2>:
           database: projectA
           schema: dataset_Y
        <modelname3>:
           database: projectA
           schema: dataset_Z
    

    If you run all the models in the same production environment, let's say with a prefix dbt, then in your BQ, you will see the following three datasets: dbt_dataset_X, dbt_dataset_Y, `dbt_dataset_Z. And the queries will pick up the correct database to source from.