
Multiple datasets in Dataform per GCP project


I have just spun up a Dataform project that is connected to a Google Cloud project. My question is: how do I structure Dataform to allow for multiple datasets, each containing its own bespoke SQL code?

Example structure:

dataset 1
 sql_1
 sql_2
 sql_3

dataset 2
 sql_1
 sql_2
 sql_3

From what I have read so far, can I only interact with one dataset at a time?

The dataform.json file allows me to set the following:

{
  "defaultSchema": "test_datasets",
  "assertionSchema": "dataform_assertions",
  "warehouse": "bigquery",
  "defaultDatabase": "project_name",
  "defaultLocation": "EU"
}

However, if we have more than one dataset in a project, do I need to alter the JSON file to set another dataset? Or is there a better way to deal with GCP projects that have multiple datasets?


Solution

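  • You don't need to change dataform.json. Keep defaultSchema as the fallback and, in each file that should be published to a different dataset, override it in that file's config block by adding the key-value pair schema: "your-schema-name". (On BigQuery, a Dataform schema is a dataset.) Refer here for more: https://docs.dataform.co/guides/datasets/publish#overriding-a-datasets-schema-or-name .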
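A minimal sketch of one such file, assuming the question's layout is mirrored with folders like definitions/dataset_1/ and definitions/dataset_2/. The folder, file and column names are hypothetical; only the schema key comes from the answer above. Note that the folder alone does not pick the dataset; the schema value in the config block does.

```sqlx
config {
  type: "table",
  // Hypothetical target dataset; overrides defaultSchema ("test_datasets") from dataform.json
  schema: "dataset_1"
}

-- Placeholder query; replace with the bespoke SQL for this table
SELECT 1 AS example_column
```

Each file meant for the second dataset would use the same pattern with schema: "dataset_2", and any file that omits schema falls back to defaultSchema.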