I have a dbt-core project that includes 2 tables (1 incremental, 1 that aggregates the incremental table).
Since dbt only logs the direct *.sql files but not the incremental macro magic with the MERGE statements the logged bytes_billed
in the /target/run_results.json
are missing the majority of the costs when you run your project.
To solve this issue I need a way to get all jobs that are created when I run my dbt project.
dbt_project.yml:
models:
dbt_tracking:
near_realtime:
+tags: near_realtime
+labels:
source: dbt_near_realtime
+materialized: table
config for the models/near_realtime/stg_near_realtime_events.sql
file:
{{
config(
labels={"task_id": "near_realtime"},
materialized="incremental",
incremental_strategy="merge",
on_schema_change="append_new_columns",
partition_by={
"field": "derived_tstamp",
"data_type": "timestamp",
"granularity": "hour",
},
partition_expiration_days=1,
cluster_by=["event_name"],
)
}}
config for the models/near_realtime/int_near_realtime_aggs.sql
file:
{{
config(
labels = {"task_id": "near_realtime"},
)
}}
execution command:
dbt run --select tag:near_realtime
but when I search for my label in the
`my_project_id.region-eu.INFORMATION_SCHEMA.JOBS_BY_USER`
the only label.key I get via:
select distinct
labels[SAFE_OFFSET(0)].key
from
`my_project_id.region-eu.INFORMATION_SCHEMA.JOBS_BY_USER`
where
timestamp_trunc(creation_time, day) = timestamp(current_date())
is this:
dbt_invocation_id
How can I query the jobs for my query so that I can see which total_bytes_billed my project run in dbt created for cost calculations?
Add this to your dbt_profile.yml:
query-comment:
job-label: True
comment: "{{ var('query_label', 'dbt_run') }}"
Then you can set a query label via:
dbt run --select tag:my_tag --vars '{"query_label": "here_is_my_label"}'
You can then analyse the total_bytes_billed per job query label in BQ via:
with
dbt_cloud_logs as (
select
datetime(creation_time, "Europe/Berlin") as creation_time
, (select unnested_labels.value from unnest(labels) as unnested_labels where unnested_labels.key = "dbt_invocation_id") as dbt_invocation_id
, (select unnested_labels.value from unnest(labels) as unnested_labels where unnested_labels.key = "query_comment") as dbt_query_label
, total_bytes_billed
from
`region-eu.INFORMATION_SCHEMA.JOBS`
where
1=1
and user_email = "your_service_account_or_email_adress"
and timestamp_trunc(creation_time, day, "Europe/Berlin") >= timestamp("when_did_you_implement_job_query_labels?", "Europe/Berlin")
)
select
*
from
dbt_cloud_logs
where
1=1
and dbt_query_label is not null
and dbt_invocation_id is not null
order by
dbt_query_label
, creation_time desc