I have a Python DBT project that defines the following data model (via YAML):
version: 2
models:
- name: company
description: ''
columns:
- name: ID
description: Unique identifier for the company entity. It is the company name
tests:
- unique
- not_null
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: VARCHAR
- name: REMOTE_ID
description: Company name
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: VARCHAR
- name: CREATED_AT
description: ''
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: TIMESTAMP_TZ
- not_null
- name: UPDATED_AT
description: ''
tests:
- dbt_expectations.expect_column_values_to_be_of_type:
column_type: TIMESTAMP_TZ
When I run dbt compile
, then dbt run
then dbt test
I get a DBT test failure with the following console output:
17:59:30 Failure in test dbt_expectations_expect_column_values_to_be_of_type_company_CREATED_AT__TIMESTAMP_TZ (models/fizzbuzz/domain/company.yml)
17:59:30 Got 1 result, configured to fail if != 0
17:59:30
17:59:30 compiled Code at target/compiled/myapp/models/fizzbuzz/domain/company.yml/dbt_expectations_expect_column_837854ce21e79ee1abe42f69891c68e9.sql
When I go to that compiled SQL at dbt_expectations_expect_column_837854ce21e79ee1abe42f69891c68e9.sql
this is what I see:
WITH relation_columns
AS (SELECT Cast('ID' AS VARCHAR) AS relation_column,
Cast('VARCHAR' AS VARCHAR) AS relation_column_type
UNION ALL
SELECT Cast('REMOTE_ID' AS VARCHAR) AS relation_column,
Cast('VARCHAR' AS VARCHAR) AS relation_column_type
UNION ALL
SELECT Cast('CREATED_AT' AS VARCHAR) AS relation_column,
Cast('TIMESTAMP_NTZ' AS VARCHAR) AS relation_column_type
UNION ALL
SELECT Cast('UPDATED_AT' AS VARCHAR) AS relation_column,
Cast('TIMESTAMP_NTZ' AS VARCHAR) AS relation_column_type),
test_data
AS (SELECT *
FROM relation_columns
WHERE relation_column = 'CREATED_AT'
AND relation_column_type NOT IN ( 'TIMESTAMP_TZ' ))
SELECT *
FROM test_data
So my understanding is that when I run dbt compile
it generates this test above from the compiled YAML file definition. And that when I subsequently run dbt test
, it runs this SQL to determine whether the test defined in that YAML passes or fails.
If that's correct, don't we have a bug in DBT here? Is it me or isn't this a false negative test that is hard-coded to always fail?!
It's creating a temp table/CTE called test_data
that is in no way, shape or form connecting to my company
database table, but that has 2 columns (relation_column
and relation_column_type
). Then its giving this table a row where the relation_column
value is "CREATED_AT" and its relation_column_type
value is "TIMESTAMP_NTZ". And finally its returning any records whose relation_column
value is "CREATED_AT" but whose relation_column_type
value is not "TIMESTAMP_NTZ"...so wouldn't this always fail?!
Or am I missing something big about how DBT and its tests work? Thanks for any clarifications here.
Looking into it the DBT community no longer supports the dbt test
command nor the DBT expectations offshoot project, likely for reasons just like this.