Search code examples
dbt

Automatically-generated Python dbt test seems to be hardcoded to fail


I have a Python DBT project that defines the following data model (via YAML):

version: 2
models:
- name: company
  description: ''
  columns:
  - name: ID
    description: Unique identifier for the company entity. It is the company name
    tests:
      - unique
      - not_null
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: VARCHAR
  - name: REMOTE_ID
    description: Company name
    tests:
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: VARCHAR
  - name: CREATED_AT
    description: ''
    tests:
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: TIMESTAMP_TZ
      - not_null
  - name: UPDATED_AT
    description: ''
    tests:
      - dbt_expectations.expect_column_values_to_be_of_type:
          column_type: TIMESTAMP_TZ

When I run dbt compile, then dbt run then dbt test I get a DBT test failure with the following console output:

17:59:30  Failure in test dbt_expectations_expect_column_values_to_be_of_type_company_CREATED_AT__TIMESTAMP_TZ (models/fizzbuzz/domain/company.yml)
17:59:30    Got 1 result, configured to fail if != 0
17:59:30  
17:59:30    compiled Code at target/compiled/myapp/models/fizzbuzz/domain/company.yml/dbt_expectations_expect_column_837854ce21e79ee1abe42f69891c68e9.sql

When I go to that compiled SQL at dbt_expectations_expect_column_837854ce21e79ee1abe42f69891c68e9.sql this is what I see:

WITH relation_columns
     AS (SELECT Cast('ID' AS VARCHAR)      AS relation_column,
                Cast('VARCHAR' AS VARCHAR) AS relation_column_type
         UNION ALL
         SELECT Cast('REMOTE_ID' AS VARCHAR) AS relation_column,
                Cast('VARCHAR' AS VARCHAR)   AS relation_column_type
         UNION ALL
         SELECT Cast('CREATED_AT' AS VARCHAR)    AS relation_column,
                Cast('TIMESTAMP_NTZ' AS VARCHAR) AS relation_column_type
         UNION ALL
         SELECT Cast('UPDATED_AT' AS VARCHAR)    AS relation_column,
                Cast('TIMESTAMP_NTZ' AS VARCHAR) AS relation_column_type),
     test_data
     AS (SELECT *
         FROM   relation_columns
         WHERE  relation_column = 'CREATED_AT'
                AND relation_column_type NOT IN ( 'TIMESTAMP_TZ' ))
SELECT *
FROM   test_data 

So my understanding is that when I run dbt compile it generates this test above from the compiled YAML file definition. And that when I subsequently run dbt test, it runs this SQL to determine whether the test defined in that YAML passes or fails.

If that's correct, don't we have a bug in DBT here? Is it me or isn't this a false negative test that is hard-coded to always fail?!

It's creating a temp table/CTE called test_data that is in no way, shape or form connecting to my company database table, but that has 2 columns (relation_column and relation_column_type). Then its giving this table a row where the relation_column value is "CREATED_AT" and its relation_column_type value is "TIMESTAMP_NTZ". And finally its returning any records whose relation_column value is "CREATED_AT" but whose relation_column_type value is not "TIMESTAMP_NTZ"...so wouldn't this always fail?!

Or am I missing something big about how DBT and its tests work? Thanks for any clarifications here.


Solution

  • Looking into it the DBT community no longer supports the dbt test command nor the DBT expectations offshoot project, likely for reasons just like this.