Search code examples
mysqlsqldbt

DBT join multiple tables


I am learning DBT, specifically dbt-mysql. I am having trouble combining several tables into one table. What I want to do: Group By several columns by the last_updated (timestamp) date of the table and then combine those columns into a single table by the split last_updated field. Here is how I want my data to end up:

enter image description here

Here is my staging model (which I think should be straight selects from the database):

staging/clients/stg_clients_fields.sql

SELECT id, created, last_updated, service, order_count, spent_count, deleted, country
FROM client_database.clients

Then I have intermediate models (which I think should reconstruct data for my needs): intermediate/clients/clients_last_updated_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as client_count
FROM {{ ref('stg_clients_fields') }}
GROUP BY YEAR(last_updated), MONTH (last_updated)

intermediate/clients/clients_deleted_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as deleted
FROM {{ ref('stg_clients_fields') }}
WHERE deleted = 1
GROUP BY YEAR(last_updated), MONTH (last_updated)

intermediate/clients/clients_service_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as service
FROM {{ ref('stg_clients_fields') }}
WHERE service IS NOT NULL
GROUP BY YEAR(last_updated), MONTH (last_updated)

And other columns follow the same pattern based on their WHERE clauses.

Now I need to create a marts model that would use all previously created data and put it in one single table.

At this point, I end up with several tables that have the last_updated field separated and the specific column value next to the date.

How can I now combine all these tables that they would join on the last_updated split into to columns field?

Or perhaps there is a better solution to group data by year and month and get individual column values based on conditions?

I am new to DBT so all the help and all advice are welcome!


Solution

  • since clients_last_updated_grouped doesn't have a where condition, it's guaranteed to have all of the year/month combinations found in the other models. This makes it much easier. You can just select from that model and join the other models on year and month:

    with
        updated as (select * from {{ ref('clients_last_updated_grouped') }} ),
        deleted as (select * from  ),
        service as (select * from  ),
    
        joined as (
            select
                updated.year,
                updated.month,
                updated.client_count,
                coalesce(deleted.deleted, 0) as deleted_count,
                coalesce(service.service, 0) as service_count
    
            from
                updated
                left join deleted on updated.year = deleted.year and updated.month = deleted.month
                left join service on updated.year = service.year and updated.month = service.month
        )
    select *
    from joined
    

    If your database doesn't support CTEs (with ...), this becomes:

    select
        updated.year,
        updated.month,
        updated.client_count,
        coalesce(deleted.deleted, 0) as deleted_count,
        coalesce(service.service, 0) as service_count
    
    from
        {{ ref('clients_last_updated_grouped') }} as updated
        left join {{ ref('clients_deleted_grouped') }} as deleted on updated.year = deleted.year and updated.month = deleted.month
        left join {{ ref('clients_service_grouped') }} as service on updated.year = service.year and updated.month = service.month
    

    If it's not the case that clients_last_updated_grouped has every month/year combination of the other tables, you would need to first construct a "date spine", and then left join all 3 tables to that date spine.