Search code examples
sqlpostgresqldatetimesubquerywhere-clause

trouble joining two date tables with consecutive dates starting at customer create date and ending at current date?


I am creating a customer activity by day table, which requires 9 CTEs. The first table I want to cross join all customer unique IDs with the dates of a calendar table. So there will be multiple rows with the same unique ID for each day. The problem is making sure the days are consecutive, regardless of the dates in the following CTEs. This is a shortened example of what it would look like this:

GUID      DATE       CONDITIONS
1         3/13/2015    [NULL]
1         3/14/2015     Y
1         3/15/2015     [NULL]
....
1         9/2/2020      Y
2         4/15/2015     Y
2         4/16/2015     [NULL]
2         4/17.2015     [NULL]
2         4/18/2015     Y
...
2         9/2/2020      [NULL]

And so on - so that each customers has consecutive dates with their GUID, beginning with the creation date of their account (i.e. 3/13/2015) and ending on the current date.

the create date is on Table 1 with the unique ID, and I'm joining it with a date table. My problem is that I can't get the query to run with a minimum create date per unique ID. Because if I don't create a minimum start date, the query runs forever (it's trying to create every unique ID for every consecutive date, even before the customer account was created.)

This is the code I have now. Can anyone tell me if I have made the min. create date right? It's still just timing out when I run the query.

with 
cte_carrier_guid (carrier_guid, email, date, carrier_id) as 
    (
    SELECT
        guid as carrier_guid
        ,mc.email 
        ,dt2.date as date
        ,mc.id as carrier_id
    FROM ctms_db_public.msd_carrier mc
    CROSS JOIN public.dim_calendar dt2
    WHERE dt2.date <= CURRENT_DATE 
    AND mc.created_at >= dt2.date
    GROUP BY guid, mc.id, dt2."date", mc.email
    ORDER BY guid, dt2.date asc
    )
    Select top 10 * from cte_carrier_guid 

Solution

  • Here:

    dt2.date <= CURRENT_DATE AND mc.created_at >= dt2.date
    

    Since you want dates between the creation date of the user and today, you probably want the inequality condition on the creation date the other way around. I find it easier to follow when we put the lower bound first:

    dt2.date >= mc.created_at AND dt2.date <= CURRENT_DATE 
    

    Other things about the query:

    • You want an INNER JOIN in essence, so use that instead of CROSS JOIN ... WHERE; it is clearer

    • ORDER BY in a cte makes no sense to me

    • Do you really need GROUP BY? The columns in the SELECT clause are the same as in the GROUP BY, so all this does is remove potential duplicates (but why would there be duplicates?)

    You could probably phrase the cte as:

    SELECT ...
    FROM ctms_db_public.msd_carrier mc
    INNER JOIN public.dim_calendar dt2 ON dt2.date >= mc.created_at
    WHERE dt2.date <= CURRENT_DATE