Search code examples
sqlgroupingteradatarow-number

classifying the records into different groups based on dates using Teradata


I have the following data with their MIN, MAX and CLOSE dates.

All I am trying to classify these accounts into three groups based on these dates and show the count for each month they are active

Have

  custID         MINDate     MAXDATE         CLOSEDATE
  10001          1/1/2022   12/31/2022
  20001          7/6/2022   12/31/2022
  30001          4/5/20022  6/10/2022         6/10/2022
  40001          1/1/2022   12/31/2022 

Grp1 - If the `custID is available entire 2022 then it would be shown under all the 12 months

Grp2 - If the 'CustID is available only for a few days then it would be in 'Grp2 only for that particular month and for the remaining months it would be in 'Grp1

Grp3 - If 'CustId is populated with 'CLOSEDATE then it would be `Grp3 for that closed month

Want

           jan22 Feb22 Mar22 Apr22 May22 Jun22 Jul22 Aug22 Sep22 Oct22 Nov22 Dec22       
Grp1         2    2     2      2     3      2    2      3    3     3    3     3   
Grp2                           1                 1    
Grp3                                        1

10001 - Y across all the months Grp1

20001 - Y in July Grp2, Y from August - December

30001 - Y in April Grp2, Y in May Grp1 and Y in Jun Grp3

40001 - Y across all the months Grp1

I tried to build the following logic but lost in the middle

    WITH cte1 AS (
      SELECT custID,
        MINDate,
        MAXDATE,
        ADD_MONTHS(TRUNC(MINDate, 'MM'), ROW_NUMBER() OVER (PARTITION BY custID ORDER BY MINDate) - 1) AS Month
      FROM Have
      QUALIFY EXTRACT(MONTH FROM MINDate) = EXTRACT(MONTH FROM ADD_MONTHS(TRUNC(MINDate, 'MM'), ROW_NUMBER() OVER (PARTITION BY custID ORDER BY MINDate) - 1))
    ), cte2 AS (
      SELECT custID,
        Month,
        CASE WHEN Month = ADD_MONTHS(TRUNC(MAXDATE, 'MM'), 1) THEN LAST_DAY(MAXDATE) - MAXDATE + 1 ELSE 1 END AS Days
      FROM (
        SELECT custID,
          MIN(Month) AS Month,
          MAX(MAXDATE) AS MAXDATE
        FROM cte1
        GROUP BY custID
      ) t1
      INNER JOIN cte1 t2 ON t1.custID = t2.custID AND t1.Month <= t2.Month AND t2.Month < ADD_MONTHS(TRUNC(MAXDATE, 'MM'), 1)
    ), 
      SELECT Month, SUM(Days) AS Days
      FROM cte2
      GROUP BY Month

Solution

  • Here's how I would solve this:

    1. Join to sys_calendar.calendar to get all dates between the MINDATE and MAXDATE. Filter for just the dates that occur on the 1st of the month for each sys_calendar.calendar.calendar_date. This way we have a record for every custid/month_start combination.

      (You could also use Teradata's EXPAND ON clause so you don't have to go to sys_calendar table and deal with that join, but EXPAND ON is not compatible with PIVOT in the same sql submission)

    2. Using a CASE expression determine, for each of those records, in which group they belong using the logic you outlined.

    3. Pivot.

    This will look like:

    Updated to incorporate dnoeth's improvements

    WITH expanded_months AS 
    (
        SELECT Have.*, calendar_date as month_start
        FROM Have 
            INNER JOIN sys_calendar.calendar cal 
                ON PERIOD(TRUNC(MINDATE, 'MM'), NEXT(MAXDATE)) CONTAINS cal.calendar_date
        WHERE cal.day_of_month = 1
    )
    , month_groups AS 
    (
    SELECT em.*, COUNT(month_start) OVER (PARTITION BY custID) as month_count,
        CASE 
             WHEN month_count < 12 AND month_start = TRUNC(CLOSEDATE, 'MM') THEN 'Grp3'
             WHEN month_count < 12 AND LAG(month_start) OVER (PARTITION BY custid ORDER BY month_start) IS NULL THEN 'Grp2'
             ELSE 'Grp1'
             END as Grp
    FROM expanded_months em
    )
    SELECT pvt.*
    FROM (SELECT Grp, month_start FROM month_groups) mg
    PIVOT (
          count(*) as val
          FOR month_start 
          IN (
                DATE '2022-01-01' AS jan22,
                DATE '2022-02-01' AS feb22,
                DATE '2022-03-01' AS mar22,
                DATE '2022-04-01' AS apr22,
                DATE '2022-05-01' AS may22,
                DATE '2022-06-01' AS jun22,
                DATE '2022-07-01' AS jul22,
                DATE '2022-08-01' AS aug22,
                DATE '2022-09-01' AS sep22,
                DATE '2022-10-01' AS oct22,
                DATE '2022-11-01' AS nov22,
                DATE '2022-12-01' AS dec22
            )     
          )pvt
    ORDER BY Grp
    

    Grp jan22_val feb22_val mar22_val apr22_val may22_val jun22_val jul22_val aug22_val sep22_val oct22_val nov22_val dec22_val
    Grp1 2 2 2 2 3 2 2 3 3 3 3 3
    Grp2 0 0 0 1 0 0 1 0 0 0 0 0
    Grp3 0 0 0 0 0 1 0 0 0 0 0 0