I have the following data with their MIN
, MAX
and CLOSE
dates.
All I am trying to classify these accounts into three groups based on these dates and show the count for each month they are active
Have
custID MINDate MAXDATE CLOSEDATE
10001 1/1/2022 12/31/2022
20001 7/6/2022 12/31/2022
30001 4/5/20022 6/10/2022 6/10/2022
40001 1/1/2022 12/31/2022
Grp1 - If the `custID is available entire 2022 then it would be shown under all the 12 months
Grp2 - If the 'CustID is available only for a few days then it would be in 'Grp2 only for that particular month and for the remaining months it would be in 'Grp1
Grp3 - If 'CustId is populated with 'CLOSEDATE then it would be `Grp3 for that closed month
Want
jan22 Feb22 Mar22 Apr22 May22 Jun22 Jul22 Aug22 Sep22 Oct22 Nov22 Dec22
Grp1 2 2 2 2 3 2 2 3 3 3 3 3
Grp2 1 1
Grp3 1
10001 - Y across all the months Grp1
20001 - Y in July Grp2, Y from August - December
30001 - Y in April Grp2, Y in May Grp1 and Y in Jun Grp3
40001 - Y across all the months Grp1
I tried to build the following logic but lost in the middle
WITH cte1 AS (
SELECT custID,
MINDate,
MAXDATE,
ADD_MONTHS(TRUNC(MINDate, 'MM'), ROW_NUMBER() OVER (PARTITION BY custID ORDER BY MINDate) - 1) AS Month
FROM Have
QUALIFY EXTRACT(MONTH FROM MINDate) = EXTRACT(MONTH FROM ADD_MONTHS(TRUNC(MINDate, 'MM'), ROW_NUMBER() OVER (PARTITION BY custID ORDER BY MINDate) - 1))
), cte2 AS (
SELECT custID,
Month,
CASE WHEN Month = ADD_MONTHS(TRUNC(MAXDATE, 'MM'), 1) THEN LAST_DAY(MAXDATE) - MAXDATE + 1 ELSE 1 END AS Days
FROM (
SELECT custID,
MIN(Month) AS Month,
MAX(MAXDATE) AS MAXDATE
FROM cte1
GROUP BY custID
) t1
INNER JOIN cte1 t2 ON t1.custID = t2.custID AND t1.Month <= t2.Month AND t2.Month < ADD_MONTHS(TRUNC(MAXDATE, 'MM'), 1)
),
SELECT Month, SUM(Days) AS Days
FROM cte2
GROUP BY Month
Here's how I would solve this:
Join to sys_calendar.calendar
to get all dates between the MINDATE
and MAXDATE
. Filter for just the dates that occur on the 1st of the month for each sys_calendar.calendar.calendar_date
. This way we have a record for every custid
/month_start
combination.
(You could also use Teradata's EXPAND ON
clause so you don't have to go to sys_calendar
table and deal with that join, but EXPAND ON
is not compatible with PIVOT
in the same sql submission)
Using a CASE expression determine, for each of those records, in which group they belong using the logic you outlined.
Pivot.
This will look like:
Updated to incorporate dnoeth's improvements
WITH expanded_months AS
(
SELECT Have.*, calendar_date as month_start
FROM Have
INNER JOIN sys_calendar.calendar cal
ON PERIOD(TRUNC(MINDATE, 'MM'), NEXT(MAXDATE)) CONTAINS cal.calendar_date
WHERE cal.day_of_month = 1
)
, month_groups AS
(
SELECT em.*, COUNT(month_start) OVER (PARTITION BY custID) as month_count,
CASE
WHEN month_count < 12 AND month_start = TRUNC(CLOSEDATE, 'MM') THEN 'Grp3'
WHEN month_count < 12 AND LAG(month_start) OVER (PARTITION BY custid ORDER BY month_start) IS NULL THEN 'Grp2'
ELSE 'Grp1'
END as Grp
FROM expanded_months em
)
SELECT pvt.*
FROM (SELECT Grp, month_start FROM month_groups) mg
PIVOT (
count(*) as val
FOR month_start
IN (
DATE '2022-01-01' AS jan22,
DATE '2022-02-01' AS feb22,
DATE '2022-03-01' AS mar22,
DATE '2022-04-01' AS apr22,
DATE '2022-05-01' AS may22,
DATE '2022-06-01' AS jun22,
DATE '2022-07-01' AS jul22,
DATE '2022-08-01' AS aug22,
DATE '2022-09-01' AS sep22,
DATE '2022-10-01' AS oct22,
DATE '2022-11-01' AS nov22,
DATE '2022-12-01' AS dec22
)
)pvt
ORDER BY Grp
Grp | jan22_val | feb22_val | mar22_val | apr22_val | may22_val | jun22_val | jul22_val | aug22_val | sep22_val | oct22_val | nov22_val | dec22_val |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Grp1 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
Grp2 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Grp3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |