I have user logins by date. My requirement is to track the number of users that have been logged in during the past 90 days window.
I am new to both SQL in general and Teradata specifically and I can't get the window functionality to work as I need.
I need the following result, where ACTIVE is a count of the unique USER_IDs that appear in the previous 90 day window the DATE.
DATES ACTIVE_IN_WINDOW
12/06/2018 20
13/06/2018 45
14/06/2018 65
15/06/2018 73
17/06/2018 24
18/06/2018 87
19/06/2018 34
20/06/2018 51
Currently my script is as follows.
It is this line here that I cant get right
COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND 0 FOLLOWING)
I suspect I need a different set of functions to make this work.
SELECT b.DATES , a.ACTIVE_IN_WINDOW
FROM
(
SELECT
CAST(CALENDAR_DATE AS DATE) AS DATES FROM SYS_CALENDAR.CALENDAR
WHERE DATES BETWEEN ADD_MONTHS(CURRENT_DATE, - 10) AND CURRENT_DATE
) b
LEFT JOIN
(
SELECT USER_ID , EVT_DT
, COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND 0 FOLLOWING) AS ACTIVE_IN_WINDOW
FROM ENV0.R_ONBOARDING
) a
ON a.EVT_DT = b.DATES
ORDER BY b.DATES
Thank you for any assistance.
The logic is similar to Gordon', but a non-equi-Join instead of a Correlated Scalar Subquery is usually more efficient on Teradata:
SELECT b.DATES , Count(DISTINCT USER_ID)
FROM
(
SELECT CALENDAR_DATE AS DATES
FROM SYS_CALENDAR.CALENDAR
WHERE DATES BETWEEN Add_Months(Current_Date, - 10) AND Current_Date
) b
LEFT JOIN
( -- apply DISTINCT before aggregation to reduce intermediate spool
SELECT DISTINCT USER_ID, EVT_DT
FROM ENV0.R_ONBOARDING
) AS a
ON a.EVT_DT BETWEEN Add_Months(b.DATES,-3) AND b.DATES
GROUP BY 1
ORDER BY 1
Of course this will require a large spool and much CPU.
Edit:
Switching to weeks reduces the overhead, I'm using dates instead of week numbers (it's easier to modify for other ranges):
SELECT b.Week , Count(DISTINCT USER_ID)
FROM
( -- Return only Mondays instead of DISTINCT over all days
SELECT calendar_date AS Week
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN Add_Months(Current_Date, -9) AND Current_Date
AND day_of_week = 2 -- 2 = Monday
) b
LEFT JOIN
(
SELECT DISTINCT USER_ID,
-- td_monday returns the previous Monday, but we need the following monday
-- covers the previous Tuesday up to the current Monday
Td_Monday(EVT_DT+6) AS PERIOD_WEEK
FROM ENV0.R_ONBOARDING
-- You should add another condition to limit the actually covered date range, e.g.
-- where EVT_DT BETWEEN Add_Months(b.DATES,-13) AND b.DATES
) AS a
ON a.PERIOD_WEEK BETWEEN b.Week-(12*7) AND b.Week
GROUP BY 1
ORDER BY 1
Explain should duplicate the calendar as preparation for the product join, if not you might need to materialize the dates in a Volatile Table. Better don't use sys_calendar
, there are no statistics, e.g. optimizer doesn't know about how many days per week/month/year, etc. Check your system, there should be a calendar table designed for you company needs (with stats on all columns)