I'm using BigQuery and DataStudio to show retention plotted throughout the weeks and i'm having trouble with that when a user is new and then uses the app again the same week he's both new and retained in my calculations and I would like it to be so that he's only new the first week of using the app and then if he uses the app again within 2 weeks, he's "Retained".
This is my query:
SELECT
UserID,
DATE,
DATE_DIFF(DATE,PreviousSessionDATE, DAY) as DaysBetweenSessions,
(SELECT
CASE
WHEN DaysBetweenSessions <= 14 THEN 'Retained'
WHEN DaysBetweenSessions >14 THEN 'Returned'
WHEN DaysBetweenSessions IS NULL AND FirstSessionDATE = DATE THEN 'New'
WHEN DaysBetweenSessions IS NULL THEN 'User has an old version without Retention Parameters'
END) as User_Type
FROM
app_project.analytics_*********.events_*
GROUP BY
1,2,3,4
ORDER BY
DATE DESC,
DaysBetweenSessions DESC,
1,2,3,4
Then the result will generally be good except for when a user uses the app several times and get a between 1-14 as DaysBetweenSessions and then be counted as both new and retained within the same week.
Then in DataStudio I will do YEARWEEK(DATE) to vizualise per week and Count_Distinct(UserID) as my metrics.
Any ideas on how i can sort out so that the new user is only counted as new in the first week even if the user trains within that week?
Current output in BQ:
UserID DATE DaysBetweenSessions User_Type
123 20180801 NULL "New"
123 20180801 0 "Retained"
And desired output
UserID DATE DaysBetweenSessions User_Type
123 20180801 NULL "New"
There might be a neater way to do this but...
WITH CTE AS
(SELECT
UserID,
DATE,
DATE_DIFF(DATE,PreviousSessionDATE, DAY) as DaysBetweenSessions,
(SELECT
CASE
WHEN DaysBetweenSessions <= 14 THEN 'Retained'
WHEN DaysBetweenSessions >14 THEN 'Returned'
WHEN DaysBetweenSessions IS NULL AND FirstSessionDATE = DATE THEN 'New'
WHEN DaysBetweenSessions IS NULL THEN 'User has an old version without Retention
Parameters'
END) as User_Type,
(SELECT
CASE
WHEN DaysBetweenSessions <= 7 THEN 0
WHEN DaysBetweenSessions >7 THEN 1
WHEN DaysBetweenSessions IS NULL AND FirstSessionDATE = DATE THEN 0
WHEN DaysBetweenSessions IS NULL THEN 2
END) as DaysBetween
FROM
app_project.analytics_*********.events_*
GROUP BY
1,2,3,4
ORDER BY
DATE DESC,
DaysBetweenSessions DESC,
1,2,3,4),
Result as
(SELECT *, min(User_Type) OVER (PARTITION BY UserID, DaysBetween) minUser_Type
FROM CTE)
SELECT UserID,
DATE,
DaysBetweenSessions,
User_type
FROM Result
WHERE NOT (User_Type <> 'New' AND minUser_Type = 'New')
The second part should add a dimension which is the alphabetically lowest User_Type in that week (so if you renamed anything to alphabetically less than 'New' it wouldn't work, better using numbers ideally).
The last part should get rid of the ones where there was a 'New' in that week but the row User_Type isn't 'New'.