I have a log table (MS SQL SERVER) with event entries (events are user actions like "user logged in", "user viewed entity A" etc).
Some events like "user viewed entity A" may occur multiple times within a short time frame. For instance if a user goes back and forward in his browser he may enter entity A's page multiple times within a minute, and multiple "user view" events will be logged.
For my analytics dashboard I would like to count how many times a user viewed entity A, but I would like to "debounce" the result. I want to consider multiple "user view" events close to one another as one "user view" event. Specifically, I want to consider a new "user view" event only if it is more than 30 minutes from the last one.
So having a table like this (last column is my comments for clarity):
timestamp | evt_type | user_id | entity_id | *time diff from previous event |
---|---|---|---|---|
15:30 | ENTITY_VIEW | U1 | E1 | NULL (first view) |
15:38 | ENTITY_VIEW | U1 | E1 | 8mins |
16:05 | ENTITY_VIEW | U1 | E1 | 28mins |
16:50 | ENTITY_VIEW | U1 | E1 | 45mins (this counts as new view) |
17:15 | ENTITY_VIEW | U1 | E1 | 25mins |
17:44 | ENTITY_VIEW | U1 | E1 | 29mins |
18:30 | ENTITY_VIEW | U1 | E1 | 46mins (this counts as another view) |
I would like to determine that the user "viewed" the entity 3 times.
What would be a query to determine this? I tried LEAD, LAG, PARTITION BY and other comnbinations but I don't seem to find the correct way as I am not an SQL expert.
Should be a simple LAG()
to grab the previous timestamp and check the diff. Will say your column [timestamp] is an odd data type, what about different days? Is there a separate column for date?
WITH cte_DeltaSinceLastView AS (
SELECT *
/*Grab previous record for each user_id/entity_id combo*/
,PrevTimestamp = LAG([timestamp]) OVER (PARTITION BY [user_id],[entity_id] ORDER BY [timestamp])
FROM YourTable
) AS A(ID,[user_id],[entity_id],[timestamp])
)
SELECT *,MinutesSinceLastView = DATEDIFF(minute,PrevTimestamp,[Timestamp])
FROM cte_DeltaSinceLastView
WHERE DATEDIFF(minute,PrevTimestamp,[timestamp]) > 30 /*Over 30 minutes between last view*/
OR PrevTimestamp IS NULL /*First view will not have previous timestamp to compare against*/