Search code examples
sql-servert-sqlsql-server-2016

Reduce table records based on minimum time difference


I have a log table (MS SQL SERVER) with event entries (events are user actions like "user logged in", "user viewed entity A" etc).

Some events like "user viewed entity A" may occur multiple times within a short time frame. For instance if a user goes back and forward in his browser he may enter entity A's page multiple times within a minute, and multiple "user view" events will be logged.

For my analytics dashboard I would like to count how many times a user viewed entity A, but I would like to "debounce" the result. I want to consider multiple "user view" events close to one another as one "user view" event. Specifically, I want to consider a new "user view" event only if it is more than 30 minutes from the last one.

So having a table like this (last column is my comments for clarity):

timestamp evt_type user_id entity_id *time diff from previous event
15:30 ENTITY_VIEW U1 E1 NULL (first view)
15:38 ENTITY_VIEW U1 E1 8mins
16:05 ENTITY_VIEW U1 E1 28mins
16:50 ENTITY_VIEW U1 E1 45mins (this counts as new view)
17:15 ENTITY_VIEW U1 E1 25mins
17:44 ENTITY_VIEW U1 E1 29mins
18:30 ENTITY_VIEW U1 E1 46mins (this counts as another view)

I would like to determine that the user "viewed" the entity 3 times.

What would be a query to determine this? I tried LEAD, LAG, PARTITION BY and other comnbinations but I don't seem to find the correct way as I am not an SQL expert.


Solution

  • Should be a simple LAG() to grab the previous timestamp and check the diff. Will say your column [timestamp] is an odd data type, what about different days? Is there a separate column for date?

    Return Records >30 Minutes from Previous Record

    WITH cte_DeltaSinceLastView AS (
        SELECT *
            /*Grab previous record for each user_id/entity_id combo*/
            ,PrevTimestamp = LAG([timestamp]) OVER (PARTITION BY [user_id],[entity_id] ORDER BY [timestamp])
        FROM YourTable
        ) AS A(ID,[user_id],[entity_id],[timestamp])
    )
    SELECT *,MinutesSinceLastView = DATEDIFF(minute,PrevTimestamp,[Timestamp])
    FROM cte_DeltaSinceLastView
    WHERE DATEDIFF(minute,PrevTimestamp,[timestamp]) > 30 /*Over 30 minutes between last view*/
    OR PrevTimestamp IS NULL /*First view will not have previous timestamp to compare against*/