Below is an image of the table:
https://i.sstatic.net/mPUGV.png
I have a table that tracks users' visits to a mobile app. Each row represents the datetime at which a user enters a page in the app. Min_btw_page shows the number of minutes between each page visit. When Min_btw_page is >= 30 minutes, the session is considered complete and the next page visit is counted as a new session. What I am trying to find is a way to number each page visit within its session and then aggregate the rows by session.
I have used the LAG() function to create "Min_btw_next_page". I have also created the column "row_no", attempting to assign a sequence number to each row per HashID per session, but it did not work. The result should look like the column "Expected_row_no". However, even if I manage to get the right row number, I still do not know how to aggregate the rows by session, as I would have no session identifier to partition or group by.
My understanding of your question is that you want to distinguish 'sessions' for users. You define a new 'session' as one that starts after the user has done nothing for 30 minutes or more. As such, if someone performs many actions with, say, 20 minutes between each action, it still counts as a single 'session'.
One approach (definitely not the only approach) would start with a minor change to what you have. Note also that this is only a partial answer - it's preparation for later analyses.

Note that you can use

SUM(column) OVER (ORDER BY ...)

to get a running total. Here, what I'm doing is creating a column where all rows in a 'session' get the same value, e.g., the first six rows in your table get value 1, the next two rows get value 2, the next eight rows get value 3. From there, you can GROUP BY to find averages etc., and other things such as numbering the rows within a session become trivial.
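If it helps to see that running-total behaviour in isolation, here is a tiny self-contained sketch - the values and column names are made up purely for illustration:

SELECT ord, flag,
       SUM(flag) OVER (ORDER BY ord ROWS UNBOUNDED PRECEDING) AS running_total
FROM (VALUES (1, 1), (2, 0), (3, 0), (4, 1), (5, 0)) AS t(ord, flag)
ORDER BY ord;
-- running_total comes out as 1, 1, 1, 2, 2: each 1-flag bumps the total,
-- and every following 0-flag row keeps the same value, which is exactly
-- how the session numbers are produced below.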
The process involves two steps: the data setup, then a single query that flags and numbers the sessions.

Data setup:
CREATE TABLE #DeviceLoads (LogID int IDENTITY(1,1), HashID nvarchar(10), DeviceDatetime datetime);
INSERT INTO #DeviceLoads (HashID, DeviceDatetime) VALUES
('ID1', '20201013 15:26'),
('ID1', '20201013 15:26'),
('ID1', '20201013 15:28'),
('ID1', '20201013 15:28'),
('ID1', '20201013 15:28'),
('ID1', '20201014 14:59'),
('ID1', '20201014 14:59'),
('ID1', '20201014 16:17'),
('ID1', '20201014 16:46'),
('ID1', '20201014 17:15'),
('ID1', '20201014 17:46');
Here is the one command (feel free to split it up though). It works as follows:

DL_source uses the LAG function (similar to what you have, I believe, to create your original table) to determine the last activity time.
DL_Session_Source takes the data from above and flags each new session with the value 1.
The final SELECT from DL_Session_Source turns those flags into a session number with a running SUM.
WITH DL_source AS -- This is probably similar to what you have already
    (SELECT LogID, HashID, DeviceDatetime,
            LAG(DeviceDatetime, 1) OVER (PARTITION BY HashID ORDER BY DeviceDatetime, LogID) AS Last_DeviceDateTime
     FROM #DeviceLoads),
DL_Session_Source AS
    (SELECT LogID, HashID, DeviceDatetime, Last_DeviceDateTime,
            -- A gap of 30 minutes or more (or no previous visit at all) starts a new session
            CASE WHEN DATEDIFF(minute, Last_DeviceDateTime, DeviceDatetime) < 30 THEN 0 ELSE 1 END AS New_Session_flag
     FROM DL_source)
SELECT *,
       -- Running total of the flags: every row in a session shares the same Session_Num
       SUM(New_Session_flag) OVER (ORDER BY HashID, DeviceDatetime, LogID) AS Session_Num
FROM DL_Session_Source;
And here are the results (with seconds truncated for brevity). Note the column at the end (Session_Num), which indicates which rows belong to which session.
LogID HashID DeviceDatetime Last_DeviceDateTime New_Session_flag Session_Num
1 ID1 2020-10-13 15:26 NULL 1 1
2 ID1 2020-10-13 15:26 2020-10-13 15:26 0 1
3 ID1 2020-10-13 15:28 2020-10-13 15:26 0 1
4 ID1 2020-10-13 15:28 2020-10-13 15:28 0 1
5 ID1 2020-10-13 15:28 2020-10-13 15:28 0 1
6 ID1 2020-10-14 14:59 2020-10-13 15:28 1 2
7 ID1 2020-10-14 14:59 2020-10-14 14:59 0 2
8 ID1 2020-10-14 16:17 2020-10-14 14:59 1 3
9 ID1 2020-10-14 16:46 2020-10-14 16:17 0 3
10 ID1 2020-10-14 17:15 2020-10-14 16:46 0 3
11 ID1 2020-10-14 17:46 2020-10-14 17:15 1 4
From here, feel free to save the result to a temporary table or similar for further processing (a sketch of that saving step follows the query below), e.g.,
SELECT Session_Num,
HashID,
COUNT(*) AS Num_Actions,
MIN(DeviceDateTime) AS First_Action,
MAX(DeviceDateTime) AS Last_Action
FROM #YourTempTable
GROUP BY Session_Num, HashID;
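For completeness, here is one way that 'save to a temporary table' step could look - it is simply the query from above with an INTO clause, writing to #YourTempTable, the placeholder name used in the grouping example - followed by the per-session row numbering (the 'Expected_row_no' from the question), which becomes a straightforward ROW_NUMBER once Session_Num exists. Treat this as a sketch rather than the only way to do it.

WITH DL_source AS
    (SELECT LogID, HashID, DeviceDatetime,
            LAG(DeviceDatetime, 1) OVER (PARTITION BY HashID ORDER BY DeviceDatetime, LogID) AS Last_DeviceDateTime
     FROM #DeviceLoads),
DL_Session_Source AS
    (SELECT LogID, HashID, DeviceDatetime, Last_DeviceDateTime,
            CASE WHEN DATEDIFF(minute, Last_DeviceDateTime, DeviceDatetime) < 30 THEN 0 ELSE 1 END AS New_Session_flag
     FROM DL_source)
SELECT *,
       SUM(New_Session_flag) OVER (ORDER BY HashID, DeviceDatetime, LogID) AS Session_Num
INTO #YourTempTable        -- materialise the sessionised rows for further processing
FROM DL_Session_Source;

-- Per-session row numbers, i.e. the 'Expected_row_no' column from the question
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY HashID, Session_Num ORDER BY DeviceDatetime, LogID) AS Row_No
FROM #YourTempTable;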
Here's a db<>fiddle with some 'interleaved' data added (e.g., out-of-order and overlapping timestamps for HashID ID2) to help ensure it works as required.
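If you want to reproduce that kind of interleaving locally without the fiddle, a couple of extra rows along these lines (purely illustrative values, not the fiddle's exact data) are enough to check the behaviour. Because each HashID's first row has no previous timestamp, it is flagged as a new session, so the running total keeps the two users' sessions distinct even though their timestamps overlap.

-- Hypothetical ID2 rows that overlap ID1's timestamps (illustration only)
INSERT INTO #DeviceLoads (HashID, DeviceDatetime) VALUES
('ID2', '20201013 15:27'),
('ID2', '20201013 15:40'),
('ID2', '20201014 16:20');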