Just a general question. Can you sessionize logs in an RDBMS?
For example, say you just have three columns 1) timestamp 2) URL 3) UserID is it possible to sessionize the logs based of X minutes of activity in a traditional RDBMS. The output might look like four columns 1) timestamp 2) URL 3) UserID 4)SessionID.
I'd assume it's not, but want to hear others opinion.
Thanks
It's a bit tricky, but can be done using nested Windowed Aggregate Functions like
SELECT timestamp, UserID, URL,
SUM(newSession) -- cumulative sum over 0/1
OVER (PARTITION BY UserId
ORDER BY timestamp
ROWS UNBOUNDED PRECEDING) AS SessionID
FROM
(
SELECT
ts_col, UserID, URL,
-- calculate the timestamp difference between current and previous row
CASE WHEN timestamp - LAG(timestamp)
OVER (PARTITION BY UserId
ORDER BY timestamp) > INTERVAL 'X minutes'
THEN 1 -- new session starts
ELSE 0 -- part of the old session
END AS newSession
) AS dt
Some DBMSes (e.g. Vertica & Aster) support sessionization using builtin functions, in others you might implement a User Defined Function.