Search code examples
sqldatabasepostgresqlhadooprdbms

Can you Sessionize Weblogs in an RDBMS


Just a general question. Can you sessionize logs in an RDBMS?

For example, say you just have three columns 1) timestamp 2) URL 3) UserID is it possible to sessionize the logs based of X minutes of activity in a traditional RDBMS. The output might look like four columns 1) timestamp 2) URL 3) UserID 4)SessionID.

I'd assume it's not, but want to hear others opinion.

Thanks


Solution

  • It's a bit tricky, but can be done using nested Windowed Aggregate Functions like

    SELECT timestamp, UserID, URL,
       SUM(newSession) -- cumulative sum over 0/1
       OVER (PARTITION BY UserId
             ORDER BY timestamp
             ROWS UNBOUNDED PRECEDING) AS SessionID
    FROM
     (
       SELECT 
          ts_col, UserID, URL,
          -- calculate the timestamp difference between current and previous row
          CASE WHEN timestamp - LAG(timestamp) 
                                OVER (PARTITION BY UserId 
                                      ORDER BY timestamp) > INTERVAL 'X minutes' 
               THEN 1  -- new session starts
               ELSE 0  -- part of the old session
          END AS newSession
     ) AS dt
    

    Some DBMSes (e.g. Vertica & Aster) support sessionization using builtin functions, in others you might implement a User Defined Function.