Search code examples
sqlgoogle-bigquerydata-analysis

Is it possible to get active sessions per hour in SQL?


start_time end_time HostID gameID
6/14/2021 20:13 6/14/2021 22:22 1 AB1
6/14/2021 20:20 6/14/2021 21:47 2 AB2
6/14/2021 20:22 6/14/2021 22:07 3 AB3
6/14/2021 20:59 6/14/2021 21:15 4 AB4
6/15/2021 21:24 6/15/2021 22:09 1 AB5
6/15/2021 21:24 6/15/2021 21:59 2 AB6
6/15/2021 23:11 6/16/2021 01:22 4 AB7
6/16/2021 20:13 6/16/2021 21:23 3 AB8

I have a table that has a start time and end time. I want to count the active game rooms per hour. I know I should have at least try solving this but I really don't know where to start or if is this even possible with SQL.

What I first did is to COUNT how many gameID there are in an hour using the start_time. But I'm sure I did not answer the 'active' per session question. What I did was to just COUNT how many has started a game per hour.

The expected result is something like this

DayHour

Time Active
6/14/2021 2000 4
6/15/2021 2100 4
6/16/2021 2200 2
6/15/2021 2100 2
6/16/2021 2200 1
6/17/2021 2300 1
6/16/2021 0 1
6/17/2021 1 1
6/18/2021 2000 1
6/19/2021 2100 1

Or count of active sessions per hour without grouping by day.

Hr

Time Active
2000 5
2100 7
2200 3
2300 1
0000 1
0001 1

Solution

  • I suggest below solution

    select timestamp_trunc(minute, hour) hour, 
      count(distinct hostid) hosts,
      count(distinct gameid) games
    from `project.dataset.table`, 
    unnest(generate_timestamp_array(
      parse_timestamp('%m/%d/%Y %H:%M', start_time), 
      parse_timestamp('%m/%d/%Y %H:%M', end_time), 
      interval 1 minute)) minute
    group by hour
    # order by hour          
    

    if applied to sample data in your question (with fix for end_time in last row - should be 6/16/2021 21:23 - not 6/6/2021 21:23) - output is

    enter image description here

    Brief explanation

    1. Expand/split each original row into one row for each minute between start_time and end_time
    2. And then, simply aggregate by hour applying count(distinct ...)

    Or count of active sessions per hour without grouping by day.

    You can apply exactly same approach

    select extract(hour from minute) hour, 
      count(distinct hostid) hosts,
      count(distinct gameid) games
    from `project.dataset.table`, 
    unnest(generate_timestamp_array(
      parse_timestamp('%m/%d/%Y %H:%M', start_time), 
      parse_timestamp('%m/%d/%Y %H:%M', end_time), 
      interval 1 minute)) minute
    group by hour
    order by hour             
    

    with output

    enter image description here