Search code examples
sqlgoogle-bigquerystring-aggregation

sql, big query: aggregate all entries between two strings in a variable


I have to solve this problem within bigQuery. I have this column in my table:

event            | time
_________________|____________________
start            | 1
end              | 2
random_event_X   | 3
start            | 4 
error_X          | 5 
error_Y          | 6
end              | 7
start            | 8
error_F          | 9
start            | 10
random_event_Y   | 11
error_z          | 12
end              | 13

I would like to, from the end event record everything until start appear and then count it. Everything can happen between start and end and outside of it. If there is an end, there is a start, but if there is a start, there is not necessarily an end.

The desire output would be like:

string_agg                            | count
"start, end"                          |  1
"start, error_X, error_Y, end"        |  1
"start, random_event_Y error_Z, end"  |  1

So everything between each start and end if start has an end. So without the random_event_X at time 3, the start at time 8 or the error_F at time 9.

I was not able to find the solution and have struggle understanding how to approach this problem. Any help or advice is welcome.


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT agg_events, COUNT(1) cnt 
    FROM (
      SELECT STRING_AGG(event ORDER BY time) agg_events, COUNTIF(event IN ('start', 'end')) flag   
      FROM (
        SELECT *, COUNTIF(event = 'start') OVER(PARTITION BY grp1 ORDER BY time) grp2     
        FROM (
          SELECT *, COUNTIF(event = 'end') OVER(ORDER BY time DESC) grp1 
          FROM `project.dataset.table`
        )
      )
      GROUP BY grp1, grp2
    )
    WHERE flag = 2
    GROUP BY agg_events   
    

    If to apply to sample data from your question - result is

    Row agg_events                          cnt  
    1   start,random_event_Y,error_z,end    1    
    2   start,error_X,error_Y,end           1    
    3   start,end                           1