I have to solve this problem within bigQuery. I have this column in my table:
event | time
_________________|____________________
start | 1
end | 2
random_event_X | 3
start | 4
error_X | 5
error_Y | 6
end | 7
start | 8
error_F | 9
start | 10
random_event_Y | 11
error_z | 12
end | 13
I would like to, from the end
event record everything until start
appear and then count it. Everything can happen between start and end and outside of it. If there is an end, there is a start, but if there is a start, there is not necessarily an end.
The desire output would be like:
string_agg | count
"start, end" | 1
"start, error_X, error_Y, end" | 1
"start, random_event_Y error_Z, end" | 1
So everything between each start and end if start
has an end
. So without the random_event_X
at time 3, the start
at time 8 or the error_F
at time 9.
I was not able to find the solution and have struggle understanding how to approach this problem. Any help or advice is welcome.
Below is for BigQuery Standard SQL
#standardSQL
SELECT agg_events, COUNT(1) cnt
FROM (
SELECT STRING_AGG(event ORDER BY time) agg_events, COUNTIF(event IN ('start', 'end')) flag
FROM (
SELECT *, COUNTIF(event = 'start') OVER(PARTITION BY grp1 ORDER BY time) grp2
FROM (
SELECT *, COUNTIF(event = 'end') OVER(ORDER BY time DESC) grp1
FROM `project.dataset.table`
)
)
GROUP BY grp1, grp2
)
WHERE flag = 2
GROUP BY agg_events
If to apply to sample data from your question - result is
Row agg_events cnt
1 start,random_event_Y,error_z,end 1
2 start,error_X,error_Y,end 1
3 start,end 1