sql window-functions correlated-subquery

SQL select all rows per group after a condition is met

I would like to select all rows for each group after the last time a condition is met for that group. This related question has an answer using correlated subqueries.

In my case I will have millions of categories and hundreds of millions/billions of rows. Is there a way to achieve the same results using a more performant query?

Here is an example. I want all rows (per group) after the last 0 in the condition column.

category | timestamp |  condition 
--------------------------------------
   A     |     1     |     0 
   A     |     2     |     1 
   A     |     3     |     0 
   A     |     4     |     1
   A     |     5     |     1
   B     |     1     |     0 
   B     |     2     |     1
   B     |     3     |     1

The result I would like to achieve is

category | timestamp |  condition 
--------------------------------------
   A     |     4     |     1
   A     |     5     |     1
   B     |     2     |     1
   B     |     3     |     1
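The following Python sketch pins down the intended semantics by filtering the example rows in memory. It assumes that a category with no 0 keeps all of its rows, a case the question does not cover. The function name rows_after_last_zero is hypothetical. Use it only to check results on small samples; it is not a way to process billions of rows.

```python
from itertools import groupby

# Example data from the question: (category, timestamp, condition).
ROWS = [
    ("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 1),
]


def rows_after_last_zero(rows):
    """Return, per category, the rows strictly after the last condition == 0.

    Assumption: a category with no 0 keeps all of its rows.
    """
    result = []
    # groupby needs its input sorted by category; timestamp orders each group.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # Position of the last row whose condition is 0, or -1 if there is none.
        last_zero = max((i for i, r in enumerate(group) if r[2] == 0), default=-1)
        result.extend(group[last_zero + 1:])
    return result


for row in rows_after_last_zero(ROWS):
    print(row)
```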

Solution

You might want to try window functions:

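select category, timestamp, condition
from (
    select 
        t.*,
        -- smallest condition among this row and all later rows of its category
        min(condition) over(partition by category order by timestamp desc) min_cond
    from mytable t
) t
-- keep only rows with no 0 at or after them
where min_cond = 1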

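Because of the order by timestamp desc clause, the window min() uses the default frame. That frame runs from the most recent row of the category down to the current row, so min() computes the minimum value of condition over the current row and all later rows of the same category. We can use this as a filter: a row is kept only when that minimum is 1, meaning neither the row itself nor any more recent row of its category has a 0. This relies on condition taking only the values 0 and 1. A category that contains no 0 at all is returned in full.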
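The suffix minimum that the window computes can be mimicked in plain Python. The sketch below walks each category from its newest row to its oldest, which is what order by timestamp desc does, and keeps a running minimum along the way. It is an illustration only. It assumes timestamps are unique within a category; with ties, the SQL default frame would treat equal timestamps as peers. The helper name with_min_cond is hypothetical.

```python
from itertools import groupby

# Example data from the question: (category, timestamp, condition).
ROWS = [
    ("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 1),
]


def with_min_cond(rows):
    """Append min_cond to each row: the smallest condition among the row
    itself and every later row (by timestamp) of the same category."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        annotated = []
        running_min = None
        # Newest row first, mirroring "order by timestamp desc".
        for category, timestamp, condition in reversed(list(group)):
            if running_min is None or condition < running_min:
                running_min = condition
            annotated.append((category, timestamp, condition, running_min))
        # Restore ascending timestamp order for readability.
        out.extend(reversed(annotated))
    return out


# Same filter as "where min_cond = 1", dropping the helper column.
kept = [row[:3] for row in with_min_cond(ROWS) if row[3] == 1]
print(kept)
```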

Compared to the correlated subquery approach, the upside of window functions is that the table is scanned only once, instead of a subquery being evaluated for every row. The window computation still has a cost: the rows must be sorted by category and timestamp unless an index already provides that order. You'll need to benchmark both solutions against your own data.
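One way to check that the two approaches agree on the sample data is Python's built-in sqlite3 module; window functions require SQLite 3.25 or newer. The correlated query below is a hypothetical NOT EXISTS formulation written for this comparison, not the one from the related question. The index name mytable_cat_ts is also made up. A small sample only confirms that both queries return the same rows. Meaningful timings have to come from your real data and database engine.

```python
import sqlite3

# Example data from the question: (category, timestamp, condition).
ROWS = [
    ("A", 1, 0), ("A", 2, 1), ("A", 3, 0), ("A", 4, 1), ("A", 5, 1),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 1),
]

# The window-function answer; ORDER BY is added only for a deterministic result.
WINDOW_QUERY = """
select category, timestamp, condition
from (
    select t.*,
           min(condition) over(partition by category order by timestamp desc) min_cond
    from mytable t
) t
where min_cond = 1
order by category, timestamp
"""

# Hypothetical correlated formulation: keep a row if neither it nor any
# later row of its category has condition 0.
CORRELATED_QUERY = """
select category, timestamp, condition
from mytable t
where not exists (
    select 1
    from mytable t2
    where t2.category = t.category
      and t2.timestamp >= t.timestamp
      and t2.condition = 0
)
order by category, timestamp
"""


def compare(rows):
    """Load rows into an in-memory table and run both queries."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (category text, timestamp integer, condition integer)"
        )
        # A (category, timestamp) index is a common choice for both queries;
        # whether the planner uses it depends on the engine.
        conn.execute("create index mytable_cat_ts on mytable (category, timestamp)")
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        window = conn.execute(WINDOW_QUERY).fetchall()
        correlated = conn.execute(CORRELATED_QUERY).fetchall()
        return window, correlated
    finally:
        conn.close()


window, correlated = compare(ROWS)
print(window == correlated)  # True
```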