Search code examples
postgresqlcountwhere-clausewindow-functions

Get count of values in different subgroups


I need to delete some rows in the dataset, of which the speed equals zero and lasting over N times (let's assume N is 2). The structure of the table demo looks like:

id car speed time
1 foo 0 1
2 foo 0 2
3 foo 0 3
4 foo 1 4
5 foo 1 5
6 foo 0 6
7 bar 0 1
8 bar 0 2
9 bar 5 3
10 bar 5 4
11 bar 5 5
12 bar 5 6

Then I hope to generate a table like the one below by using window_function:

id car speed time lasting
1 foo 0 1 3
2 foo 0 2 3
3 foo 0 3 3
4 foo 1 4 2
5 foo 1 5 2
6 foo 0 6 1
7 bar 0 1 2
8 bar 0 2 2
9 bar 5 3 4
10 bar 5 4 4
11 bar 5 5 4
12 bar 5 6 4

Then I can easily exclude those rows by using WHERE NOT (speed = 0 AND lasting > 2)

Put the code I tried here, but it didn't return the value I expected and I guess those FROM (SELECT ... FROM (SELECT ... might not be the best practice to solve the problem:

SELECT g3.*, count(id) OVER (PARTITION BY car, cumsum ORDER BY id) as num   
  FROM (SELECT g2.*, sum(grp2) OVER (PARTITION BY car ORDER BY id) AS cumsum             
    FROM (SELECT g1.*, (CASE ne0 WHEN 0 THEN 0 ELSE 1 END) AS grp2                            
      FROM (SELECT g.*, speed - lag(speed, 1, 0) OVER (PARTITION BY car) AS ne0              
        FROM (SELECT *, row_number() OVER (PARTITION BY car) AS grp FROM demo) g ) g1 ) g2 ) g3                                                                                       
ORDER BY id;

Solution

  • You can use window function LAG() to check for the previous speed value for each row and SUM() window function to create the groups for the continuous values.
    Then with COUNT() window function you can count the number of rows in each group so that you can filter out the rows with 0 speed in the groups that have more than 2 rows:

    SELECT id, car, speed, time
    FROM (
      SELECT *, COUNT(*) OVER (PARTITION BY car, grp) counter
      FROM (
        SELECT *, SUM(flag::int) OVER (PARTITION BY car ORDER BY time) grp
        FROM (
          SELECT *, speed <> LAG(speed, 1, speed - 1) OVER (PARTITION BY car ORDER BY time) flag
          FROM demo
        ) t  
      ) t
    ) t
    WHERE speed <> 0 OR counter <= 2
    ORDER BY id;
    

    See the demo.