Search code examples
sqlpostgresqlselectgreatest-n-per-groupwindow-functions

More efficient way to SELECT rows from PARTITION BY


Suppose I have the following table:

+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
|  1 |           1 |           3 |
|  1 |           2 |           3 |
|  1 |           3 |           4 |
|  2 |           2 |           3 |
|  2 |           3 |           4 |
|  2 |           4 |           5 |
+----+-------------+-------------+

My desired results are:

+----+-------------+-------------+
| id | step_number | employee_id |
+----+-------------+-------------+
|  1 |           1 |           3 |
|  2 |           2 |           3 |
+----+-------------+-------------+

My current solution is:

SELECT
    *
FROM
(SELECT
    id,
    step_number,
    MIN(step_number) OVER (PARTITION BY id) AS min_step_number,
    employee_id
FROM
    table_name) AS t
WHERE
    t.step_number = t.min_step_number

Is there a more efficient way I could be doing this?

I'm currently using postgresql, version 12.


Solution

  • In Postgres, I would recommend using distinct on to adress this greatest-n-per-group problem:

    select distinct on (id) t.*
    from mytbale t
    order by id, step_number
    

    This Postgres extension to the SQL standard has usually better performance than the standard approach using window functions (and, as a bonus, the syntax is neater).

    Note that this assumes unicity of (id, step_number) tuples: otherwise, the results might be different than those of your query (which allows ties, while distinct on does not).