Search code examples
sqlselectgroup-bydb2greatest-n-per-group

SQL: Greatest row with variable group / condition


I have this simplified table documenting some job runs:

+----+-----------------------+--------+
| id |          pit          |  stat  |
+----+-----------------------+--------+
|  1 | '2018-09-21 12:00:00' | 'TEST' |
|  1 | '2018-09-20 12:00:00' | 'TEST' |
|  1 | '2018-09-19 12:00:00' | 'PROD' |
|  1 | '2018-09-18 12:00:00' | 'PROD' |
|  1 | '2018-09-17 12:00:00' | 'TEST' |
|    |                       |        |
|  2 | '2018-09-19 14:00:00' | 'TEST' |
|  2 | '2018-09-18 14:00:00' | 'TEST' |
|    |                       |        |
|  3 | '2018-09-19 15:00:00' | 'PROD' |
|  3 | '2018-09-18 15:00:00' | 'PROD' |
|  3 | '2018-09-17 15:00:00' | 'TEST' |
|    |                       |        |
|  4 | '2018-09-21 12:00:00' | 'PROD' |
|  4 | '2018-09-20 12:00:00' | 'PROD' |
+----+-----------------------+--------+

Now i want to do the following:

For every id: Find the latest job-run. If the latest run has stat = 'PROD' then only return that row. If the latest run has stat = 'TEST' then return that row and additionally return the latest run with stat = 'PROD'.

Currently im running this SQL to get the most recent PROD-run as well as the most recent TEST-run for every id:

SELECT t.*
FROM SOME_TABLE t
INNER JOIN (
    SELECT id, MAX(pit) pit, stat
    FROM SOME_TABLE
    GROUP BY id, stat
) mt ON t.id = mt.id AND t.pit = mt.pit
ORDER BY id asc, pit desc;

SQL Fiddle

Now what i'm missing is the part, that i want to filter out the runs with stat = 'TEST' if there is a more recent run with stat = 'PROD'.

Is there any good way of accomplishing this within the SQL?


Solution

  • Sometimes, rephrasing the requirement can help. Here, for each id, you want to return the most recent row, and the most recent "prod" row, which may or may not be the same row. An alternative way of looking at the problem is to sort the rows in two ways, and take the first row from each category:

    1. In descending order by date (to get the most recent one), per ID
    2. In ascending order by type (to get "prod" rows before "test" rows) and then in descending order by date, per ID:


    SELECT id, pit, stat
    FROM   (SELECT id, pit, stat
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY pit DESC) AS rn,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY stat ASC, pit DESC) AS prn
            FROM   some_table) t
    WHERE  1 IN (rn, prn)