Search code examples
sqloraclegreatest-n-per-group

SQL SELECT rows with MAX value on a column and returns all columns


Ok so I have this table :

+----+--------------+------------------+----------+
| id | business_key | other columns... | creation |
+----+--------------+------------------+----------+
|  1 |            1 |              ... | 01/01/14 |
|  2 |            1 |              ... | 12/02/14 |
|  3 |            1 |              ... | 13/03/14 | <--
|  4 |            2 |              ... | 01/01/14 |
|  5 |            2 |              ... | 12/02/14 | <--
|  6 |            8 |              ... | 01/01/14 | <--
|  7 |           10 |              ... | 01/01/14 |
|  8 |           10 |              ... | 12/02/14 |
|  9 |           10 |              ... | 13/03/14 |
| 10 |           10 |              ... | 13/03/14 | <--
+----+--------------+------------------+----------+

For each business key, I want to return the most recent row and for that I have the "creation" column (see the arrows above). The simple answer would be :

SELECT business_key, MAX(creation) FROM mytable GROUP BY business_key;

The thing is, I need to return ALL the columns. Then I learned the existence of the tag on StackOverflow and I found this topic : SQL Select only rows with Max Value on a Column. The best answer is great and provides this request :

SELECT mt1.*
FROM mytable mt1
LEFT OUTER JOIN mytable mt2
ON (mt1.business_key = mt2.business_key AND mt1.creation < mt2.creation)
WHERE mt2.business_key IS NULL;

Sadly it doesn't work because my situation is a little trickier : if you look at the line 9 and 10 of my table, you will see that they have the same business key and the same creation date. While this should be avoided in my application, I still have to handle it if it happens.

With the last request above, this is what I will get :

+----+--------------+------------------+----------+
| id | business_key | other columns... | creation |
+----+--------------+------------------+----------+
|  3 |            1 |              ... | 13/03/14 |
|  5 |            2 |              ... | 12/02/14 |
|  6 |            8 |              ... | 01/01/14 |
|  9 |           10 |              ... | 13/03/14 | <--
| 10 |           10 |              ... | 13/03/14 | <--
+----+--------------+------------------+----------+

While I wanted this :

+----+--------------+------------------+----------+
| id | business_key | other columns... | creation |
+----+--------------+------------------+----------+
|  3 |            1 |              ... | 13/03/14 |
|  5 |            2 |              ... | 12/02/14 |
|  6 |            8 |              ... | 01/01/14 |
| 10 |           10 |              ... | 13/03/14 | <--
+----+--------------+------------------+----------+

I know it's a poor choice to want a MAX() on a technical column like "id", but right now it's the only way for me to prevent duplicates when the business key AND the creation date are the same. The problem is, I have no idea how to do it. Any idea ? Keep in mind it must return all the columns (and we have a lot of columns so a SELECT * will be necessary).

Thanks a lot.


Solution

  • The first thought is that your id seems to increment along with the date, so just use that:

    SELECT mt1.*
    FROM mytable mt1 LEFT OUTER JOIN
         mytable mt2
         ON mt1.business_key = mt2.business_key AND mt2.id > mt1.id
    WHERE mt2.business_key IS NULL;
    

    You can still do the same idea with two columns:

    SELECT mt1.*
    FROM mytable mt1 LEFT OUTER JOIN
         mytable mt2
         ON mt1.business_key = mt2.business_key AND
            (mt2.creation > mt1.creation OR
             mt2.creation = mt1.creation AND
             mt2.id > mt1.id
            )
    WHERE mt2.business_key IS NULL;