Search code examples
mysqlsqlgroupwise-maximum

MySQL 8 DISTINCT with greatest ID


Hopefully this is an easy problem, and I'm just thinking about it wrong.

I have an association between id_1 and id_2. There is a 1->n relationship between id_1 and id_2. Namely, if no id_2 exists for id_1 it has a default value of -1. This can look like:

id_1   id_2   info
---- | ---- | ----
120   -1      'dont'
120   444     'show'
123   -1      'test'
124   -1      'hello'
125   -1      'world'
125   123     'oh wait'
126   -1      'help'
126   201     'me'
127   -1      'sql'
127   206     'hell'

Basically I want to have a SQL query that gets the highest id_2 associated with an id_1, given an IN statement matching criteria from id_1. If I can't have an IN statement that's also fine as well, I'm just trying to optimize something else.

Here is what the answer should look like given id_1 containing (123,124,125,126,127) and id_2 containing (-1,-1,123,201,206):

id_1   id_2   info
---- | ---- | ----
123    -1     'test'
124    -1     'hello'
125    123    'oh wait'
126    201    'me'
127    206    'hell'

Notice how id_1 120 ends up excluded. I've tried:

SELECT DISTINCT id_1, id_2, info
FROM ids
WHERE
      id_1 IN (123, 124, 125, 126, 127) AND
      id_2 IN (-1, -1, 123, 201, 206)

But this gives multiple id_2 values for a single id_1.

I've also tried a GROUP BY on id_2 but that only gives me one id_1 that has -1.

SELECT id_1, id_2, info
FROM ids
WHERE
      id_1 IN (123, 124, 125, 126, 127) AND
      id_2 IN (-1, -1, 123, 201, 206)
GROUP BY id_2

results in:

id_1   id_2    info
----- | ---- | ----
123    -1     'test'
125    123    'oh wait'
126    201    'me'
127    206    'hell'

Note the missing id_1 of 124.

So how can I write this query to get the most recent, distinct id_2 for an id_1?


Solution

  • You may user ROW_NUMBER if on MySQL 8+:

    WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id_1 ORDER BY id_2 DESC) rn
        FROM ids
        WHERE id_1 IN (123, 124, 125, 126, 127) AND
              id_2 IN (-1, -1, 123, 201, 206)
    )
    
    SELECT id_1, id_2, info
    FROM cte
    WHERE rn = 1;
    

    On earlier versions of MySQL, you can identify the max record per id_1 group using exists logic:

    SELECT i1.*
    FROM ids i1
    WHERE id_1 IN (123, 124, 125, 126, 127) AND
          id_2 IN (-1, -1, 123, 201, 206) AND
          NOT EXISTS (
              SELECT 1
              FROM ids i2
              WHERE i2.id_1 = i1.id_1 AND
                    i2.id_2 > i1.id_2 AND
                    i2.id_1 IN (123, 124, 125, 126, 127) AND
                    i2.id_2 IN (-1, -1, 123, 201, 206)
          );