mysql sql mariadb sql-order-by greatest-n-per-group

Why does this ORDER BY in sub-query workaround not work consistently?

In order to get the most recent record of a certain combination of identifiers, I use the following query:

SELECT t1.*
FROM (
    SELECT id, b_id, c_id
    FROM a
    ORDER BY epoch DESC
    LIMIT 18446744073709551615
) AS t1
GROUP BY t1.b_id, t1.c_id

If there are multiple records of a combination of b_id + c_id , then it will always select the one with the highest value of epoch (and as such, the latest in time).

The LIMIT is added as a workaround to force MariaDB to actually order the results. I successfully use this construction a lot in my application, and so have others.

However, now I came across an exact same query in my application, where I "accidentally" used more columns than strictly necessary in the sub-query:

SELECT t1.*
FROM (
    SELECT id, b_id, c_id, and, some, other, columns, ...
    FROM a
    ORDER BY epoch DESC
    LIMIT 18446744073709551615
) AS t1
GROUP BY t1.b_id, t1.c_id

I've tested both queries. And the exact same query, but with as only change those additional columns, makes the result to become incorrect. In fact, the number of columns determines the result. If I have <= 28 columns, the result is okay. If I have 29 columns, then it gives the third-latest record (which is wrong too), and if I have 30-36 columns it always gives the second-latest record (36 is the total number for table a). In my testing, it didn't seem to matter which particular column was removed or added.

I'm having a hard time finding out why exactly the behavior changes after I add more columns. Also, perhaps by chance, it still gave the correct result yesterday. But today suddenly the result changed, probably after new records (with unrelated identifiers) were added to table a. I've tried using EXPLAIN:

# The first query, with columns: id, b_id, c_id
 id     select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    280     Using where; Using temporary; Using filesort
2   DERIVED     a   ALL     NULL    NULL    NULL    NULL    280     Using filesort

# The second query, with columns: id, b_id, c_id, and, some, other, columns, ...
 id     select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    276     Using where; Using temporary; Using filesort
2   DERIVED     a   ALL     NULL    NULL    NULL    NULL    276     Using filesort

But that doesn't really help me much, other than that I can see that the key_len is different. The second-latest record that is incorrectly received in the second query is one where id = 276, the actual latest record that it correctly retrieves using the first query is one where id = 278. In total there are 307 rows now, and yesterday perhaps just ~300. I'm not sure how to interpret these results to understand what is going wrong. Does anyone know? And if not, what else can I do to find out what is causing these strange results?

Solution

This is a malformed query and should be generating a syntax error:

SELECT t1.*
FROM (SELECT id, b_id, c_id
      FROM a
      ORDER BY epoch DESC
      LIMIT 18446744073709551615
     ) t1
GROUP BY t1.b_id, t1.c_id;

Why? You are selecting 3 columns with no aggregation functions. But the group by only has two columns. Happily, this is now a syntax error in MySQL, using the default settings. Finally! (MySQL accepted this non-standard syntax until version 8.0.)

You can do what you want using a correlated subquery:

select a.*
from a
where a.epoch = (select max(a2.epoch)
                 from a a2
                 where a2.b_id = a.b_id and a2.c_id = a.c_id
                );

With an index on a(b_id, c_id, epoch), this is probably also faster than aggregation -- even if that happened to work under some circumstances.