The example table:
id | name | create_time | group_id |
---|---|---|---|
1 | a | 2022-01-01 12:00:00 | group1 |
2 | b | 2022-01-01 13:00:00 | group1 |
3 | c | 2022-01-01 12:00:00 | NULL |
4 | d | 2022-01-01 13:00:00 | NULL |
5 | e | NULL | group2 |
I need to get top 1 rows (with the minimal create_time
) grouped by group_id
with these conditions:
create_time
can be null - it should be treated as a minimal valuegroup_id
can be null - all rows with nullable group_id
should be returned (if it's not possible, we can use coalesce(group_id, id)
or sth like that assuming that ids are unique and never collide with group ids)The expected output for the example:
id | name | create_time | group_id |
---|---|---|---|
1 | a | 2022-01-01 12:00:00 | group1 |
3 | c | 2022-01-01 12:00:00 | NULL |
4 | d | 2022-01-01 13:00:00 | NULL |
5 | e | NULL | group2 |
I've already read similar questions on SO but 90% of answers are with specific keywords (numerous answers with PARTITION BY
like https://stackoverflow.com/a/6841644/5572007) and others don't honor null values in the group condition columns and probably pagination (like https://stackoverflow.com/a/14346780/5572007).
You can combine two queries with UNION ALL
. E.g.:
select id, name, create_time, group_id
from mytable
where group_id is not null
and not exists
(
select null
from mytable older
where older.group_id = mytable.group_id
and older.create_time < mytable.create_time
)
union all
select id, name, create_time, group_id
from mytable
where group_id is null
order by id;
This is standard SQL and very basic at that. It should work in about every RDBMS.
As to pagination: This is usually costly, as you run the same query again and again in order to always pick the "next" part of the result, instead of running the query only once. The best approach is usually to use the primary key to get to the next part so an index on the key can be used. In above query we'd ideally add where id > :last_biggest_id
to the queries and limit the result, which would be fetch next <n> rows only
in standard SQL. Everytime we run the query, we use the last read ID as :last_biggest_id
, so we read on from there.
Variables, however, are dealt with differently in the various DBMS; most commonly they are preceded by either a colon, a dollar sign or an at sign. And the standard fetch clause, too, is supported by only some DBMS, while others have a LIMIT
or TOP
clause instead.
If these little differences make it impossible to apply them, then you must find a workaround. For the variable this can be a one-row-table holding the last read maximum ID. For the fetch clause this can mean you simply fetch as many rows as you need and stop there. Of course this isn't ideal, as the DBMS doesn't know then that you only need the next n rows and cannot optimize the execution plan accordingly.
And then there is the option not to do the pagination in the DBMS, but read the complete result into your app and handle pagination there (which then becomes a mere display thing and allocates a lot of memory of course).