Search code examples
mariadbgaps-and-islands

SQL - Condense rows with consecutive IDs


I have a mariadb 10.5 InnoDB table with lots of data. Each table row has an id like this:

ext_id    <other data>
--
1
2
4
5
6
8
9

As you can see, the ids aren't all consecutive as they stem from an external data source and the database is partially incomplete. However, many of them are consecutive.

Because there are thousands of ids, this kind of representation with one id per row is unsuitable for getting a quick overview on which ids are present and which are not.

I would like to do a SQL request which instead lists the ranges of consecutive ids like this:

id
--
1-2
4-6
8-9

or, alternatively:

id_start    id_end
--------    ------
1           2
4           6
8           9

How can I achieve this using SQL?


Solution

  • There are likely several solutions, one of them is to use the WINDOW function DENSE_RANK().

    Sample table:

    As an example, we have a table t1(id int) with some gaps:

    CREATE TABLE t1 (id int);
    INSERT INTO t1 SELECT seq FROM seq_1_to_4;
    INSERT INTO t1 SELECT seq FROM seq_7_to_13;
    INSERT INTO t1 SELECT seq FROM seq_22_to_25;
    

    DENSE_RANK() as group identifier

    By subtracting the value returned by DENSE_RANK from the ID, we get a group identifier.

    SELECT id, id - dense_rank() over (order by id) AS result FROM t1;
    +------+--------+
    | id   | result |
    +------+--------+
    |    1 |      0 |
    |    2 |      0 |
    |    3 |      0 |
    |    4 |      0 |
    |    7 |      2 |
    |    8 |      2 |
    |    9 |      2 |
    |   10 |      2 |
    |   11 |      2 |
    |   12 |      2 |
    |   13 |      2 |
    |   22 |     10 |
    |   23 |     10 |
    |   24 |     10 |
    |   25 |     10 |
    +------+--------+
    

    Subquery:

    Now use previous query as subquery and group by result.

    SELECT min(sequences.id), max(sequences.id) from (select id, id - dense_rank() over (order by id) as result from t1) as sequences group by result;
    +-------------------+-------------------+
    | min(sequences.id) | max(sequences.id) |
    +-------------------+-------------------+
    |                 1 |                 4 |
    |                 7 |                13 |
    |                22 |                25 |
    +-------------------+-------------------+