Search code examples
mysqlsqlgaps-and-islands

SQL QUERY Merge Consecutive Same Values After Sorting the Table


Let say we have this table

id |begin | end | location
1  |    5 |  10 | MALL A
2  |    1 |   3 | MALL B
3  |   13 |  17 | MALL A
4  |   21 |  25 | MALL C
5  |   36 |  38 | MALL D
6  |   31 |  33 | MALL D
7  |   26 |  29 | MALL F
8  |   40 |  45 | MALL D

Then we sort the table by the begin column asc. Therefore we have this table

id |begin | end | location
2  |    1 |   3 | MALL B
1  |    5 |  10 | MALL A
3  |   13 |  17 | MALL A
4  |   21 |  25 | MALL C
7  |   26 |  29 | MALL F
6  |   31 |  33 | MALL D
5  |   36 |  38 | MALL D
8  |   40 |  45 | MALL D

I would like to get a table like this. (Rows that has the same location in consecutive will be merged)

begin | end | location
    1 |   3 | MALL B
    5 |  17 | MALL A
   21 |  25 | MALL C
   26 |  29 | MALL F
   31 |  45 | MALL D

How do I achieve that?

I thought that I can use RANK() then group it by the rank value. But I can't make it. I thought this is because the table hasn't been sorted first.

If you want to make the table on SQL I provide these SQL syntax for creating it.

CREATE TABLE `t` (
  `id` int NOT NULL,
  `begin` int DEFAULT NULL,
  `end` int DEFAULT NULL,
  `location` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

INSERT INTO t (id, begin, end, location) VALUES (1, 5,10, 'A');
INSERT INTO t (id, begin, end, location) VALUES (2, 1,3, 'B');
INSERT INTO t (id, begin, end, location) VALUES (3, 13,17, 'A');
INSERT INTO t (id, begin, end, location) VALUES (4, 21,25, 'C');
INSERT INTO t (id, begin, end, location) VALUES (5, 36,38, 'D');
INSERT INTO t (id, begin, end, location) VALUES (6, 31,33, 'D');
INSERT INTO t (id, begin, end, location) VALUES (7, 26,29, 'F');
INSERT INTO t (id, begin, end, location) VALUES (8, 40,45, 'D');

Solution

  • This is a type of gaps and islands problem. In this case, you can use lag() to identify where rows should be in separate groups. Then use a cumulative sum to define the groups and aggregate:

    select location, min(begin), max(end)
    from (select t.*,
                 sum(case when prev_location = location then 0 else 1 end) over (order by begin) as grp
          from (select t.*,
                       lag(location) over (order by begin) as prev_location
                from t
               ) t
         ) t
    group by grp, location;
    

    Actually, because you don't care about gaps between ends and the following begins, you can use the moderately simpler difference of row number:

    select location, min(begin), max(end)
    from (select t.*,
                 row_number() over (order by begin) as seqnum,
                 row_number() over (partition by location order by begin) as seqnum_2
          from t
         ) t
    group by location, (seqnum - seqnum_2);
    

    This is a little harder to explain, but if you look at the results of the subquery, you will see how the difference between the two row_number()s is constant when the location is the same.