Search code examples
sqlsqlitecommon-table-expressionwindow-functionsgroup-concat

How to join two or more equal consecutive values of column into one in SQLite?


I have this SQL query:



SELECT 
    strftime('%Y-%m-%d', start_time) as day, group_concat(event_id, ' | ') as events
FROM
    events_table
WHERE 
    start_time BETWEEN '2022-01-01 00:00:00' and '2022-03-31 23:59:59'
    and event_id is not null
GROUP by day

This returns:


|-------------------------------------------------------------------------|
| day        |   events                                                   |
|-------------------------------------------------------------------------|
| 1999-01-04 |  event_1 | event_2 | event_3 | event_4                     |
| 1999-01-05 |  event_1 | event_1 | event_2                               |
| 1999-01-07 |  event_1 | event_2 | event_2 | event_3 | event_2 | event_3 |
|-------------------------------------------------------------------------|

Inside day events are ordered by start time. But I want to get:


|-------------------------------------------------------------------------|
| day        |   events                                                   |
|-------------------------------------------------------------------------|
| 1999-01-04 |  event_1 | event_2 | event_3 | event_4                     |
| 1999-01-05 |  event_1 | event_2                                         |
| 1999-01-07 |  event_1 | event_2 | event_3 | event_2 | event_3           |
|-------------------------------------------------------------------------|


As You can see, if consecutive values of column equal they should be joined into one, values delimited by | sign

In other words if two or more consecutive values of column are same they should be merged into one, so avoid repetition


Solution

  • SQLite's GROUP_CONCAT() aggregate function does not support an ORDER BY clause and this means that even if you do get results in your expected order this order is not guaranteed.

    Instead you should use GROUP_CONCAT() window function.

    First, you must filter out the rows with empty event_ids and use LAG() window function to check if the previous event_id of the same date of each row (if it exists) is the same as the current one, so that it should be also filtered out:

    WITH cte AS (
      SELECT *, event_id IS NOT LAG(event_id) OVER (PARTITION BY date(start_time) ORDER BY start_time) flag
      FROM events_table
      WHERE event_id <> ''
    )
    SELECT DISTINCT date(start_time) AS day, 
           GROUP_CONCAT(event_id, ' | ') OVER (
             PARTITION BY date(start_time) 
             ORDER BY start_time 
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS events
    FROM cte
    WHERE flag AND start_time BETWEEN '1999-01-01 00:00:00' AND '1999-03-31 23:59:59';
    

    See the demo.