Search code examples
mysqlsqlwindow-functions

String Concat over partition in SQL?


I want to concatenate the surrounding rows(in the following examples only the surrounding 2 rows) after ranking to a new column(group by seems to not work), and here is the data I have:

Schema (MySQL v8.0)

CREATE TABLE log_table (
  `user_id` VARCHAR(5),
  `date_time` DATETIME,
  `event_name` VARCHAR(10),
  `trivial` int
);

INSERT INTO log_table
  (`user_id`, `date_time`, `event_name`, `trivial`)
VALUES
  ('001', '2020-12-10 10:00:02', 'c', 3),
  ('001', '2020-12-10 10:00:01', 'b', 9),
  ('001', '2020-12-10 10:00:40', 'e', 2),
  ('001', '2020-12-10 10:00:20', 'd', 6),
  ('001', '2020-12-10 10:00:00', 'a', 1),
  ('002', '2020-12-10 10:00:10', 'C', 9),
  ('002', '2020-12-10 10:00:50', 'D', 0),
  ('002', '2020-12-10 10:00:02', 'A', 2),
  ('002', '2020-12-10 10:00:09', 'B', 4);

To illustrate what I want to do. I can do summing over numerical values using the sum clause as follows:
Query #1

SELECT *,
       SUM(trivial)
         over(
           PARTITION BY user_id
           ORDER BY user_id, date_time ROWS BETWEEN 2 preceding AND 2 following)
       AS
       trivial_new
FROM   log_table; 
user_id date_time event_name trivial trivial_new
001 2020-12-10 10:00:00 a 1 13
001 2020-12-10 10:00:01 b 9 19
001 2020-12-10 10:00:02 c 3 21
001 2020-12-10 10:00:20 d 6 20
001 2020-12-10 10:00:40 e 2 11
002 2020-12-10 10:00:02 A 2 15
002 2020-12-10 10:00:09 B 4 15
002 2020-12-10 10:00:10 C 9 15
002 2020-12-10 10:00:50 D 0 13

View on DB Fiddle

For the string field event_name, I tried this snippet:
Query #2

SELECT *,
       Concat(event_name)
         over(
           PARTITION BY user_id
           ORDER BY user_id, date_time ROWS BETWEEN 2 preceding AND 2 following)
       AS
       event_name_new
FROM   log_table

And here is my expected results:

user_id date_time event_name trivial event_name_new
001 2020-12-10 10:00:00 a 1 abc
001 2020-12-10 10:00:01 b 9 abcd
001 2020-12-10 10:00:02 c 3 abcde
001 2020-12-10 10:00:20 d 6 bcde
001 2020-12-10 10:00:40 e 2 cde
002 2020-12-10 10:00:02 A 2 ABC
002 2020-12-10 10:00:09 B 4 ABCD
002 2020-12-10 10:00:10 C 9 ABCD
002 2020-12-10 10:00:50 D 0 BCD

But the Query #2 cannot get me here, and I have googled but all I can find is about group by(refer to this and this and this).

I know I can work around the problem by using LAG and LEAD(for the following rows) but I need to concatenate the new columns and when I need to concatenate many rows I need to do lots of manual work like concatenate them by separators like , and etc.

Can I do that in one step without using LAG and LEAD?


Solution

  • A correlated subquery might be the simplest solution:

    with l as (
          select l.*,
                 cast(row_number() over (partition by user_id order by date_time) as signed) as seqnum
          from log_table l
         )
    select l.*,
           (select group_concat(l2.event_name order by l2.date_time separator '')
            from l l2
            where l2.user_id = l.user_id and
                  l2.seqnum between l.seqnum - 2 and l.seqnum + 2
           ) as new_event_name
    from l;
    

    If the event names are one character, you can eliminate the correlated subquery and use string operations:

    with l as (
          select l.*, full_concat,
                 cast(row_number() over (partition by user_id order by date_time) as signed) as seqnum
          from log_table l join
               (select user_id,  group_concat(event_name order by date_time separator '') as full_concat
                from log_table l
                group by user_id
               ) ll
               using (user_id)
         )
    select l.*, substr(full_concat, greatest(seqnum - 2, 1), least(5, seqnum + 2))
    from l;
    

    Here is a db<>fiddle.