I want to concatenate the surrounding rows(in the following examples only the surrounding 2 rows) after ranking to a new column(group by
seems to not work), and here is the data I have:
Schema (MySQL v8.0)
CREATE TABLE log_table (
`user_id` VARCHAR(5),
`date_time` DATETIME,
`event_name` VARCHAR(10),
`trivial` int
);
INSERT INTO log_table
(`user_id`, `date_time`, `event_name`, `trivial`)
VALUES
('001', '2020-12-10 10:00:02', 'c', 3),
('001', '2020-12-10 10:00:01', 'b', 9),
('001', '2020-12-10 10:00:40', 'e', 2),
('001', '2020-12-10 10:00:20', 'd', 6),
('001', '2020-12-10 10:00:00', 'a', 1),
('002', '2020-12-10 10:00:10', 'C', 9),
('002', '2020-12-10 10:00:50', 'D', 0),
('002', '2020-12-10 10:00:02', 'A', 2),
('002', '2020-12-10 10:00:09', 'B', 4);
To illustrate what I want to do. I can do summing over numerical values using the sum
clause as follows:
Query #1
SELECT *,
SUM(trivial)
over(
PARTITION BY user_id
ORDER BY user_id, date_time ROWS BETWEEN 2 preceding AND 2 following)
AS
trivial_new
FROM log_table;
user_id | date_time | event_name | trivial | trivial_new |
---|---|---|---|---|
001 | 2020-12-10 10:00:00 | a | 1 | 13 |
001 | 2020-12-10 10:00:01 | b | 9 | 19 |
001 | 2020-12-10 10:00:02 | c | 3 | 21 |
001 | 2020-12-10 10:00:20 | d | 6 | 20 |
001 | 2020-12-10 10:00:40 | e | 2 | 11 |
002 | 2020-12-10 10:00:02 | A | 2 | 15 |
002 | 2020-12-10 10:00:09 | B | 4 | 15 |
002 | 2020-12-10 10:00:10 | C | 9 | 15 |
002 | 2020-12-10 10:00:50 | D | 0 | 13 |
For the string field event_name
, I tried this snippet:
Query #2
SELECT *,
Concat(event_name)
over(
PARTITION BY user_id
ORDER BY user_id, date_time ROWS BETWEEN 2 preceding AND 2 following)
AS
event_name_new
FROM log_table
And here is my expected results:
user_id | date_time | event_name | trivial | event_name_new |
---|---|---|---|---|
001 | 2020-12-10 10:00:00 | a | 1 | abc |
001 | 2020-12-10 10:00:01 | b | 9 | abcd |
001 | 2020-12-10 10:00:02 | c | 3 | abcde |
001 | 2020-12-10 10:00:20 | d | 6 | bcde |
001 | 2020-12-10 10:00:40 | e | 2 | cde |
002 | 2020-12-10 10:00:02 | A | 2 | ABC |
002 | 2020-12-10 10:00:09 | B | 4 | ABCD |
002 | 2020-12-10 10:00:10 | C | 9 | ABCD |
002 | 2020-12-10 10:00:50 | D | 0 | BCD |
But the Query #2 cannot get me here, and I have googled but all I can find is about group by
(refer to this and this and this).
I know I can work around the problem by using LAG
and LEAD
(for the following rows) but I need to concatenate the new columns and when I need to concatenate many rows I need to do lots of manual work like concatenate them by separators like ,
and etc.
Can I do that in one step without using LAG
and LEAD
?
A correlated subquery might be the simplest solution:
with l as (
select l.*,
cast(row_number() over (partition by user_id order by date_time) as signed) as seqnum
from log_table l
)
select l.*,
(select group_concat(l2.event_name order by l2.date_time separator '')
from l l2
where l2.user_id = l.user_id and
l2.seqnum between l.seqnum - 2 and l.seqnum + 2
) as new_event_name
from l;
If the event names are one character, you can eliminate the correlated subquery and use string operations:
with l as (
select l.*, full_concat,
cast(row_number() over (partition by user_id order by date_time) as signed) as seqnum
from log_table l join
(select user_id, group_concat(event_name order by date_time separator '') as full_concat
from log_table l
group by user_id
) ll
using (user_id)
)
select l.*, substr(full_concat, greatest(seqnum - 2, 1), least(5, seqnum + 2))
from l;
Here is a db<>fiddle.