How to create an ordered rank/group based on consecutive values?

I have the following table schema:

id	text	persons	timestamp
001	hi	p1	2022-08-26 21:03:16.697000
001	how are you doing?	p1	2022-08-26 21:03:17.528000
001	i'm doing fine	p2	2022-08-26 21:03:18.234000
001	hey guys!	p3	2022-08-26 21:03:25.349000
001	sorry	p3	2022-08-26 21:03:25.723000
001	i'm late	p3	2022-08-26 21:03:26.129000
002	good morning bois	p1	2023-01-13 18:37:16.697000
002	time for some ranked games eh?	p1	2023-01-13 18:37:17.528000
002	i've been waiting	p2	2023-01-13 18:37:18.234000
002	let's go	p2	2023-01-13 18:37:18.234000
002	hey guys!	p3	2023-01-13 18:37:25.349000
002	sorry	p3	2023-01-13 18:37:25.723000
002	i'm late	p3	2023-01-13 18:37:26.129000

Desired Output:

id	text
001	p1: hi how are you doing? p2: i'm doing fine p3: hey guys! sorry i'm late
002	p1: good morning bois time for some ranked games eh? p2: i've been waiting let's go! p3: hey guys! sorry i'm late

What is a good, efficient way to get the desired output using SQL? OR.. How do I create an intermediate table with column "C" below?

I've been able to achieve this using Python + Pandas , but I'd like to do this operation using SQL. So this is what I've done using Python + Pandas:

First I create an intermediate table with column 'C' that groups each consecutive text by persons as a unique 'C' group. Then I do a group by the id, text and persons columns and aggregate on the text to achieve the desired results above.

Intermediate Table:

id	text	persons	C	timestamp
001	hi	p1	1	2022-08-26 21:03:16.697000
001	how are you doing?	p1	1	2022-08-26 21:03:17.528000
001	im doing fine	p2	2	2022-08-26 21:03:18.234000
001	hey guys!	p3	3	2022-08-26 21:03:25.349000
001	sorry	p3	3	2022-08-26 21:03:25.723000
001	i'm late	p3	3	2022-08-26 21:03:26.129000
002	good morning bois	p1	1	2023-01-13 18:37:16.697000
002	time for some ranked games eh?	p1	1	2023-01-13 18:37:17.528000
002	i've been waiting	p2	2	2023-01-13 18:37:18.234000
002	let's go	p2	2	2023-01-13 18:37:18.234000
002	hey guys!	p3	3	2023-01-13 18:37:25.349000
002	sorry	p3	3	2023-01-13 18:37:25.723000
002	i'm late	p3	3	2023-01-13 18:37:26.129000

Here's the command I used in Python + Pandas to get the desired output:

df['C'] = (df.persons.ne(df.persons.shift())).astype(int).cumsum()
df.groupby(['C','persons','id'], as_index=False)['text'].agg(' '.join)

I attempted this using SQL as well, but couldn't get the C column to work:

SELECT text, 
       persons, 
       row_number() over (partition by id, persons order by timestamp) as C
FROM table
ORDER BY timestamp

Solution

You could apply STRING_AGG twice:

first to aggregate on persons, and add "p:"
then to aggregate all your partial texts

In order to ensure the ordering of your messages, you need to carry timestamp in the aggregate operations.

WITH cte AS (
    SELECT id,
           MIN(timestamp) AS timestamp,
           CONCAT(persons, ': ', STRING_AGG(text, ' ' ORDER BY timestamp), ' ') AS text,
    FROM tab
    GROUP BY id, 
             persons
)
SELECT id, 
       GROUP_CONCAT(text, ' ' ORDER BY timestamp) AS text
FROM cte
GROUP BY id