I have been looking and testing around for a while, but I can't seem to get the desired result. I have a dataset that looks like this (messageID = unique values):
Each row represents activity (text message, click activity) in a chat environment. You can either chat with a bot, interact with the system or interact with a human.
My goal is to have the amount of text messages within a conversation, meaning: only the activity in which 2 humans are talking. The conversation starts after Client + Click, and the Employee sends the first message (employee + default).
The result should look like this:
The counting should start at the first Employee value within a conversation, and should end before the next non-human activity, or just the last message of that conversation.
An observation: Combinations of client + message aren't solely available for the human interactions, but also the client's bot interactions get the same combination.
Can someone help me with the correct code for in BigQuery?
Thanks! Very much appreciated.
I tried to count (*) all rows between the timestamp of Event = "Click" and Event = "Stop", for each conversation ID partitioned. But I can't run the query because the BETWEEN statement doe not allow another SELECT statement within it.
I read partitioning and window documentation, but nothing seems to apply to my case.
You may use SUM window function to mark with 1 the messages after the 'Click' event and mark with -1 the messages after the 'Stop' event. Total number of messages = messages after 'Click' - messages after 'Stop'.
WITH sample_table AS (
SELECT 5 as conversationID, 'Bot' as Actor, 'Default' as Event, DATETIME '2020-02-19 10:01:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Unknown' as Actor, 'Message' as Event, DATETIME '2020-02-19 10:02:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 10:03:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Click' as Event, DATETIME '2020-02-19 10:03:25' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 10:04:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 10:04:22' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 10:05:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 10:06:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 10:07:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 10:08:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 10:09:20' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 10:09:29' as EventTimestamp UNION ALL
SELECT 5 as conversationID, 'Client' as Actor, 'Stop' as Event, DATETIME '2020-02-19 10:09:30' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Bot' as Actor, 'Default' as Event, DATETIME '2020-02-19 21:10:05' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Client' as Actor, 'Click' as Event, DATETIME '2020-02-19 21:10:06' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 21:10:07' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 21:10:08' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 21:10:40' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 21:10:50' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 21:12:05' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Client' as Actor, 'Message' as Event, DATETIME '2020-02-19 21:13:05' as EventTimestamp UNION ALL
SELECT 6 as conversationID, 'Employee' as Actor, 'Default' as Event, DATETIME '2020-02-19 21:14:05' as EventTimestamp UNION ALL
SELECT 7 as conversationID, 'Client' as Actor, 'Click' as Event, DATETIME '2020-03-05 08:20:09' as EventTimestamp UNION ALL
SELECT 7 as conversationID, 'Bot' as Actor, 'Default' as Event, DATETIME '2020-03-05 08:20:09' as EventTimestamp
)
SELECT conversationID, SUM(click_event + messages_after_click + messages_after_stop) as no_of_rows_with_human_conversation
FROM (
SELECT *,
CASE WHEN Event = 'Click' THEN -1 ELSE 0 END as click_event,
SUM(CASE WHEN Event = 'Click' THEN 1 ELSE 0 END) OVER (PARTITION BY conversationID ORDER BY EventTimestamp) AS messages_after_click,
SUM(CASE WHEN Event = 'Stop' THEN -1 ELSE 0 END) OVER (PARTITION BY conversationID ORDER BY EventTimestamp) AS messages_after_stop
FROM sample_table
WHERE Actor IN ('Client', 'Employee')
)
GROUP BY conversationID