Sort of a complex question, but I'm trying to get a sense of a count "tickets" that were opened/closed per month and their resolution time. I've done the dirty work (lots of window functions) of getting it in a detailed format, but now I need to "roll it up" and get a count.
Here is the example scenario:
ticketID | opened_at | closed_at | category |
---|---|---|---|
12 | 2023-01-15 | 2023-02-02 | sports |
13 | 2023-01-18 | 2023-01-22 | furniture |
15 | 2023-02-05 | null | colors |
22 | 2023-02-15 | 2023-03-10 | furniture |
30 | 2023-03-11 | 2023-04-20 | colors |
35 | 2023-04-05 | null | sports |
We have a queue of tickets where tickets come in each month and are either closed or remain open. The count of tickets at a start_queue point is the number of tickets open in the queue at the beginning of the month, and the new tickets are all the tickets that arrive in the queue that month. And, of course, the ending queue, is the remaining tickets that are open at the end of the month.
My desired output is as following (I left out the categories):
month | starting_queue | new_tickets | tickets_closed | ending_queue |
---|---|---|---|---|
Jan | 0 | 2 | 1 | 1 |
Feb | 1 | 2 | 1 | 2 |
Mar | 2 | 1 | 1 | 3 |
Apr | 3 | 1 | 1 | 2 |
I'm genuinely puzzled to where to begin. A data analyst at my company mentioned that maybe I do some version of SUM(CASE WHEN opened_at...) but my brain basically breaks after that part. Any pointers will be helpful!
I tried to simply aggregate based on the month, but no luck
A little bit verbose but I hope below give you some direction.
-- sample data
WITH tickets AS (
SELECT '12' ticketID, DATE '2023-01-15' opened_at, DATE '2023-02-02' closed_at, 'sports' category UNION ALL
SELECT '13' ticketID, '2023-01-18' opened_at, '2023-01-22' closed_at, 'furniture' category UNION ALL
SELECT '15' ticketID, '2023-02-05' opened_at, null closed_at, 'colors' category UNION ALL
SELECT '22' ticketID, '2023-02-15' opened_at, '2023-03-10' closed_at, 'furniture' category UNION ALL
SELECT '30' ticketID, '2023-03-11' opened_at, '2023-04-20' closed_at, 'colors' category UNION ALL
SELECT '35' ticketID, '2023-04-05' opened_at, null closed_at, 'sports' category
),
-- actual query starts here
tickets_preprocessed AS (
SELECT *,
DATE_TRUNC(opened_at, MONTH) opened_month,
DATE_TRUNC(closed_at, MONTH) closed_month,
DATE_TRUNC(GREATEST(MAX(opened_at) OVER (), MAX(closed_at) OVER ()), MONTH) last_month,
FROM tickets
),
pre_aggregates AS (
SELECT *,
opened_at < month AND (closed_at > month OR closed_at IS NULL) AS starting,
closed_at IS NULL OR closed_at > LAST_DAY(month) AS ending,
FROM tickets_preprocessed,
UNNEST(GENERATE_DATE_ARRAY(opened_month, IFNULL(closed_month, last_month), INTERVAL 1 MONTH)) month
)
SELECT month,
COUNT(DISTINCT IF(starting, ticketID, NULL)) starting_queue,
COUNT(DISTINCT IF(opened_month = month, ticketID, NULL)) new_tickets,
COUNT(DISTINCT IF(closed_month = month, ticketID, NULL)) tickets_closed,
COUNT(DISTINCT IF(ending, ticketID, NULL)) ending_queue,
FROM pre_aggregates GROUP BY month;
-- query result
+------------+----------------+-------------+----------------+--------------+
| month | starting_queue | new_tickets | tickets_closed | ending_queue |
+------------+----------------+-------------+----------------+--------------+
| 2023-01-01 | 0 | 2 | 1 | 1 |
| 2023-02-01 | 1 | 2 | 1 | 2 |
| 2023-03-01 | 2 | 1 | 1 | 2 |
| 2023-04-01 | 2 | 1 | 1 | 2 |
+------------+----------------+-------------+----------------+--------------+