Search code examples
sqlgoogle-bigquerycasewindow-functions

SUM with CASE Statements


Sort of a complex question, but I'm trying to get a sense of a count "tickets" that were opened/closed per month and their resolution time. I've done the dirty work (lots of window functions) of getting it in a detailed format, but now I need to "roll it up" and get a count.

Here is the example scenario:

ticketID opened_at closed_at category
12 2023-01-15 2023-02-02 sports
13 2023-01-18 2023-01-22 furniture
15 2023-02-05 null colors
22 2023-02-15 2023-03-10 furniture
30 2023-03-11 2023-04-20 colors
35 2023-04-05 null sports

We have a queue of tickets where tickets come in each month and are either closed or remain open. The count of tickets at a start_queue point is the number of tickets open in the queue at the beginning of the month, and the new tickets are all the tickets that arrive in the queue that month. And, of course, the ending queue, is the remaining tickets that are open at the end of the month.

My desired output is as following (I left out the categories):

month starting_queue new_tickets tickets_closed ending_queue
Jan 0 2 1 1
Feb 1 2 1 2
Mar 2 1 1 3
Apr 3 1 1 2

I'm genuinely puzzled to where to begin. A data analyst at my company mentioned that maybe I do some version of SUM(CASE WHEN opened_at...) but my brain basically breaks after that part. Any pointers will be helpful!

I tried to simply aggregate based on the month, but no luck


Solution

  • A little bit verbose but I hope below give you some direction.

    -- sample data
    WITH tickets AS (
      SELECT '12' ticketID, DATE '2023-01-15' opened_at, DATE '2023-02-02' closed_at, 'sports' category UNION ALL
      SELECT '13' ticketID, '2023-01-18' opened_at, '2023-01-22' closed_at, 'furniture' category UNION ALL
      SELECT '15' ticketID, '2023-02-05' opened_at, null closed_at, 'colors' category UNION ALL
      SELECT '22' ticketID, '2023-02-15' opened_at, '2023-03-10' closed_at, 'furniture' category UNION ALL
      SELECT '30' ticketID, '2023-03-11' opened_at, '2023-04-20' closed_at, 'colors' category UNION ALL
      SELECT '35' ticketID, '2023-04-05' opened_at, null closed_at, 'sports' category
    ),
    -- actual query starts here
    tickets_preprocessed AS (
      SELECT *,
             DATE_TRUNC(opened_at, MONTH) opened_month,
             DATE_TRUNC(closed_at, MONTH) closed_month,
             DATE_TRUNC(GREATEST(MAX(opened_at) OVER (), MAX(closed_at) OVER ()), MONTH) last_month,
        FROM tickets
    ),
    pre_aggregates AS (
      SELECT *,
             opened_at < month AND (closed_at > month OR closed_at IS NULL) AS starting,
             closed_at IS NULL OR closed_at > LAST_DAY(month) AS ending,
        FROM tickets_preprocessed, 
      UNNEST(GENERATE_DATE_ARRAY(opened_month, IFNULL(closed_month, last_month), INTERVAL 1 MONTH)) month
    )
    SELECT month,
           COUNT(DISTINCT IF(starting, ticketID, NULL)) starting_queue,
           COUNT(DISTINCT IF(opened_month = month, ticketID, NULL)) new_tickets,
           COUNT(DISTINCT IF(closed_month = month, ticketID, NULL)) tickets_closed,
           COUNT(DISTINCT IF(ending, ticketID, NULL)) ending_queue,
      FROM pre_aggregates GROUP BY month;
    
    -- query result
    +------------+----------------+-------------+----------------+--------------+
    |   month    | starting_queue | new_tickets | tickets_closed | ending_queue |
    +------------+----------------+-------------+----------------+--------------+
    | 2023-01-01 |              0 |           2 |              1 |            1 |
    | 2023-02-01 |              1 |           2 |              1 |            2 |
    | 2023-03-01 |              2 |           1 |              1 |            2 |
    | 2023-04-01 |              2 |           1 |              1 |            2 |
    +------------+----------------+-------------+----------------+--------------+